Scrape Unscrapeable Amazon Dataset with BrightData, React.js and Node.js

Published at: 12/28/2024
Categories: brightdatachallenge, devchallenge, webdev, api
Author: Alex Anie

This is a submission for the Bright Data Web Scraping Challenge: Scrape Data from Complex, Interactive Websites

What I Built

This project uses Bright Data to scrape product data from Amazon and display the results on the page. You can search for anything you like, and the results load on the page as long as what you searched for can be found on Amazon.

Demo
The project uses two separate GitHub repositories: one for the frontend and one for the backend.

How I Used Bright Data

The project is built on Bright Data. I used the Bright Data Scraping Browser to retrieve the dataset from Amazon.

import 'dotenv/config'
import { Router } from 'express';
import puppeteer from 'puppeteer-core';
import process from 'node:process';

const router = Router();

// Scraping logic using Puppeteer and BrightData
const scrapeData = async (searchTerm) => {
  const BROWSER_WS = process.env.BROWSER_WS; // Bright Data Scraping Browser WebSocket endpoint (with your credentials), loaded from .env
  const URL = "https://www.amazon.com";

  const browser = await puppeteer.connect({
    browserWSEndpoint: BROWSER_WS,
  });

 // ... some code here

  await browser.close();
  return products;
};

// Define the API route for scraping
router.get('/scrape', async (req, res) => {
  // ... some code here
});

export default router;
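
The scraping step itself is elided above. As a rough sketch of what such a flow could look like (not the author's actual code), the example below connects to a remote browser, opens a search results page, extracts a few fields, and always closes the session. The `scrapeSearch` and `extractProducts` names, the direct `/s?k=` search URL, and the CSS selectors are assumptions for illustration; Amazon's markup changes often, so the selectors in particular would need checking. The `connect` function is passed in so the flow can be tried without a live browser; with puppeteer-core it would be `puppeteer.connect`.

```javascript
// Sketch only: `connect` is injected so the flow can run without a real
// browser. With puppeteer-core you would pass `puppeteer.connect`.
const SEARCH_URL = 'https://www.amazon.com/s';

// Assumed selector for one search result card; verify against the live page.
const RESULT_SELECTOR = '[data-component-type="s-search-result"]';

// Runs in the page context: turns result cards into plain objects.
const extractProducts = (cards) =>
  cards.map((card) => ({
    title: card.querySelector('h2')?.textContent?.trim() ?? '',
    price: card.querySelector('.a-price .a-offscreen')?.textContent?.trim() ?? null,
    image: card.querySelector('img')?.getAttribute('src') ?? null,
  }));

const scrapeSearch = async (connect, browserWSEndpoint, searchTerm) => {
  const browser = await connect({ browserWSEndpoint });
  try {
    const page = await browser.newPage();
    const url = `${SEARCH_URL}?k=${encodeURIComponent(searchTerm)}`;
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60_000 });
    await page.waitForSelector(RESULT_SELECTOR, { timeout: 30_000 });
    return await page.$$eval(RESULT_SELECTOR, extractProducts);
  } finally {
    // Always release the remote browser session, even if scraping throws.
    await browser.close();
  }
};

// Usage (assuming puppeteer-core and a BROWSER_WS value in .env):
// const products = await scrapeSearch(
//   (opts) => puppeteer.connect(opts), process.env.BROWSER_WS, 'usb c cable');
```

The `try`/`finally` wrapper is the main design choice here: a failed navigation or a missing selector still reaches `browser.close()`, so the remote session is not left open.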

The Bright Data Scraping Browser is driven through puppeteer-core to scrape Amazon data, and the route returns the contents as a JSON response.
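
The body of the `/scrape` route is not shown in the post. One possible shape for it, offered as a minimal sketch under stated assumptions, is a handler that validates the search term, calls the scraper, and replies with JSON. The `createScrapeHandler` name, the `q` query parameter, and the error messages are hypothetical; only the `(req, res)` handler signature and `res.status().json()` come from Express itself.

```javascript
// `scrape` is any async function that resolves to an array of products,
// for example the article's scrapeData.
const createScrapeHandler = (scrape) => async (req, res) => {
  // Assumed parameter name, e.g. GET /api/scrape?q=laptop
  const searchTerm = String(req.query.q ?? '').trim();
  if (!searchTerm) {
    return res.status(400).json({ error: 'Missing search term' });
  }
  try {
    const products = await scrape(searchTerm);
    return res.json({ products });
  } catch (err) {
    // Log the cause on the server, but send the client a generic message.
    console.error('Scrape failed:', err);
    return res.status(500).json({ error: 'Failed to scrape data' });
  }
};

// Usage with the article's router:
// router.get('/scrape', createScrapeHandler(scrapeData));
```

Passing the scraper in as an argument keeps the handler easy to exercise with a stub, without opening a browser session.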

I used Express.js to create the server and API endpoint for the frontend application, which is a React and Vite setup.

import express from 'express';
import scrapeRouter from './index.js'; // Import the logic from index.js
import cors from 'cors';

const app = express();

// Allow requests from all origins
app.use(cors());

// Use the scrapeRouter for /api routes
app.use('/api', scrapeRouter);

// Set the port
const PORT = 4040;

// Start the server
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});
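
The server above allows every origin and hard-codes the port. For a deployed setup, a tighter variant might restrict CORS to the known frontend origins and read the port from the environment. The sketch below is an assumption-laden illustration, not the project's configuration: the origin list, `makeOriginCheck`, and `resolvePort` are hypothetical, while the `(origin, callback)` function form is the one the cors package accepts for its `origin` option.

```javascript
// Hypothetical frontend origins; replace with the real Netlify URL and the
// local Vite dev server address (5173 is Vite's default port).
const ALLOWED_ORIGINS = ['https://example.netlify.app', 'http://localhost:5173'];

// Builds a function matching the `origin` callback signature of the cors package.
const makeOriginCheck = (allowed) => (origin, callback) => {
  // Requests without an Origin header (curl, server-to-server) are let through.
  if (!origin || allowed.includes(origin)) return callback(null, true);
  return callback(new Error(`Origin ${origin} not allowed by CORS`));
};

// Prefer a PORT provided by the hosting platform, else the article's 4040.
const resolvePort = (env) => Number.parseInt(env.PORT ?? '', 10) || 4040;

// Usage in the server file:
// app.use(cors({ origin: makeOriginCheck(ALLOWED_ORIGINS) }));
// app.listen(resolvePort(process.env), () => { /* ... */ });
```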

Tailwind CSS is used for the styling and React Icons for the icons. The rest of the stack is listed below.

Deployment

The backend Express app is deployed separately from the frontend:

  • Backend deployed on Render.com
  • Frontend deployed on Netlify.com
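
Because the frontend and backend live on different hosts, the React app needs the backend's base URL to reach the `/api/scrape` route. A small helper along these lines could build the request URL; this is a sketch, and `buildScrapeUrl`, the `q` parameter name, and the example hosts are assumptions rather than the project's real values.

```javascript
// Builds the request URL for the backend's /api/scrape route.
// In a Vite app, `apiBase` would usually come from an env variable such as
// import.meta.env.VITE_API_URL (a hypothetical name).
const buildScrapeUrl = (apiBase, searchTerm) => {
  const url = new URL('/api/scrape', apiBase);
  // `q` is an assumed parameter name; searchParams handles the encoding.
  url.searchParams.set('q', searchTerm.trim());
  return url.toString();
};

// Usage in the React app (Axios is part of the stack):
// const { data } = await axios.get(buildScrapeUrl(API_BASE, 'wireless mouse'));
```

Using `URL` and `searchParams` instead of string concatenation means characters such as spaces and `&` in the search term are encoded correctly.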

Stacks Used

  • React
  • Vite
  • Tailwind CSS
  • React Icons
  • Axios
  • Cors
  • Bright Data (for proxy and data fetching)
  • Render (for API hosting)
  • Dotenv (to load environment variables)
  • Express (to set up the server and routes)
  • nodemon (for local development)
  • puppeteer-core (for scraping data from Amazon)
