This is a submission for the Bright Data Web Scraping Challenge: Most Creative Use of Web Data for AI Models
What I Built
A web app that aggregates industry-specific news into three KPIs.
Users only need a single glance at these KPI values to see whether something is going on in their industry.
I started developing this app to solve a business problem, and using AI as an alternative to a stricter algorithm seemed like a great addition.
How it works
Users will specify
- their sources (websites and selectors)
- their scores (weighted keywords)
before the app calculates three indexes:
- Relevance Index: How relevant their sources are for their scoring (high index = better)
- Impact Index: The impact happening in the industry right now (low index = better)
- Industry Index: The combined result of relevance and impact (high index means there's something going on in the industry users should be aware of)
- AI will also provide a summary of the analysis as part of the result
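To make the idea of weighted keywords concrete, here is a minimal sketch of a strict, non-AI scoring pass of the kind the app moves away from. It is not the app's actual algorithm: the `Score` shape, the function names, and the normalization are assumptions made for illustration, and the real prompts and formulas live in the repository.

```typescript
// Hypothetical shape of a user-defined score; field names are assumptions.
interface Score {
  keyword: string;
  weight: number; // assumed to be positive; higher = more significant
}

// Sum the weights of every keyword that occurs in the text (case-insensitive).
function keywordScore(text: string, scores: Score[]): number {
  const haystack = text.toLowerCase();
  return scores
    .filter((s) => haystack.includes(s.keyword.toLowerCase()))
    .reduce((sum, s) => sum + s.weight, 0);
}

// Divide by the maximum achievable score to get a value between 0 and 1.
function normalizedScore(text: string, scores: Score[]): number {
  const max = scores.reduce((sum, s) => sum + s.weight, 0);
  return max === 0 ? 0 : keywordScore(text, scores) / max;
}

const scores: Score[] = [
  { keyword: "recall", weight: 3 },
  { keyword: "merger", weight: 2 },
];

// Only "recall" matches, so the result is 3 / 5.
console.log(normalizedScore("Supplier announces product recall", scores)); // 0.6
```

A plain substring match like this misses synonyms and context, which is the gap the AI-based analysis is meant to fill.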
I'm leaving out some details about prompts and scoring here, but if you're curious, you can find them in the codebase below.
Demo
Codebase
You can find the repository on GitHub. It's written in Deno + Fresh and is quick to set up; follow the readme.md instructions to get started. I've added some sources and scorings so you can try it out right away.
Industry Watchdog
This project is a prototype for the Bright Data challenge on dev.to. IW lets users take a quick glance at a single KPI to see if something is going on in their industry.
Getting Started
- Clone the repo
- Install Deno
- Rename .env.example to .env and set your BROWSER_WS variable
- Run deno task start
- Navigate to http://localhost:8000, add your sources and scores and run the indexing process
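If the app fails at startup, a missing or malformed BROWSER_WS value is a likely cause. The sketch below shows one way to fail fast on that. It assumes BROWSER_WS holds a WebSocket URL (ws:// or wss://), which the variable name suggests but the post does not state, and `requireBrowserWs` is a hypothetical helper, not a function from the repository.

```typescript
// Validate the endpoint before any scraping starts.
// `env` is passed in so the function is easy to test; in Deno you could
// call it with Deno.env.toObject().
function requireBrowserWs(env: Record<string, string | undefined>): string {
  const value = env["BROWSER_WS"];
  if (!value) {
    throw new Error("BROWSER_WS is not set; rename .env.example to .env first");
  }
  const url = new URL(value); // throws a TypeError on malformed input
  // Assumption: the endpoint is a WebSocket URL.
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new Error(`BROWSER_WS must be a ws:// or wss:// URL, got ${url.protocol}`);
  }
  return value;
}
```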
How to use
- Remove all sources and scores
- Follow the steps on the Home page
- Run the indexing process
Screenshots
Overview & starting page
Source maintenance
Scoring maintenance
How I Used Bright Data
Bright Data provides security and scalability for browser scraping, which is crucial to the availability and integrity of the index data. Industry Watchdog uses Bright Data's Scraping Browser to scrape multiple sources at once and to get around possible CAPTCHA issues. Its broad proxy network helps ensure that critical articles are included in the analysis.
This project could also qualify for Prompt 2: Build a Web Scraper API to Solve Business Problems. However, instead of using Bright Data's API, it uses the Scraping Browser.
This app could prove useful for analytics firms and BI departments that use internal as well as external data to monitor their business strategies and operations and would like to extend their KPI collection with the Industry Index.
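Scraping multiple sources at once can be sketched roughly as follows. This is an illustration, not the code from the repository: `Source`, `ScrapeResult`, and `scrapeAll` are hypothetical names, and the per-source scraper is injected as a function so the sketch stays independent of any browser library. In the real app, that function would open the page through the browser session behind BROWSER_WS and read the text at the configured selector.

```typescript
// Hypothetical shapes; field names are assumptions, not the app's schema.
interface Source {
  url: string;      // page to scrape
  selector: string; // CSS selector for the article text
}

interface ScrapeResult {
  source: Source;
  text: string | null;  // scraped text, or null on failure
  error: string | null; // error message, or null on success
}

// Scrapes one source; supplied by the caller.
type ScrapeOne = (source: Source) => Promise<string>;

// Start all scrapes concurrently and collect one result per source.
// A failing source is recorded instead of rejecting the whole batch.
function scrapeAll(sources: Source[], scrapeOne: ScrapeOne): Promise<ScrapeResult[]> {
  return Promise.all(
    sources.map(async (source): Promise<ScrapeResult> => {
      try {
        return { source, text: await scrapeOne(source), error: null };
      } catch (err) {
        return { source, text: null, error: String(err) };
      }
    }),
  );
}
```

Catching errors per source means one blocked or slow site does not discard the articles that were scraped successfully.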