Extracting data from e-commerce websites
Basic web scraping is an essential skill for a data analyst. Being able to collect your own data for a project is an undervalued ability.
I recently scraped data from four big art shops (websites) in Nigeria, and I would like to share the code (including code generated with ChatGPT) for learning purposes, in case other data analysts find it useful.
The first website is Crafts Village, where I scraped the Art-tools category.
Code for scraping the website:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
# Initialize lists to store the data
product_names = []
prices = []
# Scrape all 6 pages
for page in range(1, 7):
    # Put the page number in the URL so each iteration fetches a different page
    # (assumes the site follows the usual /page/N/ pagination pattern)
    url = f"https://craftsvillage.com.ng/product-category/art-tools/page/{page}/"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    # Find the relevant HTML elements for product information
    products = soup.find_all("li", class_="product")
    # Extract data from each product element
    for product in products:
        # Product name (empty string if the link is missing)
        name_element = product.find("a", class_="woocommerce-LoopProduct-link")
        name = name_element.text.replace("\n", "").strip() if name_element else ""
        name = re.sub(r"[₦,|–]", "", name)  # Remove unwanted characters
        product_names.append(name)
        # Price (None if the product has no price element)
        price_element = product.find("bdi")
        price = price_element.text if price_element else None
        prices.append(price)
# Create a Pandas DataFrame from the scraped data
data = {
    "Product Name": product_names,
    "Price": prices
}
df = pd.DataFrame(data)
# Remove any leftover newlines from the "Product Name" column
df["Product Name"] = df["Product Name"].str.replace("\n", "")
# Display the DataFrame
print(df)
To find the class of the name element, I inspected it in my browser: I hovered over the product name, right-clicked, and selected Inspect.
I did the same for the price.
The code above extracts the product names and prices from all 6 pages of the Art-tools category.
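As a rough illustration of how an inspected class maps onto the code, the sketch below runs the same find_all and find calls on a small hand-written HTML snippet instead of a live page. The snippet, its product names, and its prices are made up for the example; only the tag and class names mirror the ones used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written by hand to mimic a WooCommerce product list.
sample_html = """
<ul>
  <li class="product">
    <a class="woocommerce-LoopProduct-link" href="#">Sketch Pad A4</a>
    <span class="price"><bdi>₦2,500</bdi></span>
  </li>
  <li class="product">
    <a class="woocommerce-LoopProduct-link" href="#">Charcoal Pencil Set</a>
  </li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
# Search inside each product container so a name is never paired
# with the price of a different product.
for product in soup.find_all("li", class_="product"):
    name_element = product.find("a", class_="woocommerce-LoopProduct-link")
    price_element = product.find("bdi")
    rows.append({
        "name": name_element.get_text(strip=True) if name_element else None,
        "price": price_element.get_text(strip=True) if price_element else None,
    })

for row in rows:
    print(row)
```

Trying selectors on a saved or hand-written snippet like this is a quick way to check them before looping over every page of a site.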
Here is how I scraped information from Crafties Hobbies
import requests
from bs4 import BeautifulSoup
import pandas as pd
base_url = 'https://craftieshobbycraft.com/product-category/painting-drawing/page/{}/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
# Create lists to store data
categories = []
product_names = []
product_prices = []
# Iterate over each page
for page in range(1, 8):
    url = base_url.format(page)
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    category_elements = soup.find_all('p', class_='category uppercase is-smaller no-text-overflow product-cat op-7')
    product_names_elements = soup.find_all('a', class_='woocommerce-LoopProduct-link woocommerce-loop-product__link')
    product_prices_elements = soup.find_all('bdi')
    # zip() pairs elements by position, so this assumes exactly one category,
    # one name, and one price element per product on the page
    for category_element, product_name_element, product_price_element in zip(category_elements, product_names_elements, product_prices_elements):
        category = category_element.get_text(strip=True)
        product_name = product_name_element.get_text(strip=True)
        product_price = product_price_element.get_text(strip=True)
        categories.append(category)
        product_names.append(product_name)
        product_prices.append(product_price)
# Create a pandas DataFrame
data = {
    'Category': categories,
    'Product Name': product_names,
    'Product Price': product_prices
}
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
Here is how I scraped data from Kaenves store
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Create empty lists to store the data
product_names = []
prices = []
# Iterate through each page
for page in range(1, 4):
    # Send a GET request to the page
    url = f"https://www.kaenves.store/collections/floating-wood-frame?page={page}"
    response = requests.get(url)
    # Create a BeautifulSoup object to parse the HTML content
    soup = BeautifulSoup(response.content, 'html.parser')
    # Find the price spans and the product name headings by class
    price_elements = soup.find_all('span', class_='price-item price-item--regular')
    name_elements = soup.find_all('h3', class_='card__heading h5')
    # Extract the prices and product names
    for price_element, name_element in zip(price_elements, name_elements):
        price = price_element.get_text(strip=True)
        name = name_element.get_text(strip=True)
        product_names.append(name)
        prices.append(price)
# Create a pandas DataFrame
data = {'Product Name': product_names, 'Price': prices}
df = pd.DataFrame(data)
# Save the DataFrame as a CSV file
df.to_csv('paperandboard.csv', index=False)
Here is how I scraped data from Art Easy
import requests
from bs4 import BeautifulSoup
import pandas as pd
prices = []
product_names = []
# Iterate over both pages
for page_num in range(1, 3):
    url = f"https://arteasy.com.ng/product-category/canvas-surfaces/page/{page_num}/"
    # Send a GET request to the URL
    response = requests.get(url)
    # Parse the HTML content
    soup = BeautifulSoup(response.text, "html.parser")
    # Find all the span elements with class "price"
    product_prices = [span.get_text(strip=True) for span in soup.find_all("span", class_="price")]
    # Find all the h3 elements with class "product-title"
    product_names += [product_name.get_text(strip=True) for product_name in soup.find_all("h3", class_="product-title")]
    # Add the prices to the list
    prices += product_prices
# Check if the lengths of product_names and prices are equal
if len(product_names) == len(prices):
    # Create a pandas DataFrame
    data = {"Product Name": product_names, "Price": prices}
    df = pd.DataFrame(data)
    # Print the DataFrame
    print(df)
else:
    print("Error: The lengths of product_names and prices are not equal.")
If you want to reuse this code, change the URL to your preferred e-commerce website and replace the class names with the ones your target site uses for product names and prices.
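One way to make that swap easier is to pass the URL and selectors in as parameters. The sketch below is only an illustration: parse_listing, scrape_category, and their arguments are hypothetical names that do not appear in the code above, and it assumes every product sits in its own container element so that names and prices stay paired.

```python
import requests
from bs4 import BeautifulSoup


def parse_listing(html, item_selector, name_selector, price_selector):
    """Return (name, price) pairs found in one listing page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for item in soup.select(item_selector):
        name_element = item.select_one(name_selector)
        price_element = item.select_one(price_selector)
        if name_element is None:
            continue  # skip containers that do not hold a product name
        name = name_element.get_text(strip=True)
        price = price_element.get_text(strip=True) if price_element else None
        pairs.append((name, price))
    return pairs


def scrape_category(url_template, pages, item_selector, name_selector, price_selector):
    """Fetch each page (url_template contains a {page} placeholder) and collect the pairs."""
    results = []
    for page in pages:
        response = requests.get(url_template.format(page=page), timeout=30)
        response.raise_for_status()  # stop early on HTTP errors
        results.extend(
            parse_listing(response.text, item_selector, name_selector, price_selector)
        )
    return results


# Offline check with made-up HTML, so no request is sent here.
demo_html = """
<div class="card"><h3 class="title">Easel</h3><span class="price">₦15,000</span></div>
<div class="card"><h3 class="title">Canvas 20x30</h3></div>
"""
demo_pairs = parse_listing(demo_html, "div.card", "h3.title", "span.price")
print(demo_pairs)
```

For a real site, only the arguments to scrape_category would change: the URL template, the page range, and the three CSS selectors taken from the browser's Inspect panel.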
The scraped data can be used for the following:
Price comparison: You can use the scraped data to compare prices of products across different websites. This can help you find the best deal on the product you are looking for.
Product research: You can use the scraped data to research products. This can help you learn more about a product's features, specifications, and reviews.
Market analysis: You can use the scraped data to analyze the market for a particular product. This can help you identify trends and opportunities.
Product recommendations: You can use the scraped data to recommend products to users. This can help you increase sales and improve customer satisfaction.
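As a small illustration of the price comparison idea, the sketch below turns price strings into numbers and picks the cheapest shop for each product. The two DataFrames hold made-up rows standing in for scraped results, parse_price is a hypothetical helper, and the example assumes product names match exactly across shops, which real scraped data rarely does without extra cleaning.

```python
import re

import pandas as pd


def parse_price(text):
    """Convert a price string such as '₦2,500.00' to a float, or None if no number is found."""
    if text is None:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


# Made-up rows standing in for two scraped DataFrames.
shop_a = pd.DataFrame({
    "Product Name": ["Easel", "Sketch Pad A4"],
    "Price": ["₦15,000", "₦2,500.00"],
})
shop_b = pd.DataFrame({
    "Product Name": ["Easel", "Sketch Pad A4"],
    "Price": ["₦14,500", "₦2,800"],
})

# Label each row with its shop, then stack the two tables.
shop_a["Shop"] = "Shop A"
shop_b["Shop"] = "Shop B"
combined = pd.concat([shop_a, shop_b], ignore_index=True)
combined["Price Value"] = combined["Price"].apply(parse_price)

# For each product, keep the row with the lowest numeric price.
cheapest = combined.loc[combined.groupby("Product Name")["Price Value"].idxmin()]
print(cheapest[["Product Name", "Shop", "Price Value"]])
```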