Web scraping Apple App Store Search with Nodejs
What will be scraped
Note: in this blog post I'll show you how to scrape Apple App Store Search and receive results exactly as they appear on an Apple iMac, because search results on a Mac differ substantially from results on a PC. The screenshots below show you the difference:
Full code
If you don't need an explanation, have a look at the full code example in the online IDE.
import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";
const engine = "apple_app_store"; // search engine
const resultsLimit = 50; // hardcoded limit for demonstration purposes
const params = {
  api_key: process.env.API_KEY, // your API key from serpapi.com
  term: "image viewer", // the query you want to search
  country: "us", // the country to use for the search
  lang: "en-us", // the language to use for the search
  device: "desktop", // the device to use to get the results: "desktop", "tablet", or "mobile" (default)
  num: "10", // the number of results you want to get per page
  page: 0, // the index of the page to fetch
};
const getResults = async () => {
  const results = [];
  while (true) {
    const json = await getJson(engine, params);
    if (json.organic_results) {
      results.push(...json.organic_results);
      params.page += 1;
    } else break;
    if (results.length >= resultsLimit) break;
  }
  return results;
};
getResults().then((result) => console.dir(result, { depth: null }));
Why use Apple App Store Search Scraper API from SerpApi?
Using an API generally solves all or most of the problems you might encounter while creating your own parser or crawler. From a web scraping perspective, our API can help solve the most painful problems:
- Bypass blocks from supported search engines by solving CAPTCHAs or getting around IP blocks.
- No need to create a parser from scratch and maintain it.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation when you have to extract large amounts of data quickly.
Head to the Playground for a live and interactive demo.
Preparation
First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.
To do this, in the directory with our project, open the command line and enter:
$ npm init -y
And then:
$ npm i serpapi dotenv
*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.
The serpapi package is used to scrape and parse search engine results using SerpApi. It can get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.
The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.
Next, we need to add a top-level "type" field with a value of "module" to our package.json file to allow using ES6 modules in Node.js.
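As a rough illustration, after running npm init -y and adding the field, package.json might look like the fragment below. The name and version values are placeholders (yours will be whatever npm init generated), and other generated fields are omitted; the "type": "module" line is the part that matters.

```json
{
  "name": "apple-app-store-scraper",
  "version": "1.0.0",
  "type": "module"
}
```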
That completes the Node.js environment setup for our project, so we can move on to the step-by-step code explanation.
Code explanation
First, we need to import dotenv from the dotenv library and call its config() method, then import getJson from the serpapi library:
import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";
- config() will read your .env file, parse the contents, assign them to process.env, and return an object with a parsed key containing the loaded content or an error key if it failed.
- getJson() allows you to get a JSON response based on search parameters.
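Since config() reports a failure through the error key rather than by throwing, it can be worth checking that the API key is actually available before sending any request. The sketch below shows one way to do that. ensureEnvLoaded is a hypothetical helper written for this post, not part of dotenv, and it assumes the key is stored under API_KEY as in the code above.

```javascript
// Hypothetical helper (not part of dotenv): takes the object returned by
// dotenv.config() and makes sure the expected variable is available.
function ensureEnvLoaded(configResult, env = process.env, name = "API_KEY") {
  // The variable may come from .env or from the real environment.
  if (env[name]) return env[name];
  // Explain why it is missing, using the error reported by dotenv if any.
  const reason =
    configResult && configResult.error
      ? `dotenv failed: ${configResult.error.message}`
      : "the variable is missing from .env and from the environment";
  throw new Error(`${name} is not set (${reason})`);
}

// Possible usage in the script from this post (assumes dotenv is imported):
//   const apiKey = ensureEnvLoaded(dotenv.config());
```

Failing early like this gives a clearer message than a rejected API request caused by an undefined key.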
Next, we specify the search engine, set how many results we want to receive (the resultsLimit constant), and write the necessary search parameters for making a request:
const engine = "apple_app_store"; // search engine
const resultsLimit = 50; // hardcoded limit for demonstration purpose
const params = {
  api_key: process.env.API_KEY, // your API key from serpapi.com
  term: "image viewer", // the query you want to search
  country: "us", // the country to use for the search
  lang: "en-us", // the language to use for the search
  device: "desktop", // the device to use to get the results: "desktop", "tablet", or "mobile" (default)
  num: "10", // the number of results you want to get per page
  page: 0, // the index of the page to fetch
};
You can use the following search parameters:
- api_key parameter defines the SerpApi private key to use.
- term parameter defines the query you want to search. You can use any search term that you would use in a regular App Store search.
- country parameter defines the country to use for the search. It's a two-letter country code (e.g., us (default) for the United States, uk for the United Kingdom, or fr for France). Head to the Apple Regions for a full list of supported Apple Regions.
- lang parameter defines the language to use for the search. It's a four-letter language code (e.g., en-us (default) for English, fr-fr for French, or uk-ua for Ukrainian). Head to the Apple Languages for a full list of supported Apple Languages.
- num parameter defines the number of results you want to get per page. It defaults to 10. The maximum number of results you can get per page is 200. Any number greater than the maximum will default to 200.
- page parameter is used to get the items on a specific page (e.g., 0 (default) is the first page of results, 1 is the 2nd page of results, 2 is the 3rd page of results, etc.).
- disallow_explicit parameter defines the filter for disallowing explicit apps. It defaults to false.
- property parameter allows you to search a property of an app. developer allows searching by the developer title of an app (e.g., property: "developer" and term: "Coffee" give apps with "Coffee" in their developer's name, such as Coffee Inc.).
- no_cache parameter will force SerpApi to fetch the App Store Search results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. The cache expires after 1h. Cached searches are free and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or true to disallow results from the cache. no_cache and async parameters should not be used together.
- async parameter defines the way you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results, or true to just submit your search to SerpApi and retrieve the results later. In this case, you'll need to use our Searches Archive API to retrieve your results. async and no_cache parameters should not be used together. async should not be used on accounts with Ludicrous Speed enabled.
- device parameter defines the device to use to get the results. It can be set to desktop to use the Mac App Store, tablet to use the iPad App Store, or mobile (default) to use the iPhone App Store.
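To make the parameter rules above more concrete, here is a small sketch that builds a parameter object for a developer-name search. buildParams is a hypothetical helper written for this example, not part of the serpapi package, and "YOUR_API_KEY" is a placeholder. It only applies the defaults and the 200-results-per-page cap described in the list.

```javascript
// Hypothetical helper: merges the caller's options over the documented defaults.
function buildParams(apiKey, term, options = {}) {
  const MAX_NUM = 200; // maximum results per page, per the list above
  const params = {
    api_key: apiKey,
    term,
    country: "us",
    lang: "en-us",
    device: "mobile",
    num: 10,
    page: 0,
    ...options, // anything passed by the caller overrides the defaults
  };
  // Mirror the documented behaviour: a value above the maximum becomes 200.
  params.num = Math.min(Number(params.num), MAX_NUM);
  return params;
}

// Search Mac App Store apps whose developer name contains "Coffee".
const developerSearch = buildParams("YOUR_API_KEY", "Coffee", {
  property: "developer",
  device: "desktop",
  num: 500,
});
console.log(developerSearch.num, developerSearch.property); // prints: 200 developer
```

The resulting object can be passed to getJson in place of the hand-written params constant.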
Next, we declare the function getResults that gets data from the pages and returns it:
const getResults = async () => {
...
};
In this function, we declare an empty results array and, using a while loop, get the JSON with results, add the organic_results from each page, and set the next page index (in params.page).
If there are no more results on the page, or if the number of received results is greater than or equal to resultsLimit, we stop the loop (using break) and return an array with the results:
const results = [];
while (true) {
  const json = await getJson(engine, params);
  if (json.organic_results) {
    results.push(...json.organic_results);
    params.page += 1;
  } else break;
  if (results.length >= resultsLimit) break;
}
return results;
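One detail worth noting: because whole pages are appended, this loop can return more than resultsLimit items, and it changes the shared params object as it goes. The variant below is a sketch that copies the parameters and trims the final array. collectResults and fetchPage are names invented for this example; fetchPage stands in for a call such as (p) => getJson(engine, p) and is passed in as an argument so the loop can be tried without network access.

```javascript
// Sketch: paginate until a page has no organic_results or the limit is reached.
async function collectResults(fetchPage, baseParams, limit) {
  const params = { ...baseParams }; // copy so the caller's object is untouched
  const results = [];
  while (results.length < limit) {
    const json = await fetchPage(params);
    // Stop when the page has no organic_results (missing or empty array).
    if (!json.organic_results || json.organic_results.length === 0) break;
    results.push(...json.organic_results);
    params.page += 1;
  }
  return results.slice(0, limit); // never return more than the limit
}

// Stub standing in for the API: three pages of two items each, then nothing.
const fakeFetch = async ({ page }) =>
  page < 3
    ? { organic_results: [{ position: page * 2 + 1 }, { position: page * 2 + 2 }] }
    : {};

collectResults(fakeFetch, { page: 0 }, 5).then((items) => console.log(items.length)); // prints 5
```

With the real API, the call would look like collectResults((p) => getJson(engine, p), params, resultsLimit).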
And finally, we run the getResults function and print all the received information in the console with the console.dir method, which allows you to use an object with the necessary parameters to change default output options:
getResults().then((result) => console.dir(result, { depth: null }));
Output
[
{
"position": 1,
"id": 1507782672,
"title": "Pixea",
"bundle_id": "imagetasks.Pixea",
"version": "1.4",
"vpp_license": true,
"age_rating": "4+",
"release_note": "- New icon - macOS Big Sur support - Universal Binary - Bug fixes and improvements",
"seller_link": "https://www.imagetasks.com",
"minimum_os_version": "10.12",
"description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them. Supported formats: JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives. Export formats: JPEG, JPEG-2000, PNG, TIFF, BMP. Found a bug? Have a suggestion? Please, send it to [email protected] Follow us on Twitter @imagetasks!",
"link": "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
"serpapi_product_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
"serpapi_reviews_link": "https://serpapi.com/search.json?country=us&engine=apple_reviews&page=1&product_id=1507782672",
"release_date": "2020-04-20 07:00:00 UTC",
"price": {
"type": "Free"
},
"rating": [
{
"type": "All Times",
"rating": 0,
"count": 0
}
],
"genres": [
{
"name": "Photo & Video",
"id": 6008,
"primary": true
},
{
"name": "Graphics & Design",
"id": 6027,
"primary": false
}
],
"developer": {
"name": "ImageTasks Inc",
"id": 450316587,
"link": "https://apps.apple.com/us/developer/id450316587"
},
"size_in_bytes": 5838181,
"supported_languages": ["EN"],
"screenshots": {
"general": [
{
"link": "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/b1/8c/fb/b18cfb80-cb5c-d67d-2edc-ee1f6666e012/35b8d5a7-b493-4a80-bdbd-3e9d564601dd_Pixea-1.jpg/800x500bb.jpg",
"size": "800x500"
},
{
"link": "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/96/08/83/9608834d-3d2b-5c0b-570c-f022407ff5cc/1836573e-1b6a-421c-b654-6ae2f915d755_Pixea-2.jpg/800x500bb.jpg",
"size": "800x500"
},
{
"link": "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/58/fd/db/58fddb5d-9480-2536-8679-92d6b067d285/98e22b63-1575-4ee6-b08d-343b9e0474ea_Pixea-3.jpg/800x500bb.jpg",
"size": "800x500"
},
{
"link": "https://is2-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/c3/f3/f3/c3f3f3b5-deb0-4b58-4afc-79073373b7b9/28f51f38-bc59-4a61-a5a1-bff553838267_Pixea-4.jpg/800x500bb.jpg",
"size": "800x500"
}
]
},
"logos": [
{
"size": "60x60",
"link": "https://is1-ssl.mzstatic.com/image/thumb/Purple114/v4/73/5f/29/735f2997-66b7-9795-ad4f-7ed78d0d3812/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/60x60bb.png"
},
{
"size": "512x512",
"link": "https://is1-ssl.mzstatic.com/image/thumb/Purple114/v4/73/5f/29/735f2997-66b7-9795-ad4f-7ed78d0d3812/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/512x512bb.png"
},
{
"size": "100x100",
"link": "https://is1-ssl.mzstatic.com/image/thumb/Purple114/v4/73/5f/29/735f2997-66b7-9795-ad4f-7ed78d0d3812/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/100x100bb.png"
}
]
},
...and other results
]
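The full objects are verbose, so in practice you may only want a few fields from each result. The following sketch picks some of the fields visible in the output above (title, developer.name, price.type, and link). summarizeApp is a hypothetical helper, and it assumes the nested fields may be absent on some results, since only one result is shown here.

```javascript
// Hypothetical helper: reduce one organic result to a compact summary.
function summarizeApp(app) {
  return {
    title: app.title,
    developer: app.developer?.name ?? null, // null when no developer object
    price: app.price?.type ?? null, // null when no price object
    link: app.link,
  };
}

// Trimmed copy of the first result from the output above.
const sample = [
  {
    title: "Pixea",
    link: "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
    price: { type: "Free" },
    developer: { name: "ImageTasks Inc" },
  },
];

// Print one compact row per app.
console.table(sample.map(summarizeApp));
```

In the script from this post, the same mapping could be applied to the array returned by getResults before printing it.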
Links
- Code in the online IDE
- Apple App Store Search Scraper API documentation
- Apple App Store Search Scraper API Playground
If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.
Add a Feature Request or a Bug