Amazon Cell Phones Reviews

🐱 Scrape (un)locked cell phone ratings and reviews on Amazon 📱

Features ✨

Scrape basic metadata along with ratings and reviews
Scrape all or specific brands
Scrape unlocked, locked, or both cell phones
Use multiple Puppeteer pages as workers

See the Configuration section to customize these settings.

Download Data 📫

You can download pre-scraped datasets from Kaggle.

Manual Scrape 🔧

Requirements 📃

Node.js
Yarn (optional)

Packages Used 📦

puppeteer for browser-based scraping
prettier for formatting source code
ts-node for running TypeScript scripts

Steps 👨‍🔬

Preparation

Make sure the dependencies are downloaded by running npm install or yarn.
(Optional) Copy config.default.ts to config.ts (config.ts is ignored by git) and customize the config variables in config.ts.

Using Visual Studio Code

Open the project directory in Visual Studio Code.
Select and execute Scrape Search Results in the launch options on the Debug tab (exported to ./data/yyyymmdd-results.csv).
Then select and execute Scrape Item Reviews (exported to ./data/yyyymmdd-reviews.csv).

Using Command Line

Run npm run scrape:items or yarn scrape:items first to scrape initial item results (exported to ./data/yyyymmdd-results.csv).
Then run npm run scrape:reviews or yarn scrape:reviews to scrape item reviews (exported to ./data/yyyymmdd-reviews.csv).
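
The export paths above follow a yyyymmdd prefix. As a minimal sketch of how such a path could be built (exportPath and ExportKind are hypothetical names, not taken from the project, and using the local date is an assumption):

```typescript
// Hypothetical helper, not part of the project's source.
// Builds an export path in the ./data/yyyymmdd-<kind>.csv pattern.
type ExportKind = 'results' | 'reviews';

function exportPath(date: Date, kind: ExportKind): string {
  // Zero-pad month and day to two digits.
  const pad = (n: number): string => ('0' + n).slice(-2);
  // Assumption: the prefix uses the local date, not UTC.
  const stamp = String(date.getFullYear()) + pad(date.getMonth() + 1) + pad(date.getDate());
  return './data/' + stamp + '-' + kind + '.csv';
}

console.log(exportPath(new Date(), 'results'));
```

Since the prefix changes with the date, a reviews run presumably needs to locate the results file written by the earlier items run; check the data directory if the two runs happen on different days.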

Available Scripts 📝

scrape:items

Scrapes and saves the search results that review scraping relies on.

scrape:reviews

Scrapes and saves reviews for the items collected by scrape:items.

format

Formats all .ts files.

format:data

Formats .json files in /data.

Configuration 🛠

brands - string[]

Brands to scrape.

Defaults to ten major phone manufacturers. Set to [] (empty array) to disable brand filtering and select all available brands.

Note that when all brands are selected, scraped items are not assigned a brand. This may be implemented in a future version.

brandKeywords - {brand: string, keywords: string[]}[]

Alternative brand names or keywords used for brand assignment.

Since the search page does not explicitly state an item's brand, the brand is determined after scraping by comparing each item's URL and title against the brands and brandKeywords values.
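
The brand assignment described above could look roughly like the following. This is a sketch under assumptions, not the project's actual code: assignBrand is a hypothetical name, and case-insensitive substring matching is one plausible way to compare titles and URLs against the configured values.

```typescript
// Hypothetical sketch of brand assignment; the real implementation may differ.
interface BrandKeywords {
  brand: string;
  keywords: string[];
}

function assignBrand(
  title: string,
  url: string,
  brands: string[],
  brandKeywords: BrandKeywords[],
): string | undefined {
  // Assumption: matching is a case-insensitive substring search over title and URL.
  const haystack = (title + ' ' + url).toLowerCase();
  for (const brand of brands) {
    // Collect the brand name itself plus any alternative keywords configured for it.
    const extra = brandKeywords
      .filter((entry) => entry.brand === brand)
      .reduce<string[]>((acc, entry) => acc.concat(entry.keywords), []);
    const candidates = [brand].concat(extra);
    if (candidates.some((word) => haystack.indexOf(word.toLowerCase()) !== -1)) {
      return brand;
    }
  }
  // No configured brand matched.
  return undefined;
}
```

With the default values, a title such as "Moto G7 Power" would match Motorola through its "Moto" keyword even though the full brand name never appears. Substring matching can produce false positives on short keywords, so a stricter word-boundary match may be preferable.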

categories - 'unlocked' | 'locked' | 'both'

Whether to scrape the unlocked category, the locked category, or both. If both, workers scrape unlocked results first, then locked results.
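
The ordering for 'both' can be expressed as a small expansion step. The names below (CategorySetting, Category, categoriesToScrape) are hypothetical and only illustrate the documented behavior:

```typescript
// Hypothetical sketch: expands the categories setting into an ordered list.
type CategorySetting = 'unlocked' | 'locked' | 'both';
type Category = 'unlocked' | 'locked';

function categoriesToScrape(setting: CategorySetting): Category[] {
  // 'both' means unlocked results first, then locked results.
  return setting === 'both' ? ['unlocked', 'locked'] : [setting];
}
```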

numberOfWorkers - number

Number of active 'workers' (Puppeteer pages) to use for scraping.

Note that Amazon's servers may treat too many requests or workers as unusual traffic and return a captcha page instead of the intended results page.
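
To illustrate how a fixed number of workers can share a queue of items, here is a generic sketch. runWithWorkers is a hypothetical helper, not the project's code; in the real scraper each worker would presumably own one Puppeteer page and task would navigate that page, but the sketch leaves that out so it runs without a browser.

```typescript
// Hypothetical sketch of a fixed-size worker pool; not taken from the project.
async function runWithWorkers<T, R>(
  items: T[],
  numberOfWorkers: number,
  task: (item: T, workerId: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function worker(workerId: number): Promise<void> {
    // Each worker keeps claiming items until the queue is empty.
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], workerId);
    }
  }

  // Never start more workers than there are items, and always at least one.
  const count = Math.max(1, Math.min(numberOfWorkers, items.length));
  const workers: Promise<void>[] = [];
  for (let id = 0; id < count; id++) {
    workers.push(worker(id));
  }
  await Promise.all(workers);
  return results; // same order as the input items
}
```

At most numberOfWorkers tasks are in flight at any time, so lowering the value is the simplest way to reduce request volume if captcha pages start appearing.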

Default Values

{
  brands: [
    'ASUS',
    'Apple',
    'Google',
    'HUAWEI',
    'Motorola',
    'Nokia',
    'OnePlus',
    'Samsung',
    'Sony',
    'Xiaomi',
  ],
  brandKeywords: [
    { brand: 'Apple', keywords: ['iPhone'] },
    { brand: 'Google', keywords: ['Pixel'] },
    { brand: 'HUAWEI', keywords: ['Honor'] },
    { brand: 'Motorola', keywords: ['Moto'] },
    { brand: 'Samsung', keywords: ['Haven'] },
    { brand: 'Sony', keywords: ['Xperia'] },
  ],
  categories: 'both',
  numberOfWorkers: 8,
}

License 👮‍♂️

CC0 1.0 Universal
