$ mkvirtualenv -a $(pwd) ikeascraper
$ pip install -r requirements.txt
$ python manage.py makemigrations
$ python manage.py migrate
The webdriver uses headless Chrome and looks for the chromedriver binary at /usr/local/bin.
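As a rough illustration, a headless Chrome driver with that chromedriver path could be set up like the sketch below (the option names and explicit Service path are assumptions, not necessarily this project's actual wiring):

```python
# Sketch only: option names and the explicit chromedriver path are assumptions,
# not necessarily how this project configures Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless")                 # run Chrome without a UI
service = Service("/usr/local/bin/chromedriver")   # path mentioned above
driver = webdriver.Chrome(service=service, options=options)
```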
This project provides a custom crawl Django management command that scrapes the IKEA site with Selenium.
The script does three additional things:
- Prints the scraped data to stdout
- Saves the scraped data to the database
- Saves the scraped data to a JSON file
You can provide a filename when you run the command; otherwise, the output is written to items.json.
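A minimal sketch of a management command with that shape is shown below, assuming a standard BaseCommand with an optional positional argument; the names and output steps are illustrative, not the project's actual code:

```python
# Sketch only: argument handling and output steps are assumptions based on the
# behaviour described above, not the real crawl command.
import json
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Scrape IKEA items, print them, save them, and dump them to JSON"

    def add_arguments(self, parser):
        # Optional positional argument; falls back to items.json when omitted
        parser.add_argument("filename", nargs="?", default="items.json")

    def handle(self, *args, **options):
        items = []                     # would be filled by the Selenium scraper
        self.stdout.write(str(items))  # print the scraped data to stdout
        # ... save the scraped data to the database here ...
        with open(options["filename"], "w") as fp:
            json.dump(items, fp, indent=2)  # dump the scraped data to a JSON file
        self.stdout.write("Dumped %d items to %s" % (len(items), options["filename"]))
```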
$ workon ikeascraper
$ python manage.py crawl [filename]
$ python manage.py crawl sofas.json
Cleaning database...
Scraping items...
11%|███████████████████▌ | 1/9 [00:14<01:54, 0.07it/s]
[
...
{'colors': [],
'imageUrl': 'https://www.ikea.com/es/es/images/products/kivik-chaise-longue-hillared-anthracite__0479950_PE619104_S5.JPG?f=xs',
'name': 'KIVIK',
'type': 'Chaiselongue'}]
Dumped 428 items to sofas.json
The source code of the crawl command is here.