$ mkvirtualenv -a $(pwd) ikeascraper
$ pip install -r requirements.txt
$ python manage.py makemigrations
$ python manage.py migrate
The webdriver uses headless Chrome and looks for the chromedriver binary at /usr/local/bin.
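As a rough illustration, a headless Chrome driver with that chromedriver path could be set up like the sketch below (the option names and explicit Service path are assumptions, not necessarily this project's actual wiring):

```python
# Sketch only: option names and the explicit chromedriver path are assumptions,
# not necessarily how this project configures Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless")                 # run Chrome without a UI
service = Service("/usr/local/bin/chromedriver")   # path mentioned above
driver = webdriver.Chrome(service=service, options=options)
```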
This project provides a custom crawl Django management command that scrapes the IKEA site with Selenium.
The script does three additional things:
- Prints the scraped data to stdout
- Saves the scraped data to the database
- Saves the scraped data to a JSON file
You can provide a filename when you run the command; otherwise, the output is written to items.json.
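A minimal sketch of a management command with that shape is shown below, assuming a standard BaseCommand with an optional positional argument; the names and output steps are illustrative, not the project's actual code:

```python
# Sketch only: argument handling and output steps are assumptions based on the
# behaviour described above, not the real crawl command.
import json
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Scrape IKEA items, print them, save them, and dump them to JSON"

    def add_arguments(self, parser):
        # Optional positional argument; falls back to items.json when omitted
        parser.add_argument("filename", nargs="?", default="items.json")

    def handle(self, *args, **options):
        items = []                     # would be filled by the Selenium scraper
        self.stdout.write(str(items))  # print the scraped data to stdout
        # ... save the scraped data to the database here ...
        with open(options["filename"], "w") as fp:
            json.dump(items, fp, indent=2)  # dump the scraped data to a JSON file
        self.stdout.write("Dumped %d items to %s" % (len(items), options["filename"]))
```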
$ workon ikeascraper
$ python manage.py crawl [filename]
$ python manage.py crawl sofas.json
Cleaning database...
Scraping items...
11%|███████████████████▌ | 1/9 [00:14<01:54, 0.07it/s]
[
...
{'colors': [],
'imageUrl': 'https://www.ikea.com/es/es/images/products/kivik-chaise-longue-hillared-anthracite__0479950_PE619104_S5.JPG?f=xs',
'name': 'KIVIK',
'type': 'Chaiselongue'}]
Dumped 428 items to sofas.json
The source code of the crawl command is here.