# Possible ways to implement a simple Spider
### Expose "scraper" object in the handler functions
```python
@select(css="a")
def get_link(element, scraper):  # <-- pass the scraper object
    url = element.get_attribute("href")
    scraper.follow(url)  # <-- add to the URLs that will be scraped by the scraper
    return {"url": url}
```
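For context, a minimal sketch of the queueing that `scraper.follow()` implies is shown below. The `urls` list and `_seen` set are assumptions about the internals, not dude's actual implementation.

```python
class Scraper:
    def __init__(self, urls):
        self.urls = list(urls)   # URLs still to be scraped
        self._seen = set(urls)   # guard against crawling the same URL twice

    def follow(self, url):
        # Queue a URL for scraping unless it was already seen.
        if url and url not in self._seen:
            self._seen.add(url)
            self.urls.append(url)
```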
### Include the URLs in the return value
```python
@select(css="a")
def get_link(element):
    url = element.get_attribute("href")
    return {"url": url}, [url, ...]  # <-- return a tuple of the dict result and a list of URLs to follow
```
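On the framework side, the tuple form could be unpacked roughly as follows. This dispatch code is hypothetical and reuses the `follow()` sketch above.

```python
result = get_link(element)
if isinstance(result, tuple):
    data, new_urls = result      # dict result plus the URLs to enqueue
    for url in new_urls:
        scraper.follow(url)      # queue follow-up URLs for crawling
else:
    data = result                # plain dict result, nothing to follow
```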
### Final implementation
Just use `--follow-urls` or pass `follow_urls=True` to `run()` (#90). This is simpler than managing the URLs to crawl yourself inside your code; a usage sketch follows the warning below.

WARNING: Do not use this until #27 is implemented, as the option will crawl indefinitely and will not save the data.
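A minimal sketch of the final API, assuming `run()` accepts a `urls=` list as in dude's documented entry point; the start URL and selector are illustrative, and `follow_urls=True` is the flag proposed above.

```python
from dude import run, select

@select(css="a")
def get_link(element):
    # Extract each link; with follow_urls enabled, discovered URLs
    # are crawled automatically instead of being queued by hand.
    return {"url": element.get_attribute("href")}

if __name__ == "__main__":
    run(urls=["https://example.com"], follow_urls=True)
```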