-
Notifications
You must be signed in to change notification settings - Fork 0
haywood/Web-Crawler
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
crawler.py usage: crawler.py pages children timelimit Given a seed file called seeds, crawler.py will crawl the web, starting with the urls therein. Each unique url it finds will be added to its database. If a url is not unique, that url's inbound links count will be incremented by 1. The script requires MongoDB and PyMongo. pages the minimum number of pages to be crawled. children the maximum number of child processes to generate for page processing. timelimit an integer number of seconds. The timelimit is applied to the crawling phase itself and takes precedence over pages.
About
A web crawler implemented in Python
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published