Allow manual URL list to be passed to crawler #107

Merged 4 commits on Dec 7, 2021

@stooit stooit commented Dec 6, 2021

This adds an option to the crawler to parse a file containing a list of URLs to add to the queue.

Usage: quant crawl --urls-file=/path/to/file.json

Where file.json contains a JSON array of absolute URLs:

[
  "https://www.google.com/url-a",
  "https://www.google.com/url-b"
  ...
]

This allows orphaned URLs (pages that no crawled page links to, so the crawler cannot discover them on its own) to be added to ongoing crawl operations.
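
For illustration only, here is a minimal Python sketch of how a crawler could read such a file and add its entries to a queue. It is not the code from this PR: the function names `load_urls_file` and `enqueue_urls`, the validation rules, and the de-duplication behaviour are all assumptions made for the example.

```python
import json
from urllib.parse import urlparse


def load_urls_file(path):
    """Read a JSON array of absolute URLs; return them de-duplicated, in file order.

    Hypothetical helper: the validation rules here are assumptions,
    not the behaviour of the real --urls-file option.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("URLs file must contain a JSON array")

    seen = set()
    urls = []
    for entry in data:
        if not isinstance(entry, str):
            raise ValueError(f"Expected a string URL, got {entry!r}")
        parts = urlparse(entry)
        # Treat only http(s) URLs with a host as absolute.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not an absolute URL: {entry!r}")
        if entry not in seen:
            seen.add(entry)
            urls.append(entry)
    return urls


def enqueue_urls(queue, urls):
    """Append URLs that are not already queued; return how many were added."""
    queued = set(queue)
    added = 0
    for url in urls:
        if url not in queued:
            queue.append(url)
            queued.add(url)
            added += 1
    return added
```

Rejecting relative entries up front keeps the queue unambiguous, since a manually supplied URL has no referring page to resolve against.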

@stooit stooit merged commit 75b6616 into main Dec 7, 2021
@stooit stooit deleted the feature/crawl-urls-file branch December 7, 2021 18:39