
check if a spider exists before schedule it #8

Closed


artemdevel
Contributor

A simple check before scheduling a spider.

By default, the Scrapyd API accepts any spider name in a schedule request, whether or not that spider exists:
curl http://localhost:6800/schedule.json -d project=default -d spider=nospider

This command returns {"status": "ok", "jobid": "455c1444a6c611e29f650800272a6d06"}
and also creates a misleading log; the complete log is in this gist: https://gist.github.com/artem-dev/8a567c1775ab9bb8d122

With this check, nothing is scheduled and the response looks like this:
{"status": "error", "message": "spider 'nospider' not found"}
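
The check described above can be sketched roughly as follows. This is only an illustration, not the code from this pull request: the `schedule` helper and its `known_spiders` argument are hypothetical, and generating the job id with `uuid.uuid1().hex` is an assumption based on the shape of the jobid shown above.

```python
import uuid


def schedule(spider, known_spiders):
    """Hypothetical helper: build a schedule.json-style response.

    `known_spiders` stands in for whatever list of spider names the
    server has for the project; how that list is obtained is not shown.
    """
    if spider not in known_spiders:
        # Refuse to schedule a spider that is not in the project.
        return {"status": "error", "message": "spider '%s' not found" % spider}
    # Assumption: job ids are 32-character hex strings, like the one above.
    return {"status": "ok", "jobid": uuid.uuid1().hex}


# Usage: an unknown spider is rejected, a known one gets a job id.
rejected = schedule("nospider", ["myspider"])
accepted = schedule("myspider", ["myspider"])
```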

@pablohoffman
Member

This runs get_spider_list (which is an expensive operation) on every schedule call, as opposed to just doing it on deploy (addversion) calls.

I think it would be better if scrapyd cached the list of spiders somewhere; it could be in a SQLite database.
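
One way the caching idea could look, as a rough sketch only: the `SpiderListCache` class, its table layout, and its method names are all hypothetical and are not part of scrapyd. The assumption is that the expensive spider listing runs once per deploy and its result is stored, so that each schedule call only needs a cheap lookup.

```python
import sqlite3


class SpiderListCache:
    """Hypothetical SQLite-backed cache of spider names per project."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spiders ("
            "project TEXT NOT NULL, spider TEXT NOT NULL, "
            "PRIMARY KEY (project, spider))"
        )

    def replace(self, project, spiders):
        """Store a fresh spider list; intended to run on deploy only."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("DELETE FROM spiders WHERE project = ?", (project,))
            self.conn.executemany(
                "INSERT OR IGNORE INTO spiders (project, spider) VALUES (?, ?)",
                [(project, name) for name in spiders],
            )

    def has_spider(self, project, spider):
        """Cheap lookup, suitable for every schedule call."""
        row = self.conn.execute(
            "SELECT 1 FROM spiders WHERE project = ? AND spider = ?",
            (project, spider),
        ).fetchone()
        return row is not None


# Usage: fill the cache at deploy time, consult it at schedule time.
cache = SpiderListCache()
cache.replace("default", ["myspider", "otherspider"])
found = cache.has_spider("default", "myspider")
missing = cache.has_spider("default", "nospider")
```

A real implementation would also need to decide how versions are handled and when cached entries are invalidated; this sketch leaves both out.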

@artemdevel
Contributor Author

Yeah, agreed. I'll think about the SQLite approach.

@pablohoffman
Member

See followup #17

@jpmckinney added the "pr: replaced" label (for unmerged PRs that were replaced by a PR or commit) and removed the "topic: scheduling" label on Jul 17, 2024.