
support EventSource/long polling in scrapyd #55

Closed
nramirezuy opened this issue Jul 25, 2014 · 8 comments


@nramirezuy

Moved from: scrapy/scrapy#335
Originally by: @graingert

When I run a long-running crawl task, I'd like to be notified through the API when it's done. I know this is possible with callbacks, but I'd rather it were built in.

The solution to this in REST/HTTP is the EventSource API.

https://developer.mozilla.org/en-US/docs/Server-sent_events/Using_server-sent_events
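
For concreteness, a minimal sketch of what consuming such a stream could look like; the /events.json endpoint, its parameters, and the event names are hypothetical, only the SSE wire format is standard:

```python
# Hypothetical subscription to a Scrapyd SSE endpoint; /events.json
# does not exist in Scrapyd and only illustrates the proposal.
import requests

resp = requests.get(
    "http://localhost:6800/events.json",
    params={"job": "c9514588cf9511e7a2140242ac110003"},
    stream=True,  # keep the connection open for server-sent events
)
for line in resp.iter_lines(decode_unicode=True):
    # SSE frames are "field: value" lines terminated by a blank line,
    # e.g. "event: finished" followed by "data: {...}".
    if line:
        print(line)
```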

@dfockler
Contributor

Could something like this work through a registration mechanism, where the client asks scrapyd to push results, or a "finished" or "error" message, about a specific job using SSE? I would be interested in trying to write code for it.
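
Something like this rough sketch, perhaps, given that Scrapyd is built on Twisted; the EventsResource name and the publish() hook are made up for illustration:

```python
# Hypothetical Twisted resource for SSE subscriptions; nothing here
# exists in Scrapyd yet.
from twisted.web import resource, server

class EventsResource(resource.Resource):
    isLeaf = True

    def __init__(self):
        resource.Resource.__init__(self)
        self.subscribers = []

    def render_GET(self, request):
        # The client registers with a plain GET; the connection stays
        # open and frames are pushed with request.write() later.
        request.setHeader(b"Content-Type", b"text/event-stream")
        self.subscribers.append(request)
        return server.NOT_DONE_YET

    def publish(self, event, data):
        # Called when a job finishes or errors: push one SSE frame to
        # every open subscriber connection.
        frame = ("event: %s\ndata: %s\n\n" % (event, data)).encode("utf-8")
        for request in list(self.subscribers):
            request.write(frame)
```

A real implementation would also need to drop subscribers whose connections close (e.g. via request.notifyFinish()), which is omitted here.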

@graingert

Yes, you subscribe/register to an EventSource resource with a GET request.

@aleroot

aleroot commented Nov 22, 2017

I'm interested in this enhancement too. Can I help with the implementation?

@Digenis
Member

Digenis commented Nov 22, 2017

@aleroot, sure.
My concern with the ideas in this issue is that they are browser-oriented.
I think scrapyd should have an API helpful for everyone, not just webapp developers.
Let's first see in which scrapyd component it'd be best implemented.
We'll later decide how users will interact with it.

@aleroot

aleroot commented Nov 22, 2017

@Digenis In the meantime I've added a simple status.json endpoint so that the client can poll the status of a specific job. Once the status is "finished", I can then fetch the items URL: http://localhost:6800/items/myscraper/myspider/c9514588cf9511e7a2140242ac110003.jl

See pull request #260.
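
The polling loop this enables would look roughly like the following; the status.json response shape (a "currstate" field reaching "finished") is an assumption based on the description above:

```python
# Poll the status.json endpoint from PR #260 until the job finishes,
# then fetch the scraped items.
import time
import requests

job = "c9514588cf9511e7a2140242ac110003"
while True:
    status = requests.get(
        "http://localhost:6800/status.json", params={"job": job}
    ).json()
    if status.get("currstate") == "finished":
        break
    time.sleep(5)

items = requests.get(
    "http://localhost:6800/items/myscraper/myspider/%s.jl" % job
)
print(items.text)
```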

@nitinheadrun

Currently I am polling every 5 seconds to get the result of the Scrapyd job. Can't we just use a crawler callback from the Scrapyd server?

@jpmckinney
Contributor

jpmckinney commented Sep 23, 2021

The solution in my projects is to use the spider_closed signal in a Scrapy extension. This seems like the appropriate level, rather than it being Scrapyd's responsibility.
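
A minimal sketch of that approach, assuming a webhook-style subscriber; the NotifyOnClose name and the NOTIFY_URL setting are made up:

```python
from scrapy import signals
from scrapy.exceptions import NotConfigured
import requests

class NotifyOnClose:
    """Pushes a webhook notification when the spider closes."""

    def __init__(self, url):
        self.url = url

    @classmethod
    def from_crawler(cls, crawler):
        url = crawler.settings.get("NOTIFY_URL")  # hypothetical setting
        if not url:
            raise NotConfigured
        ext = cls(url)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider, reason):
        # Runs inside the crawl process itself, so Scrapyd needs no changes.
        requests.post(self.url, json={"spider": spider.name, "reason": reason})
```

It is enabled like any other extension, e.g. EXTENSIONS = {"myproject.extensions.NotifyOnClose": 0}, with NOTIFY_URL set in the project settings.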

@jpmckinney
Contributor

I'll close this for now as wontfix, since to push messages to known subscribers, you can simply create a Scrapy extension and implement the push logic in spider_closed.

If there is a real need for pushing messages to unknown subscribers (like a pub/sub model), then that would have to be done at the Scrapyd level, I believe, since a subscriber could subscribe after the crawl started.
