
support EventSource/long polling in scrapyd #55

Closed
nramirezuy opened this issue Jul 25, 2014 · 8 comments


@nramirezuy

Moved from: scrapy/scrapy#335
Originally by: @graingert

When I run a long-running crawl task, I'd like to be notified through the API when it's done. I know this is possible with callbacks, but I'd rather it were built in.

The solution to this in REST/HTTP is the EventSource API.

https://developer.mozilla.org/en-US/docs/Server-sent_events/Using_server-sent_events
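
For concreteness, a minimal sketch of what consuming such a stream could look like; the /events.json endpoint, its parameters, and the event names are hypothetical, only the SSE wire format is standard:

```python
# Hypothetical subscription to a Scrapyd SSE endpoint; /events.json
# does not exist in Scrapyd and only illustrates the proposal.
import requests

resp = requests.get(
    "http://localhost:6800/events.json",
    params={"job": "c9514588cf9511e7a2140242ac110003"},
    stream=True,  # keep the connection open for server-sent events
)
for line in resp.iter_lines(decode_unicode=True):
    # SSE frames are "field: value" lines terminated by a blank line,
    # e.g. "event: finished" followed by "data: {...}".
    if line:
        print(line)
```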

@dfockler
Contributor

Could something like this work through a registration mechanism, where the client asks scrapyd to push results, or a "finished" or "error" message, about a specific job using SSE? I would be interested in trying to write code for it.
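
Something like this rough sketch, perhaps, given that Scrapyd is built on Twisted; the EventsResource name and the publish() hook are made up for illustration:

```python
# Hypothetical Twisted resource for SSE subscriptions; nothing here
# exists in Scrapyd yet.
from twisted.web import resource, server

class EventsResource(resource.Resource):
    isLeaf = True

    def __init__(self):
        resource.Resource.__init__(self)
        self.subscribers = []

    def render_GET(self, request):
        # The client registers with a plain GET; the connection stays
        # open and frames are pushed with request.write() later.
        request.setHeader(b"Content-Type", b"text/event-stream")
        self.subscribers.append(request)
        return server.NOT_DONE_YET

    def publish(self, event, data):
        # Called when a job finishes or errors: push one SSE frame to
        # every open subscriber connection.
        frame = ("event: %s\ndata: %s\n\n" % (event, data)).encode("utf-8")
        for request in list(self.subscribers):
            request.write(frame)
```

A real implementation would also need to drop subscribers whose connections close (e.g. via request.notifyFinish()), which is omitted here.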

@graingert

Yes, you subscribe/register to an EventSource resource with a GET request.

@aleroot

aleroot commented Nov 22, 2017

I'm interested in this enhancement too. Can I help with the implementation?

@Digenis
Member

Digenis commented Nov 22, 2017

@aleroot, sure.
My concern with the ideas in this issue is that they are browser-oriented.
I think scrapyd should have an API helpful for everyone, not just webapp developers.
Let's first see in which scrapyd component it'd be best implemented.
We'll later decide how users will interact with it.

@aleroot

aleroot commented Nov 22, 2017

@Digenis In the meantime I've added a simple status.json endpoint so that the client can poll the status of a specific job. Once the status is "finished", I can then fetch the items URL: http://localhost:6800/items/myscraper/myspider/c9514588cf9511e7a2140242ac110003.jl

See pull request #260.
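
The polling loop this enables would look roughly like the following; the status.json response shape (a "currstate" field reaching "finished") is an assumption based on the description above:

```python
# Poll the status.json endpoint from PR #260 until the job finishes,
# then fetch the scraped items.
import time
import requests

job = "c9514588cf9511e7a2140242ac110003"
while True:
    status = requests.get(
        "http://localhost:6800/status.json", params={"job": job}
    ).json()
    if status.get("currstate") == "finished":
        break
    time.sleep(5)

items = requests.get(
    "http://localhost:6800/items/myscraper/myspider/%s.jl" % job
)
print(items.text)
```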

@nitinheadrun

Currently I am polling every 5 seconds to get the result of the Scrapyd job. Can't we just use a crawler callback from the Scrapyd server?

@jpmckinney
Contributor

jpmckinney commented Sep 23, 2021

The solution in my projects is to use the spider_closed signal in a Scrapy extension. This seems like the appropriate level, rather than it being Scrapyd's responsibility.
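
A minimal sketch of that approach, assuming a webhook-style subscriber; the NotifyOnClose name and the NOTIFY_URL setting are made up:

```python
from scrapy import signals
from scrapy.exceptions import NotConfigured
import requests

class NotifyOnClose:
    """Pushes a webhook notification when the spider closes."""

    def __init__(self, url):
        self.url = url

    @classmethod
    def from_crawler(cls, crawler):
        url = crawler.settings.get("NOTIFY_URL")  # hypothetical setting
        if not url:
            raise NotConfigured
        ext = cls(url)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider, reason):
        # Runs inside the crawl process itself, so Scrapyd needs no changes.
        requests.post(self.url, json={"spider": spider.name, "reason": reason})
```

It is enabled like any other extension, e.g. EXTENSIONS = {"myproject.extensions.NotifyOnClose": 0}, with NOTIFY_URL set in the project settings.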

@jpmckinney
Contributor

I'll close this for now as wontfix, since to push messages to known subscribers, you can simply create a Scrapy extension and implement the push logic in spider_closed.

If there is a real need for pushing messages to unknown subscribers (like a pub/sub model), then that would have to be done at the Scrapyd level, I believe, since a subscriber could subscribe after the crawl started.
