How would you kill a specific job? #124
Comments
Also wondering... how would you define a callback for your jobs to gracefully handle TERM signals? When I'm using Resque, I would handle Resque::TermException in the on_failure hook and put any cleanup/shutdown logic in there. I took a quick look at the source and I do see SolidQueue::Processes::GracefulTerminationRequested but trying to rescue that via rescue_from in my job didn't seem to do anything.
Hey @wheee! These are great questions, and I'm afraid Solid Queue doesn't have a way to support your scenario:
As you saw,
That's right. The supervisor would notice the worker has exited and would start it again. If the worker didn't have time to finish the job within the configured If you have any ideas or suggestions, feel free to contribute them! 🙏
Hey @rosa, thanks for responding!
I think there's merit in being able to bubble up the 'shutdown' signal/exception to the jobs (and not just the workers), although I can see why it may not be as useful unless killing a job via signals were a possibility.
That being said, I do have scenarios where I use these jobs as long-running processes that remain up and running until explicitly shut down by user command. These jobs typically follow a pub/sub paradigm and can take commands from the UI while streaming data from an external data source. In these cases, it would be nice to be able to clean up gracefully when a TERM signal is received, within the allotted duration before the QUIT signal is issued.
EDIT: the fact that we have a configuration for shutdown_timeout would suggest that jobs should have the ability to respond to the TERM signal... otherwise, why provide the extra time before the QUIT signal is sent?
As a side note, while exploring GoodJob in more detail, I did run across this useful bit: Happy to report that this works pretty nicely with SolidQueue, so while it may not be possible to explicitly kill jobs via signals, at least the use of Timeout provides assurance that jobs that get stuck for whatever reason will eventually time out, can be handled gracefully, and return to the pool. More importantly, it allows me to decide whether I wish to fail the job, retry it, etc.
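For readers who want to see the shape of the Timeout approach mentioned above, here is a minimal plain-Ruby sketch. It is not the GoodJob snippet referred to in the comment, and it is not Solid Queue API: `TimedJob`, `JobTimeoutError`, `time_limit`, and `cleanup` are hypothetical names used only for illustration. In a Rails app the same wrapper would more likely sit in an `around_perform` callback on the application's base job class.

```ruby
require "timeout"

# Hypothetical error class, defined only for this example.
class JobTimeoutError < StandardError; end

# Plain-Ruby stand-in for a job base class (an assumption, not Solid Queue code).
class TimedJob
  # Hypothetical limit in seconds; subclasses can override it.
  def time_limit
    0.2
  end

  # Runs perform under a time limit and reports what happened.
  def run
    Timeout.timeout(time_limit, JobTimeoutError) { perform }
    :finished
  rescue JobTimeoutError
    cleanup
    :timed_out # the caller decides here whether to fail or retry the job
  end

  def perform
    raise NotImplementedError
  end

  # Hook for releasing resources after a timeout; does nothing by default.
  def cleanup; end
end

# A job that gets stuck for longer than the limit.
class StuckJob < TimedJob
  attr_reader :cleaned_up

  def perform
    sleep 5
  end

  def cleanup
    @cleaned_up = true
  end
end

# A job that finishes well within the limit.
class QuickJob < TimedJob
  def perform; end
end
```

Note that Timeout interrupts the block by raising in the running thread, so it can fire at any point inside perform. Cleanup code should not assume the job reached a consistent state.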
To give in-flight jobs time to finish while the worker stops taking new ones. Without the extra time, any in-flight job would be stopped right away. With it, the worker knows it shouldn't pick up any more jobs; it just waits until the running ones finish and then exits.
For gems like https://github.com/Shopify/job-iteration it is useful to have a way to know a graceful shutdown was initiated, so that it can stop after the current iteration is finished and do its own graceful shutdown (pushing the job back on the queue with the persisted progress). It interacts in various ways with background queues: it uses the Sidekiq IMO a callback is a bit more flexible since for other use cases one might not be able to/want to poll for this.
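The "stop after the current iteration and persist progress" idea described above can be sketched in plain Ruby. This is only an illustration of the polling variant under stated assumptions: `ShutdownFlag` and `process_with_checkpoints` are hypothetical names, not part of job-iteration, Sidekiq, or Solid Queue, and the shutdown request is simulated inside the block rather than delivered by a real signal.

```ruby
# Hypothetical flag object; a real worker would flip it when shutdown starts.
class ShutdownFlag
  def initialize
    @requested = false
  end

  def request!
    @requested = true
  end

  def requested?
    @requested
  end
end

# Processes items one at a time, checking the flag between iterations.
# Returns the cursor to resume from, or nil once every item is done.
def process_with_checkpoints(items, cursor, shutdown)
  while cursor < items.size
    yield items[cursor]
    cursor += 1
    # Stop only between iterations, so no item is left half-processed.
    return cursor if shutdown.requested? && cursor < items.size
  end
  nil
end

shutdown = ShutdownFlag.new
processed = []

resume_at = process_with_checkpoints(%w[a b c d], 0, shutdown) do |item|
  processed << item
  shutdown.request! if item == "b" # simulate a shutdown request mid-run
end

# A later run would pick up from the persisted cursor with a fresh flag.
remaining = []
final_cursor = process_with_checkpoints(%w[a b c d], resume_at, ShutdownFlag.new) do |item|
  remaining << item
end
```

In a real process the flag could be set from something like `Signal.trap("TERM") { shutdown.request! }`, and the returned cursor would be stored with the re-enqueued job. A callback-based design, as suggested in the comment, would avoid the polling but needs support from the queue backend.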
Some news about this one: First, you can now simply kill the worker that's running a process via a Second, I have another issue to support Shopify's job-iteration gem explicitly: #282 (comment), and I plan to get this in before version 1.0, so I'm going to close this issue in favour of that one. Thanks again, everyone, for the ideas and help here! 🙏
@rosa will this be wrapped into the mission_control-jobs gem... with maybe a "STOP" button?
@atstockland I was hoping to do that using #337, but not sure yet! In any case, yes, I'm planning to add this to Mission Control 😊 I'll be back to work on Mission Control in about 3 weeks.
Thank you so much for all your hard work!!!!!
The scenario would be a long-running job that is taking too long, where the user wishes to kill it and not have it restarted.
If I send the TERM signal to the supervisor pid, I've noticed it has the weird side effect of restarting everything (not just solid_queue) in my Procfile (when using foreman).
I also noticed that if I send a TERM signal to the worker (assuming it is a 1 thread/1 process worker), the worker gets restarted and picks up the same job again.
I suppose it's possible to modify the job in the solid_queue_jobs table so that its finished_at is set, and then send the TERM signal to the worker, but that seems hack-ish.
Also, what if the worker has 5 threads and they're all processing jobs that I don't want to kill?
Would appreciate some direction on this, thanks!