[bugfix] Fix possible deadlock in PBS-based scheduler backends when a job is cancelled immediately after submission #1301
* Create the stdout and stderr files if they don't exist, so that the Torque scheduler treats the job as finished.
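The fix in this PR can be pictured with the minimal sketch below. The backend only marks a job as done once its stdout/stderr files exist; presumably a job cancelled right after submission never gets those files written, so the wait never ends. The helper name `ensure_output_files` is hypothetical, and this is not the actual ReFrame code: it only shows the idea of creating whichever output files are missing.

```python
from pathlib import Path


def ensure_output_files(*paths):
    """Create any of the given job output files that do not exist yet.

    Hypothetical helper, not ReFrame's API. Returns the list of paths that
    had to be created; existing files are left untouched.
    """
    created = []
    for path in map(Path, paths):
        if not path.exists():
            path.touch()  # create an empty placeholder file
            created.append(str(path))
    return created
```

Checking `exists()` before `touch()` keeps the contents and modification time of any output the job did manage to write.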
I think the problem is more fundamental, because this deadlock could also happen during normal operation. A better solution would be to fix this in the PBS/Torque backends so that, if the job has been cancelled, we skip the additional check for its stdout/stderr files before marking it as done.
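The alternative suggested in this comment, skipping the output-file check for cancelled jobs, might look roughly like the sketch below. `JobSketch`, its fields, the `'COMPLETED'` state string and `is_finished` are all hypothetical names chosen for illustration; they are not the real PBS/Torque backend API.

```python
import os
from dataclasses import dataclass


@dataclass
class JobSketch:
    # Hypothetical stand-in for a scheduler job; not ReFrame's real job class.
    state: str = 'QUEUED'
    stdout: str = 'job.out'
    stderr: str = 'job.err'
    cancelled: bool = False


def is_finished(job):
    """Decide whether the poll loop may stop waiting for this job."""
    if job.cancelled:
        # A job cancelled right after submission may never write its output
        # files, so waiting for them here is what could deadlock.
        return True

    if job.state != 'COMPLETED':
        return False

    # For normal completion, also require both output files to be present.
    return os.path.exists(job.stdout) and os.path.exists(job.stderr)
```

Compared with creating placeholder files, this keeps the file check for normal completion but never applies it to a job that was cancelled.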
Then maybe we open a new issue to find a fundamental solution, and keep this one temporarily so that it does not block the CI?
No, the solution is quite easy. Set an …
lgtm
Fixes #1298