Unable to normally stop and purge system job with csi plugin #11758
Comments
Hi @ygersie and thanks for providing such a detailed reproduction. I ran through this locally and got the same results as you detailed. These results are very unexpected.
I suspect this is related to another CSI plugin counts issue; I'm taking a pass through our open CSI issues over the next few weeks and will look at this as part of that work.
Noting here that I've marked #11114 as a duplicate of this one. #10073 may also ultimately be a duplicate but I'll leave that open for the time being as the cause is subtly different. It looks like there are two parts to this:
Ok, so following #12027, #10073, and #12078 we've almost got this one resolved. There's just one bug left, which is that we can't deregister the job because it's looking to delete the plugin that doesn't exist:
Fixed in #12114! That'll ship in Nomad 1.3.0
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
v1.2.3
Operating system and Environment details
MacOS nomad agent -dev setup
Issue
Unable to stop and purge a `failed` system job that has a `csi_plugin` stanza; instead, the job is unexpectedly started again when `-purge` is passed.
Reproduction steps
Run the example job below.
Now wait until the job transitions to the failed state, then stop + purge the job.
Now check the status of the job:
This should have returned a not-found error, but the job is still there and the `Desired` column states `run`. Re-running `nomad job stop -purge example` doesn't change the outcome until a GC has been run. Now trigger a GC with `nomad system gc` and rerun the stop with `-purge`; the result becomes:
Instead of stopping, it actually recreates the allocation.
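For anyone retracing the report, the steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the job ID `example` comes from the commands quoted in the issue, but the job file name `example.nomad` and the helper functions `run`, `pause`, and `repro` are hypothetical. The script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction sequence described in this issue.
# Assumptions (not from the original report): the job spec is saved as
# example.nomad, and a local `nomad agent -dev` is already running.

JOB_FILE="${JOB_FILE:-example.nomad}"  # hypothetical file name
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# In live mode, wait for the operator to confirm before continuing.
pause() {
    if [ "$DRY_RUN" != "1" ]; then
        printf '%s (press Enter) ' "$1"
        read -r _
    fi
}

repro() {
    run nomad job run "$JOB_FILE"
    pause "Wait until the job reaches the failed state"
    run nomad job stop -purge example
    run nomad job status example       # reported: job is still listed
    run nomad system gc
    run nomad job stop -purge example  # reported: allocation is recreated
    run nomad job status example
}

repro
```

Running it with `DRY_RUN=0` executes the commands against whatever agent the `nomad` CLI is pointed at, so it is best tried on a throwaway dev agent.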