-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: always run alloc cleanup hooks on final update #15855
Conversation
b1803a6
to
27e6363
Compare
27e6363
to
6d6d5df
Compare
This PR fixes a bug where alloc pre-kill hooks were not run in the edge case where there are no live tasks remaining, but it is also the final update to process for the (terminal) allocation. We need to run cleanup hooks here, otherwise they will not run until the allocation gets garbage collected (i.e. via Destroy()), possibly at a distant time in the future. Fixes #15477
6d6d5df
to
672dc16
Compare
client/allocrunner/alloc_runner.go
Outdated
// there are no live runners left | ||
|
||
// run AR pre-kill hooks if this alloc is terminal; any post-stop | ||
// tasks would regularly run in this state anyway (?) | ||
if done { | ||
ar.preKillHooks() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will erroneously run prekill hooks on client agent shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a guard with ar.isShuttingDown()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if preKillHooks()
shouldn't be run during agent shutdown, would it make sense to put the guard inside of preKillHooks()
instead of here? more broadly, if this func is a potential foot-shoot, is there a way to make it less foot-shoot-able?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question @gulducat. I'm afraid it needs to be up to the caller to decide whether prekill hooks should run or not because only the callers knows whether a kill event (eg a task dying) was triggered or not. Once your in prekill, a shutdown may be issued concurrently, but that doesn't change the fact that the prekill was triggered by a kill event. So I think this is the right approach as @shoenig's code is able to differentiate a kill event from an agent shutdown event.
That being said every operation in Nomad needs to be crash safe which is why prekill hooks are also run when garbage collecting dead allocations. So prekill should always eventually be called on every terminal allocation regardless of the triggering event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand things now 😅
preKillHooks
is only called by killTasks()
and killTasks
is called in a few places:
destroyImpl
which is quite expected.handleTaskStateUpdates
which also makes sense since we may need to kill all tasks if one of the tasks fails.handleAllocUpdate
which again makes sense since the alloc is stopped we need to kill the tasks.
The part that was confusing me was why handleAllocUpdate
was not triggering the preKillHooks
: the allocation does eventually die and is marked with ClientStatus: failed
.
The reason is that this is a client-only alloc update. When watching allocations the client ignores changes that it already knows about and so the ar.Update()
method is not called when the allocation fails client-side.
We don't want to destroy the alloc runner so the only place left to call the hooks is in handleTaskStateUpdates
like @shoenig did.
The second thing that was confusing to me is why we only check for killEvent
if there are live runners, I feel like we should always be checking that and calling killTasks()
if not nil? Something like https://github.com/hashicorp/nomad/compare/wip-luiz-kill-ar (this probably breaks other stuff since task lifecycle is finicky 😅).
All of this to say that I think we can use if killEvent != nil
here to guard the preKillHooks
so it matches the other conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💀 ✅
This PR fixes a bug where alloc pre-kill hooks were not run in the
edge case where there are no live tasks remaining, but it is also
the final update to process for the (terminal) allocation. We need
to run cleanup hooks here, otherwise they will not run until the
allocation gets garbage collected (i.e. via Destroy()), possibly
at a distant time in the future.
Fixes #15477