Need a way to cleanly shut down nodes #2052
Consul has the `leave` command for this.
@groggemans Nomad 0.8 added advanced node draining features. Some useful links: the `nomad node drain` command documentation.
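For reference, a drain on the node itself looks roughly like this (the 30-minute deadline is just an example value):

```sh
# Mark the local node for draining; give allocations up to 30 minutes
# to migrate before they are force-stopped.
nomad node drain -enable -deadline 30m -self

# Confirm the node is no longer eligible for scheduling.
nomad node status -self
```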
I know, and it solves/implements the draining part, but then the node should still gracefully leave the cluster. I think the only way to do this now is by stopping/interrupting the service (with `SIGINT`). For servers there's the `leave_on_interrupt`/`leave_on_terminate` setting.
A relevant discussion happened in #4305. @insanejudge, @schmichael. Indeed, draining the node on shutdown is the best option, and the service file could be adjusted to do that. However, a graceful restart cannot be implemented in a systemd service, because systemd cannot distinguish a shutdown from a restart.
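To make the service-file idea concrete, here is a minimal sketch of the adjustment; the paths and drain deadline are assumptions, not the stock unit file:

```ini
# /etc/systemd/system/nomad.service (excerpt)
[Service]
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
# Drain the node before systemd stops the agent. Caveat from the discussion
# above: ExecStop runs on both "systemctl stop" and "systemctl restart",
# so this would also drain during in-place upgrades.
ExecStop=/usr/local/bin/nomad node drain -enable -deadline 15m -self
KillSignal=SIGINT
```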
Not sure if this is relevant or related, but even when I have the leave-on-shutdown settings enabled and the node gracefully leaves the cluster, when I check `nomad node status` the node still shows up as `down`. Is that expected behavior? I would expect that once the node leaves the cluster, it doesn't appear in the status anymore.
@onlyjob Does systemd allow configuring different signals for reloads, restarts, and shutdowns? If so we could use SIGHUP, SIGINT, and SIGTERM respectively to separate the shutdown behaviors. Adding APIs+CLI commands would also be useful. This is definitely something we're hoping to do, but I don't know if it will make it into 0.9.0. @dcparker88 Unfortunately down nodes remain in the status output until they're garbage collected.
No it doesn't... There is `ExecReload=` for reloads, but stop and restart use the same `KillSignal=`, so the two can't be distinguished.
@onlyjob Nomad will continue to support in-place upgrades (restarting without draining) for at least a couple of reasons:
That being said, we've definitely come close to dropping support for in-place upgrades. I could see it happening someday, but for now we intend to support restarts that don't affect tasks.
It is OK if you are committed to supporting restarts without draining. However, this is unsafe and therefore should be configurable. Moreover, draining the node on restart must be the default behaviour. It is not OK to leave dangling VMs, because they are not cheap to restart. Betting on perfect stability of the Nomad client is a strategy for a perfect world, like saying that defects in your code (will) never happen.
@schmichael ah thanks - that makes sense then. Do you know if that's a planned feature, or should I just continue to use GC to clean out down nodes? |
We hoped shutdown improvements would land in 0.9.0, but some larger features (e.g. plugins) take priority, so you may want to continue using GC for the time being. If they don't make it into 0.9.0, hopefully we'll get them out in a patch release.
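For anyone following along, the GC in question runs periodically on its own, but it can also be triggered manually:

```sh
# Force a cluster-wide garbage collection; among other things this removes
# down nodes from the node status output.
nomad system gc
```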
This is the plan!
This is precisely reason 2 I gave above for supporting in-place upgrades. A guiding principle in Nomad's design is: in the face of errors, do not stop user services! Nomad downtime should prevent further scheduling, but it should avoid causing service downtime as much as possible.
Thanks. :) I think there is a flaw in this reasoning... We need to separate two issues: avoiding stopping services during normal operation, and the case when the Nomad client itself is restarting. What if the updated Nomad disagrees with the running Docker daemon about the API version?
Basically what we want is: if Nomad itself runs into unexpected issues, leave the task runtime alone and confine the problem to Nomad as much as possible (smallest possible blast radius). On the other hand, if it's an intentional shutdown of the Nomad client, provide a way to trigger a clean shutdown of the task runtimes. I think there might be a fine line here, @schmichael. Ideally, if the Nomad client crashes or shuts down for reasons that are not operator-initiated, it should not trigger task shutdown. Only an operator-initiated shutdown should trigger (and wait for the completion of) a clean shutdown of all tasks. So would a signal be a good way to indicate an intentional shutdown? Like, instead of having `client.drain_shutdown = true`, how about `client.drain_shutdown_signal = SIGINT`, something along that line?
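To make the proposal concrete, the two alternatives might look like this; neither key exists in Nomad, they only illustrate the suggestion:

```hcl
client {
  # Original idea: always drain on shutdown (hypothetical key).
  # drain_shutdown = true

  # Proposed alternative: drain only when the agent receives this signal,
  # so crashes and non-operator restarts leave tasks alone (hypothetical key).
  drain_shutdown_signal = "SIGINT"
}
```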
I understand the reasoning behind many people in this thread wanting this feature. It is surely a safer option in many environments. However, a few comments implied that the current behaviour is always undesirable, which is not the case. There are some workloads where it absolutely makes sense to keep allocations running during a restart (or crash) of the Nomad client, assuming of course that those allocations can be re-adopted by a new Nomad process. In-place upgrades, and the general ease of upgrades with zero or minimal disruption, are one of Nomad's big advantages over Kubernetes for me. So yes, by all means, a way to combine drain+shutdown/restart would be great, but not because it's the only way that makes sense.
@tgross what discussion is needed to figure out next steps here? I would love to help move this along! |
@tgross ping |
Hey @ketzacoatl, given that you already cross-linked this somewhere the Nomad team was asking for input, I'm sure they'll have some thoughts for you at some point. But I haven't been at HashiCorp for a while now, and there aren't any non-HashiCorp maintainers, so pinging me probably won't help move things along. 😁 That being said, if it were up to me (and it's not!), I'd say there's not much to this issue:
I'm sure the Nomad team would be open to a patch that provides client configuration that causes the node to drain on graceful shutdown.
@tgross apologies for the ping! |
There was a suggestion to use systemd inhibitor locks to achieve this. Noting here in case it is helpful if this gets picked up. |
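An inhibitor-lock approach might look something like this wrapper; the identifier strings and the deadline are assumptions:

```sh
# Hold a shutdown inhibitor lock while the drain runs, so the host does not
# power off before allocations have migrated.
systemd-inhibit --what=shutdown --who=nomad --why="draining allocations" \
  nomad node drain -enable -deadline 15m -self
```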
Implemented (finally!) in https://github.com/hashicorp/nomad/pull/16827, which will ship in the next release of Nomad.
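For readers landing here later: that PR adds a client-side `drain_on_shutdown` block, roughly along these lines (check the documentation for your Nomad version for the exact fields):

```hcl
client {
  drain_on_shutdown {
    # How long to wait for allocations to migrate before shutting down.
    deadline = "1h"
    # Whether to force-stop allocations still running at the deadline.
    force = false
    # Whether to leave system jobs running until the deadline.
    ignore_system_jobs = false
  }
}
```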
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad v0.5.0
There doesn't appear to be a way to cleanly shut down a client node that both allows allocations to be moved to other nodes and ensures the data in sticky ephemeral disks is migrated. I wrote a script to help my systemd service delay stopping until allocations have been moved, but there doesn't appear to be a way to monitor the status of the migrated data. If the data isn't moved quickly, it could be lost when the node shuts down.
Something like `nomad shutdown` that blocks until the agent is completely idle would be ideal.
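A sketch of the kind of wait script described above, polling the HTTP API until no allocations are running on the node; the agent address and the jq dependency are assumptions:

```sh
#!/usr/bin/env bash
# Block until this node reports no running allocations, so systemd does not
# kill the agent while tasks are still migrating.
set -euo pipefail

# Look up this client's node ID from the local agent.
NODE_ID=$(curl -s http://127.0.0.1:4646/v1/agent/self \
  | jq -r '.stats.client.node_id')

# Poll the node's allocations until none are in the "running" state.
while true; do
  running=$(curl -s "http://127.0.0.1:4646/v1/node/${NODE_ID}/allocations" \
    | jq '[.[] | select(.ClientStatus == "running")] | length')
  [ "$running" -eq 0 ] && break
  sleep 5
done
```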