
Need a way to cleanly shut down nodes #2052

Closed
blalor opened this issue Dec 1, 2016 · 21 comments · Fixed by #16827

@blalor
Contributor

blalor commented Dec 1, 2016

Nomad v0.5.0

There doesn't appear to be a way to cleanly shut down a client node such that allocations are moved to other nodes and the data in sticky ephemeral disks is migrated. I wrote a script to help my systemd service delay stopping until allocations have been moved, but there doesn't appear to be a way to monitor the status of the migrated data. If the data isn't moved quickly, it could be lost when the node shuts down.

Something like nomad shutdown that blocks until the agent is completely idle would be ideal.
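In the meantime, a pre-stop hook can approximate this. The sketch below assumes a Nomad version with `nomad node drain` (0.8+); `NOMAD_BIN` and `DEADLINE` are illustrative knobs for this script, not official settings:

```shell
#!/usr/bin/env bash
# Sketch of a pre-shutdown drain helper, e.g. for a systemd ExecStop= hook.
# Assumes Nomad 0.8+; NOMAD_BIN and DEADLINE are illustrative, not official knobs.
set -euo pipefail

NOMAD_BIN=${NOMAD_BIN:-nomad}
DEADLINE=${DEADLINE:-10m}   # upper bound on how long migrations may take

drain_self() {
  # -enable starts the drain; without -detach the CLI blocks until it completes,
  # so the service manager will not proceed with the stop until the node is empty.
  "$NOMAD_BIN" node drain -enable -self -yes -deadline "$DEADLINE" &&
    echo "node drained; safe to stop the agent"
}

# Invoke directly from the stop hook, e.g.: drain_self
```

This still doesn't expose migration status for sticky ephemeral disks; it only blocks until the drain itself reports completion.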

@groggemans
Contributor

Consul has the leave command. Would be nice to have a similar command in nomad, which would trigger a node drain, wait for it to complete, and then gracefully leave the cluster.

@preetapan
Contributor

preetapan commented Jun 4, 2018

@groggemans Nomad 0.8 added advanced node draining features. Some useful links:

  • Node drain command
  • Blog post that explains node draining features

@groggemans
Contributor

I know, and it solves the draining part, but the node should still gracefully leave the cluster afterward. I think the only way to do this now is by stopping or interrupting the service (with leave_on_terminate = true or leave_on_interrupt = true).

Setting leave_on_interrupt or leave_on_terminate to true isn't always desirable, but it should still be possible to do a graceful leave from the CLI even when both options are false (the default).

For servers there's the force-leave option, but for clients there's no command to do a graceful leave. A universal leave that works for both servers and clients, and that also triggers a node drain, seems to be missing.
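For reference, these are the agent-level options being discussed, shown here as a minimal config sketch (both default to false):

```hcl
# Agent configuration fragment (sketch).
# leave_on_interrupt: gracefully leave the cluster on SIGINT
# leave_on_terminate: gracefully leave the cluster on SIGTERM
leave_on_interrupt = true
leave_on_terminate = true
```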

@onlyjob
Contributor

onlyjob commented Aug 15, 2018

A relevant discussion happened in #4305. @insanejudge, @schmichael.

Indeed, draining the node on shutdown is the best approach, and the service file could be adjusted to do that. However, a graceful restart cannot be implemented in a systemd service, because systemd cannot distinguish a shutdown from a restart. Regardless, KillMode=control-group (the default) is better than KillMode=process, because the latter does not guarantee cleanup. It is important to leave no unmanaged processes behind.
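For illustration, the relevant service-file setting looks like this (a sketch; the unit path and binary locations are assumptions):

```ini
# /etc/systemd/system/nomad.service (fragment)
[Service]
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d
# Default; kills every process in the unit's cgroup on stop,
# so no unmanaged child processes are left behind.
KillMode=control-group
```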

@dcparker88

Not sure if this is relevant or related, but even when I have leave_on_terminate set in my config, the node doesn't seem to fully leave. I've been testing with the approaches above, and I can see in my logs that the node is cleanly shutting down:

nomad: ==> Caught signal: terminated
nomad: ==> Gracefully shutting down agent...
nomad: 2018/09/12 16:11:52.306399 [INFO] agent: requesting shutdown
nomad: 2018/09/12 16:11:52.306468 [INFO] client: shutting down
nomad: 2018/09/12 16:11:52.320998 [INFO] agent: shutdown complete

but when I check nomad node status, the node still shows as down:

$ nomad node status
ID        DC    Name    Class   Drain  Eligibility  Status
4995dacd  east  agent1  <none>  false  ineligible   down

is that expected behavior? I would expect once the node leaves the cluster it doesn't appear in the status anymore.

@schmichael
Member

@onlyjob Does systemd allow configuring different signals for reloads, restarts, and shutdowns? If so we could use SIGHUP, SIGINT, and SIGTERM respectively to separate the shutdown behaviors. Adding APIs+CLI commands would also be useful. This is definitely something we're hoping to do, but I don't know if it will make it into 0.9.0.
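systemd does let a unit separate reloads from stops: ExecReload= can send SIGHUP while KillSignal= controls the stop signal. A sketch (unit path and binary locations are assumptions):

```ini
# nomad.service fragment (sketch): distinct signals for reload vs. stop.
[Service]
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d
# Reload (e.g. re-read config) via SIGHUP:
ExecReload=/bin/kill -HUP $MAINPID
# Graceful stop via SIGTERM:
KillSignal=SIGTERM
```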

@dcparker88 Unfortunately leave_on_terminate is not implemented for clients, so yes, that is expected.

@onlyjob
Contributor

onlyjob commented Sep 12, 2018

No, it doesn't... There is ExecStop= but no ExecRestart=... Anyway, IMHO it is wrong to distinguish. The node should be drained on restart as well, because that is the only safe approach. If the updated executable fails to start, the system will end up with dangling, unaccounted-for services.

@schmichael
Member

@onlyjob Nomad will continue to support in-place upgrades (restarting without draining) for at least a couple of reasons:

  1. Some jobs are expensive to restart/migrate (QEMU VMs)
  2. We do not want to tie the lifetime/stability of the Nomad client agent to all of the tasks it runs. We try to isolate defects in our code from affecting user services.

That being said, we've definitely come close to dropping support for in-place upgrades. I could see it happening someday, but for now we intend to support restarts that don't affect tasks.

@onlyjob
Contributor

onlyjob commented Sep 13, 2018

It is OK if you are committed to supporting restarts without draining. However, this is unsafe and should therefore be configurable. Moreover, draining the node on restart must be the default behaviour. It is not OK to leave dangling VMs; they are not cheap to restart.
It is a classic "speed over safety" dilemma.

Betting on the perfect stability of the Nomad client is a strategy for a perfect world, like saying that defects in your code will never happen.
One day something unforeseen will happen on an architecture your CI does not cover, and the client will fail to start for whatever reason (a low-memory condition, for example). How do you know there will be enough memory available to restart Nomad if it does not terminate its jobs?

http://thecodelesscode.com/case/96

@dcparker88

@schmichael ah thanks - that makes sense then. Do you know if that's a planned feature, or should I just continue to use GC to clean out down nodes?

@schmichael
Member

@dcparker88: Do you know if that's a planned feature, or should I just continue to use GC to clean out down nodes?

We hoped shutdown improvements would land in 0.9.0, but some larger features (e.g. plugins) take priority, so you may want to continue using GC for the time being. If they don't make it into 0.9.0, hopefully we'll get them out in a patch release.

@onlyjob: ...therefore should be configurable. Moreover draining node on restart must be default behaviour.

This is the plan!

@onlyjob: Betting on perfect stability of the Nomad client is a strategy for the perfect world, like saying that defects in your code (will) never happen.

This is precisely reason 2 I gave above for supporting in-place upgrades. A guiding principle in Nomad's design is: in the face of errors, do not stop user services! Nomad downtime should prevent further scheduling, but it should avoid causing service downtime as much as possible.

@onlyjob
Contributor

onlyjob commented Sep 15, 2018

Thanks. :) I think there is a flaw in this reasoning... We need to separate two issues: avoiding stopping services during normal operation, and the case where the Nomad client itself is restarting.
It violates the principles of integrity and common sense to leave scheduled jobs running after the Nomad client has exited...
Service downtime is necessary when the manager/dispatcher is restarting, because that is the only safe mode of operation.

What if the updated Nomad disagrees with the running Docker daemon about the API version?

@rmlsun

rmlsun commented Jul 13, 2020

Basically what we want is: if Nomad itself runs into unexpected issues, leave the task runtimes alone and confine the problem to Nomad as much as possible (the smallest blast radius possible). On the other hand, if it is an intentional shutdown of the Nomad client, provide a way to trigger a clean shutdown of the task runtimes.

I think there might be a fine line here @schmichael

Ideally, if the Nomad client crashes or shuts down for reasons that are not operator-initiated, it should not trigger task shutdown. Only an operator-initiated shutdown should trigger (and wait for the completion of) a clean shutdown of all tasks.

Quoting the earlier suggestion:

> Would a client.drain_shutdown = true agent configuration parameter fit your use case? The idea being that when the Nomad client received the signal to shut down, it would block exiting until it had drained all running allocations?

So would a signal be a good way to indicate an intentional shutdown? Instead of having client.drain_shutdown = true, how about client.drain_shutdown_signal = SIGINT, something along those lines?
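As a sketch of that proposal (option names hypothetical, not implemented at the time of this comment):

```hcl
client {
  # Hypothetical: only a shutdown requested via this signal blocks
  # agent exit until all running allocations have been drained.
  drain_shutdown_signal = "SIGINT"
}
```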

@mwild1

mwild1 commented Apr 20, 2021

I understand the reasoning of the many people in this thread wanting this feature; it is surely a safer option in many environments. However, a few comments implied that the current behaviour is always undesirable, which is not the case.

There are some workloads where it absolutely makes sense to keep allocations running during a restart (or crash) of the nomad client. Assuming of course that those allocations can be re-adopted by a new nomad process.

In-place upgrades and the general ease of upgrades with zero or minimal disruption in Nomad are one of the big features over Kubernetes for me.

So yes, by all means a way to combine drain+shutdown/restart would be great, but not because it's the only way that makes sense.

@ketzacoatl
Contributor

@tgross what discussion is needed to figure out next steps here? I would love to help move this along!

@ketzacoatl
Contributor

@tgross ping

@tgross
Member

tgross commented Sep 16, 2021

Hey @ketzacoatl, given that you already cross-linked this somewhere the Nomad team was asking for input, I'm sure they'll have some thoughts for you at some point. But I haven't been at HashiCorp for a while now, and there aren't any non-HashiCorp maintainers, so pinging me probably won't help move things along. 😁

That being said, if it were up to me (and it's not!), I'd say there's not much to this issue:

  • If you're already shutting down a client intentionally, scripting a drain doesn't seem like a huge additional effort.
  • If a node is shutting down unintentionally (i.e. it crashes), the node can't participate in telling the server to drain it, so you need to rely on something like stop_after_client_disconnect anyway.

I'm sure the Nomad team would be open to a patch that provides client configuration that causes the node to drain on graceful shutdown.
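For the crash case, the group-level jobspec option mentioned above looks like this (values illustrative):

```hcl
job "web" {
  group "app" {
    # If the client node is disconnected for this long, the server
    # stops these allocations and reschedules them elsewhere.
    stop_after_client_disconnect = "5m"
    # ...
  }
}
```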

@ketzacoatl
Contributor

@tgross apologies for the ping!

@mikenomitch
Contributor

There was a suggestion to use systemd inhibitor locks to achieve this. Noting here in case it is helpful if this gets picked up.
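For illustration, an inhibitor-lock approach might wrap the drain in systemd-inhibit, so systemd delays shutdown until the drain finishes. A sketch; `INHIBIT_BIN` and `NOMAD_BIN` are test knobs, not official settings:

```shell
#!/usr/bin/env bash
# Sketch: hold a shutdown inhibitor lock while draining the local node,
# so systemd delays poweroff until allocations have migrated.
set -euo pipefail

inhibited_drain() {
  "${INHIBIT_BIN:-systemd-inhibit}" \
    --what=shutdown \
    --why="draining Nomad allocations" \
    "${NOMAD_BIN:-nomad}" node drain -enable -self -yes
}

# Invoke from a shutdown hook, e.g.: inhibited_drain
```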

@tgross
Member

tgross commented Apr 14, 2023

Implemented (finally!) in https://github.com/hashicorp/nomad/pull/16827, which will ship in the next release of Nomad.


I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 12, 2025