Prevent additional scheduling on Node - helps with system maintenance #1500

Closed
olenm opened this issue Aug 2, 2016 · 16 comments

Comments

@olenm

olenm commented Aug 2, 2016

A feature to stop new jobs from starting on a selected node would be incredibly useful.
This would allow for enhanced testing of new nomad clients and for testing out new infrastructure before having to deprecate old systems.

@diptanu
Contributor

diptanu commented Aug 2, 2016

@olenm Isn't node-drain solving this use case? When you push new nodes, you can drain them so that new jobs don't land on them, and then enable the new nodes when you are comfortable?

@olenm
Author

olenm commented Aug 2, 2016

node-drain is excellent for taking a node out of service, but I often want to block jobs from being placed on a particular node so that they are placed on another node instead. Using node-drain causes all containers on the node to migrate to other nodes, which is exactly what I want to prevent.

@dadgar
Contributor

dadgar commented Aug 5, 2016

I think #1523 is what you want?

@olenm
Author

olenm commented Aug 9, 2016

@dadgar: no, it looks like #1523 is slightly different, as dvusboy still needs node-drain to do all 3.
I simply want a single method to block new jobs from being scheduled on a node (and to not automatically drain the node of its currently running tasks).

@dvusboy

dvusboy commented Aug 9, 2016

@olenm: In this case, it really can't be put under the node-drain command: if there are service tasks and/or system tasks allocated to a node, your "block new task" will let them keep running indefinitely, which means the node will never be drained. In #1523, I still want the node to drain; I just want the batch tasks to die a natural death. I'm rather curious about the utility of this block.

@Nomon

Nomon commented Aug 9, 2016

I tried to do this (prevent new jobs from being scheduled on a node) by setting reserved resources to the currently allocated resources in the config and reloading the agent config with SIGHUP, but it did not update the available resources in node-status.

@dadgar
Contributor

dadgar commented Aug 9, 2016

@olenm and @Nomon, can you explain why you want this? Generally, in a large cluster you shouldn't be manipulating individual nodes like this.

@olenm
Author

olenm commented Aug 9, 2016

This would be incredibly powerful for upgrading the nomad-agent cluster: it removes the 'magic' of a node-drain and allows more control over upgrades.

Imagine a 3-node nomad-agent cluster built on AWS ASGs. We want to upgrade the nomad binary by simply creating new AMIs and a new ASG joined to the same nomad cluster.

We have a dozen jobs; some have multiple group counts, and some are singles. The containers currently run on 2 of the 3 (older) nodes, and calling node-drain on the 3rd node stops new jobs from landing on it but migrates none (since none exist on it).

We are left with 2 (older) nodes with containers on them; calling node-drain on one of these will cause all of its containers to load onto the last old node, regardless of whether I have already spun up the 2nd ASG.
This leaves me to either call node-drain on the last host, causing an outage, or use our deployment process (which adds a step for blue-green) to attempt to load balance onto the new agents (presuming they are not triggered on the same node). We do not use distinct as we sometimes have more containers than nodes.

With a node-block status in place, blocking new jobs on the old agents would allow a deploy to occur and not interrupt service and also allow testing out new nomad-agent clusters added to an already existing cluster. A deploy would still be able to terminate the jobs on the blocked-agent after a successful deploy occurs on the new systems.
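
For readers following the scenario above, here is a minimal Python sketch of the difference between the two behaviours. It is a toy model, not Nomad code: the `Node` class, the fixed `capacity` of 4, the `block` helper (standing in for the node-block proposed in this issue), and the rule that placement prefers the busiest eligible node with free room are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int = 4              # assumed: max containers per node
    eligible: bool = True          # may new work be placed here?
    allocs: list = field(default_factory=list)


def place(nodes, job):
    """Place a job on the busiest eligible node that still has room (assumed rule)."""
    candidates = [n for n in nodes if n.eligible and len(n.allocs) < n.capacity]
    if not candidates:
        raise RuntimeError(f"no eligible node with capacity for {job!r}")
    target = max(candidates, key=lambda n: len(n.allocs))
    target.allocs.append(job)
    return target


def block(node):
    """Hypothetical node-block: refuse new placements, leave running work alone."""
    node.eligible = False


def drain(nodes, node):
    """Simplified node-drain: refuse new placements and migrate existing work."""
    node.eligible = False
    moved, node.allocs = node.allocs, []
    for job in moved:
        place(nodes, job)


def scenario(use_block):
    """Three old nodes plus a new ASG of three nodes, then one new deploy."""
    old = [Node("old-1", allocs=["web", "api"]), Node("old-2", allocs=["db"]), Node("old-3")]
    new = [Node("new-1"), Node("new-2"), Node("new-3")]
    nodes = old + new
    if use_block:
        for node in old:
            block(node)
    else:
        drain(nodes, old[2])       # empty node: nothing to migrate
        drain(nodes, old[0])       # "web" and "api" have to go somewhere
    place(nodes, "web-v2")         # the new deploy
    return {n.name: n.allocs for n in nodes}
```

In this toy model the drain path ends with every container, including the newly deployed `web-v2`, packed onto `old-2`, while the block path leaves the old containers where they are and sends `web-v2` to `new-1`.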

@dvusboy

dvusboy commented Aug 9, 2016

Do you not add a new client node before calling node-drain on the existing (older) nodes?

@olenm
Author

olenm commented Aug 9, 2016

Yes, we use ASGs and would essentially (at least) double the size of the nomad-agent cluster.

@Nomon

Nomon commented Aug 10, 2016

@dadgar the node had degraded and was marked for termination by AWS. I wanted the long-running jobs on it to finish, and no new service/batch jobs to be scheduled there, before I drained it.

@dadgar
Contributor

dadgar commented Aug 12, 2016

@Nomon #1523 would be what you want
@olenm I understand now. Good idea

dadgar changed the title from "new feature: node-block" to "Prevent additional scheduling on Node - helps with system maintenance" on Aug 12, 2016

@dvusboy

dvusboy commented Aug 12, 2016

I wonder if adding the flag --graceful to node-drain may satisfy everyone:

--graceful=[batch,service,system]

possibly defaulting to =batch. I'm still not certain what graceful draining means for services, but this would provide the flexibility to not kill any task running on a node being drained and just set up the allocation gate.
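
To illustrate how such a flag might behave, here is a rough Python sketch. Everything in it is hypothetical: `--graceful` is only a proposal in this thread, and the names `JobType`, `parse_graceful`, and `plan_drain`, as well as the comma-separated value format, are assumptions for the example rather than anything Nomad provides.

```python
from enum import Enum


class JobType(Enum):
    BATCH = "batch"
    SERVICE = "service"
    SYSTEM = "system"


def parse_graceful(value="batch"):
    """Parse a hypothetical --graceful value such as "batch,service" into job types."""
    return {JobType(part.strip()) for part in value.split(",") if part.strip()}


def plan_drain(allocs, graceful):
    """Split (name, JobType) pairs into tasks stopped now and tasks left to finish.

    In both cases the node is assumed to stop accepting new allocations.
    """
    stop_now = [name for name, job_type in allocs if job_type not in graceful]
    let_finish = [name for name, job_type in allocs if job_type in graceful]
    return stop_now, let_finish


allocs = [("etl", JobType.BATCH), ("web", JobType.SERVICE), ("logs", JobType.SYSTEM)]

# Proposed default: only batch tasks are allowed to die a natural death.
default_plan = plan_drain(allocs, parse_graceful())

# Listing every type kills nothing, which amounts to the node-block behaviour.
block_plan = plan_drain(allocs, parse_graceful("batch,service,system"))
```

Under these assumptions, the default stops `web` and `logs` and lets `etl` finish, while listing all three types stops nothing and only closes the node to new allocations.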

@olenm
Author

olenm commented Aug 15, 2016

@dvusboy: if you mean to have a flag to set the individual component levels of a node-drain, then yes that would be an excellent feature. But I also think adding the node-block method would be a first step in that direction.

@dadgar: awesome, thanks; glad I was able to clarify the use case

@preetapan
Contributor

Revamped node drain in 0.8 addresses this, closing

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Nov 29, 2022

6 participants