Prevent additional scheduling on Node - helps with system maintenance #1500

olenm · 2016-08-02T01:47:52Z

A feature to stop new jobs from starting on a selected node would be incredibly useful.
This would allow better testing of new Nomad clients and new infrastructure before having to deprecate old systems.

diptanu · 2016-08-02T02:00:02Z

@olenm Doesn't node-drain solve this use case? When you push new nodes, you can drain them so that new jobs don't land on them, and enable them when you are comfortable.

olenm · 2016-08-02T02:02:05Z

node-drain is excellent for removing jobs from a node, but I often want to block jobs from being pushed to a particular node in order to force them onto another node. Using node-drain causes all containers on the node to migrate to other nodes, which is exactly what I want to prevent.

dadgar · 2016-08-05T23:50:38Z

I think #1523 is what you want?

olenm · 2016-08-09T01:35:01Z

@dadgar: no, it looks like #1523 is slightly different, as dvusboy still needs node-drain to do all 3.
I simply want a single method to block new jobs from being scheduled on a node (without automatically draining the node of its currently running tasks).

dvusboy · 2016-08-09T01:42:49Z

@olenm: In this case, it really can't be put under the node-drain command: if there are service tasks and/or system tasks allocated to a node, your "block new task" will let them keep running indefinitely, which means the node will never be drained. In #1523, I still want a node to drain; I just want the batch tasks to die a natural death. I'm rather curious about the utility of this block.

Nomon · 2016-08-09T11:22:22Z

I tried to do this (prevent new jobs from being scheduled on a node) by setting the reserved resources in the config to the currently allocated resources and reloading the agent config with SIGHUP, but it did not update the available resources shown in node-status.

dadgar · 2016-08-09T16:59:01Z

@olenm and @Nomon, can you explain why you want this? Generally, in a large cluster you shouldn't be manipulating individual nodes like this.

olenm · 2016-08-09T21:12:26Z

This would be incredibly powerful for upgrading the nomad-agent cluster: it removes the 'magic' of a node-drain and allows more control over upgrades.

Imagine a 3-node nomad-agent cluster running in an AWS ASG. We want to upgrade the nomad binary by simply creating new AMIs and a new ASG joined to the same nomad cluster.

We have a dozen jobs, some with multiple group counts, and some are singles. The containers are currently across 2 of the 3 (older) nodes, and calling node-drain on the 3rd node stops jobs but migrates none (since none exist on it).

We are left with 2 (older) nodes with containers on them. Calling node-drain on one of these will cause all containers to load onto the last old node, regardless of whether I have already spun up the 2nd ASG.
This leaves me to either call node-drain on the last host, causing an outage, or, if I use our deployment process (which adds a step for blue-green), attempt to load balance onto the new agents (presuming they are not triggered on the same node). We do not use distinct, as we sometimes have a higher container count than nodes.

With a node-block status in place, blocking new jobs on the old agents would allow a deploy to occur and not interrupt service and also allow testing out new nomad-agent clusters added to an already existing cluster. A deploy would still be able to terminate the jobs on the blocked-agent after a successful deploy occurs on the new systems.
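
A minimal sketch of the scenario above, as a toy model rather than Nomad's actual scheduler. It assumes bin-packing placement (the fullest eligible node wins), and `Node`, `place`, `drain`, `block`, and `cluster` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int = 8          # assumed: max allocations per node
    eligible: bool = True      # may this node receive new placements?
    allocs: list = field(default_factory=list)


def place(nodes, alloc):
    """Place alloc on the fullest eligible node that still has room (bin-packing)."""
    candidates = [n for n in nodes if n.eligible and len(n.allocs) < n.capacity]
    if not candidates:
        raise RuntimeError(f"no eligible node for {alloc}")
    target = max(candidates, key=lambda n: len(n.allocs))
    target.allocs.append(alloc)
    return target.name


def drain(nodes, node):
    """Drain: stop new placements AND migrate everything off the node."""
    node.eligible = False
    moved, node.allocs = node.allocs, []
    return {alloc: place(nodes, alloc) for alloc in moved}


def block(node):
    """Proposed node-block: stop new placements, leave running allocs alone."""
    node.eligible = False


def cluster():
    """Three old nodes (one empty) plus two freshly added nodes."""
    old = [Node("old-1", allocs=["web", "api"]), Node("old-2", allocs=["db"]), Node("old-3")]
    new = [Node("new-1"), Node("new-2")]
    return old + new
```

In this model, draining `old-3` and then `old-1` moves `web` and `api` onto `old-2`, because it is the fullest eligible node even though `new-1` and `new-2` exist. Blocking all three old nodes instead leaves their allocations running and sends the next placement to `new-1`.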

dvusboy · 2016-08-09T23:09:31Z

Do you not add a new client node before calling node-drain on the existing (older) nodes?

olenm · 2016-08-09T23:29:23Z

Yes, we use ASGs and would essentially (at least) double the size of the nomad-agent cluster.

Nomon · 2016-08-10T06:46:33Z

@dadgar the node had degraded and was marked for termination by AWS. I wanted the long-running jobs on it to finish and no new service/batch jobs to be scheduled there before I drained it.

dadgar · 2016-08-12T21:37:02Z

@Nomon #1523 would be what you want.
@olenm I understand now. Good idea.

dvusboy · 2016-08-12T21:44:54Z

I wonder if adding the flag --graceful to node-drain may satisfy everyone:

--graceful=[batch,service,system]

possibly defaulting to =batch. I'm still not certain what graceful draining means for services, but this would provide the flexibility to not kill any task running on a node being drained and instead just set up the allocation gate.
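
One way to read this proposal, sketched as a toy model. The --graceful flag is only a suggestion made in this thread, and `drain_plan` and its arguments are hypothetical names, not Nomad code. The assumption is that task types listed in the flag are left to finish on their own, while every other type is stopped as in an ordinary drain.

```python
KNOWN_TYPES = {"batch", "service", "system"}


def drain_plan(allocs, graceful=("batch",)):
    """Split (name, type) allocations into those left running and those stopped.

    Types listed in `graceful` are allowed to finish naturally; the node
    stops accepting new placements either way (the "allocation gate").
    """
    unknown = set(graceful) - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown task types: {sorted(unknown)}")
    keep = [name for name, kind in allocs if kind in graceful]
    stop = [name for name, kind in allocs if kind not in graceful]
    return keep, stop


allocs = [("etl", "batch"), ("web", "service"), ("logger", "system")]
default_keep, default_stop = drain_plan(allocs)
block_keep, block_stop = drain_plan(allocs, graceful=("batch", "service", "system"))
```

With the assumed default, only `etl` is left to finish. Listing all three types stops nothing, which under this reading would behave like the node-block requested earlier in the thread.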

olenm · 2016-08-15T21:40:07Z

@dvusboy: if you mean to have a flag to set the individual component levels of a node-drain, then yes that would be an excellent feature. But I also think adding the node-block method would be a first step in that direction.

@dadgar: awesome, thanks; glad I was able to clarify the use case.

preetapan · 2018-06-22T16:14:42Z

The revamped node drain in 0.8 addresses this; closing.

github-actions · 2022-11-29T02:18:38Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

diptanu added the stage/waiting-reply label Aug 2, 2016

diptanu removed the stage/waiting-reply label Aug 2, 2016

dadgar added the stage/waiting-reply label Aug 5, 2016

dadgar added type/enhancement theme/core theme/scheduling and removed stage/waiting-reply labels Aug 12, 2016

dadgar changed the title ~~new feature: node-block~~ Prevent additional scheduling on Node - helps with system maintenance Aug 12, 2016

preetapan closed this as completed Jun 22, 2018

github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022
