Proposal
When the scheduler attempts to place a task on a node and the allocation fails, that node should be ranked lower (or quarantined) the next time the scheduler selects a node for placement.
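As a rough illustration of the idea, the sketch below tracks failed allocations per node, subtracts a penalty from a node's score for each failure, and excludes a node entirely once it fails too often. This is not Nomad's scheduler code or API: `NodePenaltyTracker`, its methods, the penalty of 0.25 per failure, and the quarantine threshold of 3 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodePenaltyTracker:
    """Hypothetical helper (not part of Nomad): tracks placement failures per node."""

    penalty_per_failure: float = 0.25  # assumed score deduction per recorded failure
    quarantine_after: int = 3          # assumed failure count at which a node is quarantined
    failures: dict = field(default_factory=dict)

    def record_failure(self, node_id: str) -> None:
        # Called when an allocation placed on this node fails.
        self.failures[node_id] = self.failures.get(node_id, 0) + 1

    def record_success(self, node_id: str) -> None:
        # A successful allocation clears the node's failure history.
        self.failures.pop(node_id, None)

    def is_quarantined(self, node_id: str) -> bool:
        return self.failures.get(node_id, 0) >= self.quarantine_after

    def adjusted_score(self, node_id: str, base_score: float) -> float:
        # Rank lower: each recorded failure reduces the node's score.
        return base_score - self.penalty_per_failure * self.failures.get(node_id, 0)

    def rank(self, base_scores: dict) -> list:
        """Return non-quarantined node IDs, best adjusted score first."""
        eligible = [n for n in base_scores if not self.is_quarantined(n)]
        return sorted(
            eligible,
            key=lambda n: self.adjusted_score(n, base_scores[n]),
            reverse=True,
        )


# Example: node-a starts as the best candidate but drops after one failure.
tracker = NodePenaltyTracker()
scores = {"node-a": 0.9, "node-b": 0.8, "node-c": 0.7}
tracker.record_failure("node-a")
print(tracker.rank(scores))  # ['node-b', 'node-c', 'node-a']
```

In this sketch a single failure only demotes the node, while repeated failures remove it from consideration until a later success clears its history. A real implementation would probably also need the penalty to expire over time, so that a node which has recovered is eventually tried again.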
Use-cases
In some instances, the Nomad scheduler repeatedly selects the same node to schedule on, but deployments on that node fail repeatedly for some reason. This can be seen in this issue. In these cases, the entire cluster can become blocked due to several bad nodes. This feature would make Nomad far more resilient when there are bad nodes that the scheduler does not know are bad.
Attempted Solutions
An operator can monitor for failed placements and/or blocked evaluations and intervene manually. This takes a lot of effort and knowledge on the operator's part, and shouldn't be necessary in the first place.