Master node election + force master node change api #37036

Closed
vbohata opened this issue Dec 31, 2018 · 3 comments
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >feature

Comments

@vbohata

vbohata commented Dec 31, 2018

This is a feature request for configurable master election conditions. Related requests are tracked in #14340 and #32462, but there is more that would be nice to have.

Use case (our situation): an ES cluster under high load, with 3 master-eligible nodes running as virtual machines across 2 fast locations (it is currently impossible to have an arbiter node in a slow location). When the cluster is under heavy load, the master node has to run on a lightly loaded hypervisor with dedicated CPUs (and memory) if possible. We can achieve that on one hypervisor, but not on all of them. So currently we have to restart the master nodes one by one until the required node (the one in the fast location on the lightly loaded hypervisor) is elected.

So the following features should be available:

  • Configurable priority for master-eligible nodes. If all nodes are available, the master is always the node with the highest priority.
  • An API to force a master change regardless of the node's current priority or other conditions. This would be very useful when hunting for performance bottlenecks.
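
The first bullet can be read as a simple selection rule. The following is only a minimal sketch of that rule, not anything Elasticsearch provides: the `priority` field, the `Node` class, and the `pick_master` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    priority: int           # hypothetical setting: higher value wins
    available: bool = True  # whether the node is currently reachable


def pick_master(nodes):
    """Return the available node with the highest priority, or None."""
    candidates = [n for n in nodes if n.available]
    if not candidates:
        return None
    # Ties on priority are broken by name (largest name wins) purely
    # to keep the result deterministic in this sketch.
    return max(candidates, key=lambda n: (n.priority, n.name))


nodes = [Node("m1", 10), Node("m2", 5), Node("m3", 1)]
print(pick_master(nodes).name)  # m1
```

If `m1` is marked unavailable, the same call falls through to `m2`, which is the behavior the request describes for a partially available cluster.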
@DaveCTurner
Contributor

Assigning priorities to master nodes as you describe seems like a bad idea. A master election is somewhat disruptive to a cluster, and if the highest priority node were consistently partially faulty (e.g. persistent GC or VM pauses, or connectivity issues) then this would trigger repeated master elections each time it joined or left the cluster. This goes against the goal for a cluster to be resilient to a single faulty node.
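
The effect described above can be illustrated with a toy simulation. This is a sketch under strong assumptions: it ignores quorum, timeouts, and everything else a real election involves, and the `count_master_changes` helper and the priority map are hypothetical. It only counts how often the master changes when a strict highest-priority-wins rule is applied to a node that keeps joining and leaving.

```python
def count_master_changes(events, priorities):
    """Count master changes under a strict highest-priority-wins rule.

    events: sequence of (node, "join" | "leave") tuples.
    priorities: dict mapping node name to a hypothetical priority.
    """
    up = set()
    master = None
    changes = 0
    for node, action in events:
        if action == "join":
            up.add(node)
        else:
            up.discard(node)
        # The highest-priority node that is currently up becomes master.
        new_master = max(up, key=priorities.get, default=None)
        if new_master != master:
            changes += 1
            master = new_master
    return changes


priorities = {"a": 3, "b": 2, "c": 1}
# "b" and "c" are stable; the top-priority node "a" flaps three times.
events = [("b", "join"), ("c", "join")] + [("a", "join"), ("a", "leave")] * 3
print(count_master_changes(events, priorities))  # 7
```

The result is one initial election plus two master changes per flap of `a`, even though `b` and `c` stay healthy throughout.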

A cluster needs to have multiple nodes all of which are capable of taking over as master in order to run smoothly. If you require the elected master to be one specific node then that node is effectively a single point of failure. The solution is to add dedicated master nodes with sufficient resources to do their jobs.

Otherwise, I think this duplicates #14340.

@DaveCTurner DaveCTurner added >feature :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Dec 31, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@vbohata
Author

vbohata commented Dec 31, 2018

> This goes against the goal for a cluster to be resilient to a single faulty node.

Not always. There could be a threshold (the number of priority-driven elections allowed before the cluster returns to the old behavior, or until the administrator resets the threshold), or the mechanism could be time-driven, or the priorities could be dynamic and depend on recent elections.
The reason for this is to let the administrator prioritize the machine known to have the lowest latency, highest performance, best stability, etc.
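
One possible reading of the threshold idea, as a rough sketch: priorities may pre-empt a healthy master only a limited number of times, after which the current master stays until it becomes unavailable or the administrator resets the budget. The class name, its methods, and the choice to still pick the highest-priority node when an election is unavoidable are assumptions made for illustration, not an existing or proposed Elasticsearch API.

```python
class ThresholdedPriorityElector:
    """Hypothetical policy: allow at most `max_preemptions` priority-driven
    master changes, then stop pre-empting a healthy master."""

    def __init__(self, priorities, max_preemptions):
        self.priorities = priorities
        self.max_preemptions = max_preemptions
        self.remaining = max_preemptions
        self.master = None

    def reset(self):
        """Administrator resets the threshold."""
        self.remaining = self.max_preemptions

    def elect(self, available):
        available = set(available)
        if not available:
            self.master = None
            return None
        preferred = max(available, key=self.priorities.get)
        if self.master not in available:
            # No master yet, or the master is gone: election is unavoidable.
            self.master = preferred
        elif preferred != self.master and self.remaining > 0:
            # Pre-empt a healthy master only while budget remains.
            self.remaining -= 1
            self.master = preferred
        return self.master


elector = ThresholdedPriorityElector({"a": 3, "b": 2}, max_preemptions=1)
print(elector.elect({"b"}))       # b
print(elector.elect({"a", "b"}))  # a (uses the only pre-emption)
print(elector.elect({"b"}))       # b (a is gone)
print(elector.elect({"a", "b"}))  # b (budget exhausted, no pre-emption)
```

With this policy a flapping high-priority node can disrupt the cluster at most `max_preemptions` times before the cluster settles on whichever healthy node is currently master.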
