Master node election + force master node change api #37036

vbohata · 2018-12-31T00:39:03Z

This is a feature request for configurable master election conditions. Related requests exist in #14340 and #32462, but there is more that would be nice to have.

Use case (our situation): an ES cluster under high load, with 3 master-eligible nodes in 2 fast locations (because it is currently impossible to have an arbiter node in a slow location), all running as virtual machines. When the cluster is under heavy load, the master node should run on a lightly loaded hypervisor, with dedicated CPUs (and memory) if possible. We can achieve that on one hypervisor, but not on all of them. So currently we have to restart the master nodes one by one until the desired node (the one in the fast location on the lightly loaded hypervisor) is elected master.

So the following features should be available:

Configurable priority for master-eligible nodes. If all nodes are available, the master will always be the node with the highest priority.
An API to force a change of master regardless of its current priority or other conditions. This would be very useful when hunting for performance bottlenecks.

DaveCTurner · 2018-12-31T08:05:18Z

Assigning priorities to master nodes as you describe seems like a bad idea. A master election is somewhat disruptive to a cluster, and if the highest priority node were consistently partially faulty (e.g. persistent GC or VM pauses, or connectivity issues) then this would trigger repeated master elections each time it joined or left the cluster. This goes against the goal for a cluster to be resilient to a single faulty node.
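
To make the concern concrete, here is a toy simulation. It is only a sketch and not how Elasticsearch elects a master: the node names, priorities, availability trace, and the `count_master_changes` helper are all hypothetical. It compares a strict-priority rule with a rule that keeps the current master for as long as it stays up, while the highest-priority node repeatedly joins and leaves.

```python
# Toy model only: this is not how Elasticsearch elects a master.
# Node names, priorities, and the availability trace are hypothetical.

def count_master_changes(trace, priorities, strict_priority):
    """Count how often the elected master changes over a trace.

    trace: list of sets, each holding the nodes that are up at that step.
    priorities: dict mapping node name to priority (higher wins).
    strict_priority: if True, the highest-priority live node always wins;
        if False, the current master is kept for as long as it stays up.
    """
    master = None
    changes = 0
    for up in trace:
        if not up:
            continue  # no live node, nothing to elect in this step
        best = max(up, key=lambda n: priorities[n])
        if strict_priority or master not in up:
            new_master = best
        else:
            new_master = master
        if master is not None and new_master != master:
            changes += 1
        master = new_master
    return changes


priorities = {"a": 3, "b": 2, "c": 1}
# Node "a" has the highest priority but flaps; "b" and "c" stay up.
trace = [{"a", "b", "c"}, {"b", "c"}] * 4

strict_changes = count_master_changes(trace, priorities, strict_priority=True)
sticky_changes = count_master_changes(trace, priorities, strict_priority=False)
print(strict_changes, sticky_changes)  # prints "7 1"
```

In this trace the strict-priority rule changes master at every step after the first, while the sticky rule changes master once, when node "a" first drops out.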

A cluster needs to have multiple nodes all of which are capable of taking over as master in order to run smoothly. If you require the elected master to be one specific node then that node is effectively a single point of failure. The solution is to add dedicated master nodes with sufficient resources to do their jobs.

Otherwise, I think this duplicates #14340.

elasticmachine · 2018-12-31T08:05:33Z

Pinging @elastic/es-distributed

vbohata · 2018-12-31T12:00:47Z

> This goes against the goal for a cluster to be resilient to a single faulty node.

Not always. There could be a threshold (the number of elections allowed to follow priorities before falling back to the old behavior, or until the administrator resets the threshold), or the priorities could be time-driven, or dynamic and based on recent elections.
The reason for this is to let the administrator prioritize the machine they know has the lowest latency, highest performance, best stability, etc.
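
A rough sketch of the threshold idea, to show what is meant. Nothing like this exists in Elasticsearch; the `ThresholdedPriorityElector` class, the `max_preemptions` parameter, and the node names are all made up for illustration. Priority is followed only for a limited number of takeovers from a healthy master, after which the current master is kept until the administrator resets the counter.

```python
# Hypothetical sketch of the proposed threshold; no such setting exists in
# Elasticsearch. All names here are made up for illustration.

class ThresholdedPriorityElector:
    def __init__(self, priorities, max_preemptions):
        self.priorities = priorities      # node name -> priority (higher wins)
        self.max_preemptions = max_preemptions
        self.preemptions = 0              # priority-driven takeovers so far
        self.master = None

    def reset(self):
        """Administrator action: allow priority-driven takeovers again."""
        self.preemptions = 0

    def elect(self, up):
        """Return the master for the set of live nodes `up`."""
        if not up:
            self.master = None
            return None
        best = max(up, key=lambda n: self.priorities[n])
        if self.master not in up:
            # The master is gone (or there was none): an election is unavoidable.
            self.master = best
        elif best != self.master and self.preemptions < self.max_preemptions:
            # A healthy master is replaced only because of priority.
            self.preemptions += 1
            self.master = best
        return self.master


elector = ThresholdedPriorityElector({"a": 3, "b": 2, "c": 1}, max_preemptions=2)
# Node "a" has the highest priority but keeps joining and leaving.
trace = [{"a", "b", "c"}, {"b", "c"}] * 4
history = [elector.elect(up) for up in trace]
print(history)  # ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'b']
```

After two priority-driven takeovers the flapping node "a" no longer displaces "b", so the cluster stops re-electing. Calling `elector.reset()` would let "a" take over again once it is stable.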

DaveCTurner closed this as completed Dec 31, 2018

DaveCTurner added the >feature and :Distributed Coordination/Cluster Coordination labels Dec 31, 2018
