Make node.master a dynamic setting #10793

Closed
TwP opened this issue Apr 24, 2015 · 23 comments
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >feature

Comments

@TwP

TwP commented Apr 24, 2015

It would be nice to add dedicated master nodes to an existing cluster without requiring a full restart of each node in the cluster. To accomplish this, the node.master setting would need to be dynamically configurable.

Changing the node.master setting at runtime would not force a new master election; it would only apply to the cluster going forward. To force a master election, the current master node would need to be restarted, as is the case today.

@s1monw
Contributor

s1monw commented Apr 24, 2015

@TwP we will discuss this internally, but it might take a while to get to it. Just a heads-up.

@clintongormley clintongormley added the :Core/Infra/Settings Settings infrastructure and APIs label Apr 26, 2015
@clintongormley
Contributor

I like the idea. That said:

It would be nice to add dedicated master nodes to an existing cluster without requiring a full restart of each node in the cluster.

You currently only need to restart the node you're promoting to master, unless I misunderstand you?

@dakrone
Member

dakrone commented Apr 26, 2015

You currently only need to restart the node you're promoting to master, unless I misunderstand you?

Let's say you start with a 9-node cluster with minimum_master_nodes set to 5, then you decide at some point you want to move to dedicated master nodes. If the node.master setting were dynamic, you could add 3 nodes with node.master: true and update node.master: false on the other 9 nodes (also updating minimum_master_nodes to 2). Then you could (optionally) bounce the current master node and have one of the dedicated master nodes take over.

Right now you would have to restart each of the 9 data nodes (which stinks if you have a lot of data) in order to mark them all as non-master-eligible, because the node.master setting can't be dynamically changed.
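
Purely to illustrate the idea, here is a hypothetical sketch of the kind of per-node call this would imply (there is no node-level settings API today, so the endpoint and payload below are made up):

PUT /_nodes/data_node_1/settings
{
  "node.master" : "false # hypothetical - no node-level settings API exists today"
}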

@clintongormley
Contributor

OK. I can see one issue here. If you change the setting dynamically and you don't update the config file, and the node reboots (expectedly or unexpectedly), it'll pick up the node.master setting from the config file.

Currently, all node-specific (as opposed to cluster-wide) settings are set at node startup only. We have no API to set node settings, and nowhere to persist them.

@clintongormley
Contributor

Isn't the better solution for this problem fixing the slow restarts?

@dakrone
Member

dakrone commented Apr 27, 2015

Isn't the better solution for this problem fixing the slow restarts?

I don't think so (don't get me wrong, fixing slow restarts would be fantastic), but I still think it would be nice to be able to change node.master without requiring a restart.

@rjernst
Member

rjernst commented Apr 27, 2015

A real-world scenario where you may want this is when transitioning to a new set of nodes in a cloud environment. You want to transition gracefully to the new nodes, but you need to force the system to move off of the old nodes. While killing the old nodes should work, that makes the exceptional case (i.e. the master dies) the normal path, which shouldn't be necessary. There should be a clean way to transition from one set of master nodes to another.

@s1monw
Contributor

s1monw commented Apr 28, 2015

This kind of scenario is something that has bothered me for a while now. You have a setup where you realize it's not optimal, i.e. you want to move to dedicated master nodes, etc. But you have to change this node-level setting, bounce processes, do recoveries, set minimum master nodes, what have you. (I bet folks regularly miss at least one important step here.) I wonder if we should have a dedicated API that bakes the master nodes into the cluster. I.e. if you want to move from a 9-node cluster to 3 dedicated master nodes, you would ideally have a simple API call like this:

PUT /_cluster/bake
{
  "master_nodes" : [ "master_1", "master_2", "master_3"],
  "set_minimum_master_nodes" : "true|false #optional true by default", 
  "force" : "true|false #false by default to barf if something is not safe (ie. only one master)"
}

With this call we basically move away from everything being dynamic and ignore other master nodes even if they are master-eligible. It would force a new election if the current master is not in the list. This might also help us with some safety mechanisms, or allow us to harden master election, since we then know the nodes and the list is no longer dynamic. I am not an expert on this but I wanted to throw the idea out here...

@clintongormley
Contributor

You still have to deal with what happens if any nodes reboot unexpectedly. Do they take their local elasticsearch.yml into account, or just use whatever setting is current in the cluster? If the latter, then the settings applied via the API should always override the local settings (which is confusing if you see settings from the yaml not being applied). What happens if three ex-masters reboot, and can't see the rest of the cluster initially? They'd use their local settings and form their own cluster.

@TwP
Author

TwP commented Apr 29, 2015

@clintongormley you bring up a very valid point. However, that problem is not isolated to this proposed change to node.master eligibility: all dynamically configurable settings have the same problem, although with master eligibility the consequences could be much more severe, as you pointed out.

The first step is making the setting dynamically configurable. Changing master eligibility via the API is one route for setting this value. The other route is to update the elasticsearch.yml configuration file and then signal the running process to reload settings from the configuration file: static settings are ignored, and dynamic settings are updated accordingly.

How dynamic settings are handled across restarts is an orthogonal problem. It should not prevent this feature from being implemented. However, taking the broader view will definitely highlight problems that can be introduced by making master eligibility configurable at runtime.

@mlorch-ai

I just ran into the situation of having to restart a full production cluster to enable this setting, and hence would have loved it if we could change it dynamically.
Please keep in mind that if we change master eligibility (e.g. going from a cluster where each node can be a master to a cluster with a smaller number of dedicated masters), we also need to change/adapt other settings (such as minimum_master_nodes) dynamically.

I don't see an issue with the fact that the config file may have outdated information. We need to keep the config file updated with other settings as well as we evolve our cluster to accommodate node failures/restarts. Being able to change this dynamically would, however, avoid the "yellow" cluster state and reduced redundancy that happens when a node (or all nodes) needs to be restarted while indexing is going on, as its indices need to be brought back up to date.
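
For what it's worth, minimum_master_nodes can already be adjusted at runtime through the cluster settings API (the value here is just an example for a 3-dedicated-master setup):

PUT /_cluster/settings
{
  "persistent" : {
    "discovery.zen.minimum_master_nodes" : 2
  }
}

It is only node.master itself that still requires the restart.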

@rjsm

rjsm commented Nov 2, 2015

I would assume any cluster where this would be an issue is under config management. You can update the elasticsearch.yml on the node, and then tell it not to be master-eligible. If it unexpectedly restarts, it'll pick up the intended configuration from the file. I'm transitioning my cluster to dedicated masters shortly.

@yehosef

yehosef commented Dec 14, 2015

+1 - this would be great. I can change the elasticsearch.yml file behind the scenes to handle a reboot, but I shouldn't have to restart the node to do it.

So I assume that if I set node.master: false on the machine that's currently the master, the cluster would start a new master election? I think it's important to be able to manually force a master, or migrate the master when I know a machine is going down. Even though the cluster will recover, there is no reason to enter a failover state, with whatever risks that involves, when I don't need to.

@charlesmims

Would a possible solution to this problem be, rather than making node.master a dynamic setting, to provide a way to force election of a particular master-eligible node as the master? Perhaps via a transient cluster API setting?
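
As a sketch only (the setting name is invented and does not exist today, it is just to show the shape of the idea):

PUT /_cluster/settings
{
  "transient" : {
    "cluster.preferred_master" : "node_3 # hypothetical setting, does not exist today"
  }
}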

@pkusnail

I would really love it if a non-master node in a cluster could be dynamically upgraded to a master-eligible node. To prevent potential problems such as split brain, we set only one master-eligible node in the cluster, which is therefore guaranteed to be the master, and we add a supervisor process that can restart the master node within 20 seconds. But if the node is physically damaged or powered off, it would be better if we could dynamically promote a node (such as a client node) to be the master of the cluster.

@bleskes
Contributor

bleskes commented Jul 22, 2016

we set only one master-eligible node in the cluster, which is therefore guaranteed to be the master
if the node is physically damaged or powered off, it would be better if we could dynamically promote a node

The right solution for this problem is to have at least 3 master-eligible nodes and properly configure minimum_master_nodes to 2.

This issue is requesting a master transition that doesn't require a 3-second period with no master, which is different.

@rjernst rjernst added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Core/Infra/Settings Settings infrastructure and APIs labels Mar 14, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner DaveCTurner added :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure and removed :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Mar 15, 2018
@DaveCTurner DaveCTurner added :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. and removed :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure labels Mar 27, 2018
@DaveCTurner
Contributor

I found a trap:

public boolean mustAck(DiscoveryNode discoveryNode) {
// repository was created on both master and data nodes
return discoveryNode.isMasterNode() || discoveryNode.isDataNode();
}

This is checked twice in MasterService.AckCountDownListener: once in the constructor and once in onNodeAck(). We rely on the return value not changing in between those two calls.

This is, of course, surmountable - I just thought it wise to write it down here for future reference.

@jasontedor
Member

@andrershov @DaveCTurner @ywelsch What impact does the work on Zen 2 have on an issue such as this one?

@ywelsch
Contributor

ywelsch commented Nov 9, 2018

@jasontedor there are a number of different scenarios and asks brought up on this issue. Zen2 in its current form already addresses some, but not all, of them. The main change with Zen2 is that it makes the notion of "master-eligible" a bit more dynamic, by giving the flexibility to assign voting rights to only a subset of the master-eligible nodes, and allowing these voting rights to be dynamically shifted to another subset of the master-eligible nodes. This enables a clean transition from one set of master nodes to another (the situation @rjernst mentioned).

There are other scenarios described here (e.g. dynamically making a mixed master/data node a master-only node) which have consequences that go beyond Zen (moving the shard data off these nodes before allowing them to become master-only nodes).

Finally, there is also another dimension to this problem, namely the capability to make the property of a node to "act as elected master" more dynamic (relates e.g. to #14340). This is not dynamically adaptable in Zen2 yet, which means that every master-eligible node (whether it has voting rights or not) can become the elected master. We will be looking at some of these remaining use cases in more detail post 7.0. I'm convinced that the Zen2 way of managing voting configurations and the leader election mechanism will facilitate implementing solutions for them.
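
If I recall the API shape correctly, shifting voting rights away from a node looks roughly like this (the node name is just an example):

POST /_cluster/voting_config_exclusions/old_master_1

and then, once the new set of masters holds the voting configuration:

DELETE /_cluster/voting_config_exclusions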

@andrershov
Contributor

@ywelsch what do you think about closing this issue, because it discusses too many different things, and opening new dedicated issues instead?
I'm not convinced that dynamically making a mixed master/data node master-only is worth the effort; we generally recommend running dedicated master nodes, and if someone wants to convert a mixed node into a master-only node, it can be stopped, repurposed, and started again.
So probably leaving just #14340 open is enough.

@DaveCTurner
Contributor

I am +1 on closing this issue.

We have addressed this valid use-case in #37802: the master now gracefully abdicates when removed from the voting configuration. We have also addressed some other comments about needing to update minimum_master_nodes by deprecating that setting in #37868.

It would be nice to add dedicated master nodes to an existing cluster without requiring a full restart of each node in the cluster.

I think it is not too onerous to restart a node when repurposing it from a mixed master/data node to a data-only node. It would be nice, but I do not think it gains us very much. The interplay between static node settings and dynamic cluster-wide settings concerns me, as do all the other things you need to do when node.master changes.

@ywelsch
Contributor

ywelsch commented Mar 13, 2019

+1 as well for closing this issue. Restarting nodes is something that will naturally have to happen in a cluster anyway (e.g. rolling upgrades) and repurposing nodes should be a rather rare event. In addition to what @andrershov and @DaveCTurner have said, I also want to point out that we have taken steps to address the following (quoting myself):

There are other scenarios described here (e.g. dynamically making a mixed master/data node a master-only node), which have consequences that go beyond Zen (moving the shard data off these nodes before allowing them to become master-only nodes).

In #37748 and #37347 we have come up with stricter rules for repurposing nodes and in #39403 we are adding tooling to support this model.

@ywelsch ywelsch closed this as completed Mar 13, 2019