
Do not try to exclude a master node that never existed #1986

Closed
barkbay opened this issue Oct 14, 2019 · 9 comments · Fixed by #4078
Labels
>bug Something isn't working

Comments

@barkbay
Contributor

barkbay commented Oct 14, 2019

I ran out of resources on a K8s cluster while upscaling a set of MDI nodes.

I'm now in a situation where the nodeSet can't be downscaled because the operator keeps trying to exclude a master node that has never existed:

pod/es-apm-sample-es-1-0                    1/1     Running   0          33m    10.56.3.18    gke-michael-dev-cluster-default-pool-8a982915-3msd   <none>           <none>
pod/es-apm-sample-es-1-1                    0/1     Pending   0          26m    <none>        <none>
2019-10-14T07:30:50.277Z	ERROR	controller-runtime.controller	Reconciler error
{
	"ver": "1.0.0-beta1-bc11-c8bb5e5b",
	"controller": "elasticsearch-controller",
	"request": "default/es-apm-sample",
	"error": "unable to add to voting_config_exclusions: 400 Bad Request: add voting config exclusions request for [es-apm-sample-es-1-1] matched no master-eligible nodes",
	"errorCauses": [{
		"error": "unable to add to voting_config_exclusions: 400 Bad Request: unknown",
		"errorVerbose": "400 Bad Request: unknown
unable to add to voting_config_exclusions
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/client.(*clientV7).AddVotingConfigExclusions
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/client/v7.go:41
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/version/zen2.AddToVotingConfigExclusions
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/version/zen2/voting_exclusions.go:34
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.updateZenSettingsForDownscale
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:237
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.doDownscale
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:198
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.attemptDownscale
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:129
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.HandleDownscale
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/downscale.go:54
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).reconcileNodeSpecs
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/nodes.go:112
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/driver.go:234
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch.(*ReconcileElasticsearch).internalReconcile
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:284
github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch.(*ReconcileElasticsearch).Reconcile
	/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:219
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:216
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:192
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:171
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357"
	}]
}
@barkbay barkbay added the >bug Something isn't working label Oct 14, 2019
@sebgl
Contributor

sebgl commented Oct 14, 2019

This is an interesting one.

Some ideas:

  1. Ignore the error when it happens and move on with removing the master node.
  2. Check if the master node is part of the cluster before we add it to voting_config_exclusions. Don't make the voting_config_exclusions call if the node is not part of the cluster.
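
To illustrate option 2, here is a rough, standalone Go sketch (not the operator's actual code) of checking cluster membership through the nodes info API before calling the voting exclusions API; the URL, node name and absence of TLS/auth are placeholder assumptions:

```go
// Hypothetical sketch of option 2: only call the voting exclusions API for a
// master node that the cluster actually knows about, according to GET /_nodes.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// nodesInfo mirrors the small part of the GET /_nodes response we need:
// a map of node ID to node information, from which we read the node name.
type nodesInfo struct {
	Nodes map[string]struct {
		Name string `json:"name"`
	} `json:"nodes"`
}

// isClusterMember returns true if a node with the given name is currently
// part of the cluster.
func isClusterMember(esURL, nodeName string) (bool, error) {
	resp, err := http.Get(esURL + "/_nodes")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	var info nodesInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return false, err
	}
	for _, node := range info.Nodes {
		if node.Name == nodeName {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	member, err := isClusterMember("http://localhost:9200", "es-apm-sample-es-1-1")
	if err != nil {
		panic(err)
	}
	fmt.Println("node is part of the cluster:", member)
	// Only add the node to voting_config_exclusions when member is true.
}
```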

In both cases, there's a race condition:

  • we don't add the master node to voting_config_exclusions (either from 1. or 2.)
  • the node joins the cluster <-- the operator does not notice yet
  • we remove the node, but it was not excluded from voting

The current code retries over and over again, hitting the same error until the node finally joins the cluster. But this may never happen if the node stays Pending or keeps bootlooping forever.
We can detect a Pending or bootlooping node, but we would still end up with the same race condition as above.

Note that we only remove one master node at a time, which mitigates the risk introduced by the race condition above.

I think we have the same sort of problem when setting allocation excludes in the cluster settings to migrate shards away from a data node before removing it. We have an easy way out there, though: it is possible to exclude a node that is not part of the cluster, and the corresponding HTTP call does not fail.
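
For comparison, a minimal standalone Go sketch of that allocation-exclude call (again with a placeholder URL and node name, and no auth), which per the above succeeds even for a node that never joined the cluster:

```go
// Minimal sketch of the shard allocation exclusion mentioned above.
// Real deployments would need TLS and authentication.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Exclude the node from shard allocation so its shards are migrated away.
	// Unlike voting_config_exclusions, this call does not fail if
	// "es-apm-sample-es-1-1" is not part of the cluster.
	body := strings.NewReader(`{
	  "transient": {
	    "cluster.routing.allocation.exclude._name": "es-apm-sample-es-1-1"
	  }
	}`)
	req, err := http.NewRequest(http.MethodPut, "http://localhost:9200/_cluster/settings", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("cluster settings update:", resp.Status)
}
```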

@ywelsch @DaveCTurner I would appreciate your thoughts on this.

@DaveCTurner

Ugh yes this is tricky.

Unfortunately it's necessary to know the node ID (not just its name) before we can exclude it from the voting configuration. If it's not in the cluster we don't know its node ID so we cannot exclude it, hence the exception.

Naively, if a node is not running then you don't need to play with the voting configuration to get rid of it safely. If the cluster is alive then the node in question wasn't needed for its votes, and if the cluster is dead then it's already too late. The main thing that worries me is that this node is still showing as Pending which suggests to me that it might come to life at some point in the future. If we knew it would certainly never start then life would be easier. Is that possible?

Unfortunately, "will not run in future" isn't quite enough. Nodes that are not running cannot join a cluster, but they could remain in a cluster for a short while after their deaths. I think that after stopping the node from running we need to ensure it is certainly out of the cluster. I don't think we provide an API to do this today.

I wonder if we should strengthen the voting config exclusions API to accept an unknown node name.

@DaveCTurner

I opened elastic/elasticsearch#47990

@DaveCTurner

OK, the change to Elasticsearch is now merged to master and 7.x: we have replaced POST /_cluster/voting_config_exclusions/... with POST /_cluster/voting_config_exclusions?node_names=.... The existing API will be supported throughout the rest of 7.x but will result in deprecation warnings when used in ≥7.8.0.

It will shortly be removed in master but I will hold off on doing that for at least a week from now to give you some time to adapt to the new API without breaking your master builds.
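
For illustration only (not the operator's implementation), the switch boils down to calling the query-parameter form of the API; placeholder URL and node name, no TLS/auth:

```go
// Sketch of the API change described above.
// Old form (deprecated in 7.x, to be removed in master):
//   POST /_cluster/voting_config_exclusions/es-apm-sample-es-1-1
//   -> 400 Bad Request if no master-eligible node matches the name
// New form (per elastic/elasticsearch#47990), which should tolerate names of
// nodes that never joined the cluster:
//   POST /_cluster/voting_config_exclusions?node_names=es-apm-sample-es-1-1
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func addVotingConfigExclusion(esURL, nodeName string) error {
	u := fmt.Sprintf("%s/_cluster/voting_config_exclusions?node_names=%s", esURL, url.QueryEscape(nodeName))
	resp, err := http.Post(u, "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("voting config exclusion failed: %s", resp.Status)
	}
	return nil
}

func main() {
	if err := addVotingConfigExclusion("http://localhost:9200", "es-apm-sample-es-1-1"); err != nil {
		panic(err)
	}
	fmt.Println("exclusion added")
}
```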

@sebgl
Contributor

sebgl commented Apr 24, 2020

Thanks for the heads up @DaveCTurner!

I suggest we keep this issue open for pre-8.0 clusters (we may decide to do nothing about it, though), and create a new one to track the necessary changes for 8.0.0: #2951.

@sebgl
Contributor

sebgl commented Nov 16, 2020

I just realized that, thanks to elastic/elasticsearch#50836, we could already fix this for Elasticsearch 7.8+ by changing our call from /_cluster/voting_config_exclusions/node1,node2 to /_cluster/voting_config_exclusions?node_names=node1,node2, which should properly ignore non-existent nodes. @DaveCTurner pointed this out already in his comment above, not sure how we missed it 😞.

Raising priority on this issue.

@pebrc pebrc assigned pebrc and unassigned pebrc Nov 17, 2020
@pebrc
Collaborator

pebrc commented Nov 17, 2020

We have #2951 for the more focused fix of using the new query parameter.

@barkbay
Contributor Author

barkbay commented Nov 19, 2020

To work around this situation when running Elasticsearch < 7.8, it's possible to edit the StatefulSet and manually scale down the number of replicas:

  1. Find the StatefulSet:
> kubectl get sts -l elasticsearch.k8s.elastic.co/cluster-name=<cluster-name>
NAME                          READY   AGE
<cluster-name>-es-<nodeset>   m/n     44h
  2. Adjust the number of replicas:
> kubectl scale --replicas=m sts/<cluster-name>-es-<nodeset>

@pebrc
Collaborator

pebrc commented Nov 30, 2020

So are we going to add this workaround to our troubleshooting docs for <7.8 and close this issue?
