
[Bug] ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods #8429

Open
BenB196 opened this issue Jan 22, 2025 · 2 comments
Labels
>bug Something isn't working

Comments


BenB196 commented Jan 22, 2025

(ECK version 2.15.0)

Background:

I was recently upgrading a rather large Elasticsearch cluster from 8.16.2 to 8.17.1, but ran into an issue where one of the dedicated master pods was recreated partway through the upgrade process.

Issue:

The problem appears to be that when ECK receives an Elasticsearch version upgrade, it immediately updates the version on all StatefulSets and then performs the rolling restart. If a pod is killed or recreated partway through the process, there is no longer an "order of operations", and nodes can be upgraded in the wrong order.

Reproduction:

  1. Create an Elasticsearch cluster with dedicated masters
  2. Upgrade the Elasticsearch cluster
  3. Recreate one of the master pods while the upgrade is still working on non-master nodes
  4. Once the new master node gets created, create an index
  5. The index will get assigned the new Elasticsearch index version, and won't be allocatable on the lower version non-master nodes
  6. Observe that the upgrade managed via ECK deadlocks on a yellow state because of allocation issues from step 5.

Expectation:

ECK should only upgrade a StatefulSet's version when it's ready to perform the rolling restart of that StatefulSet, not far in advance in the upgrade process.

Workaround:

To work around the deadlock, I had to manually (and carefully) delete and recreate each of the remaining non-master pods so they would pick up the new version.

@BenB196 BenB196 changed the title ECK Should only Update Elasticsearch Stateful set version when attempting to upgrade the stateful node. ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the stateful node Jan 22, 2025
@botelastic botelastic bot added the triage label Jan 22, 2025
@BenB196 BenB196 changed the title ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the stateful node ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods Jan 22, 2025
@BenB196 BenB196 changed the title ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods [Bug] ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods Jan 22, 2025
@pebrc pebrc added the >bug Something isn't working label Jan 23, 2025
@botelastic botelastic bot removed the triage label Jan 23, 2025
pebrc (Collaborator) commented Jan 23, 2025

This behaviour has been in place for ~6 years 😄. But that does not mean it is not worth revisiting.

What I don't fully understand in your scenario is why upgrading one of the master nodes would have an effect on the index version on the data nodes. Or was the index allocated on the already-upgraded master? Was the new master node elected master?

Regarding workarounds in such a situation, I wonder if you could have selectively disabled the predicate that acts as a safeguard and stops the upgrade on yellow-health clusters. That way you would have had at least a semi-automatic upgrade (with some additional risk of unavailability).

For a fix we need to take a closer look at the upgrade logic. The thing I am not sure about is why we chose to upgrade all stateful sets at once to begin with. I can't think of a reason other than simplicity. Also the current code structure separates the spec update from the actual deletion of the pods, with the predicate system that makes sure everything happens in order being part of the latter. What we could do is delay the stateful set spec updates (maybe with a special case for version upgrades) per tier (master, data etc) and do the masters last in case of version upgrades.
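To illustrate the last idea, a minimal sketch of ordering StatefulSet spec updates so that master sets go last. This is hypothetical Go, not ECK's actual reconciliation code; the `nodeSet` type and `orderForUpgrade` function are invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// nodeSet is a hypothetical simplification of an ECK nodeSet / StatefulSet:
// just a name and whether it carries the master role.
type nodeSet struct {
	Name     string
	IsMaster bool
}

// orderForUpgrade returns the sets in the order their specs would be
// updated during a version upgrade: all non-master sets first, master
// sets last. The sort is stable, so relative order within each group
// (e.g. data tiers) is preserved and could encode further tier ordering.
func orderForUpgrade(sets []nodeSet) []nodeSet {
	ordered := make([]nodeSet, len(sets))
	copy(ordered, sets)
	sort.SliceStable(ordered, func(i, j int) bool {
		return !ordered[i].IsMaster && ordered[j].IsMaster
	})
	return ordered
}

func main() {
	sets := []nodeSet{
		{Name: "masters", IsMaster: true},
		{Name: "hot", IsMaster: false},
		{Name: "warm", IsMaster: false},
	}
	for _, s := range orderForUpgrade(sets) {
		fmt.Println(s.Name) // prints: hot, warm, masters
	}
}
```

The stable sort matters here: it keeps the user-declared tier order intact among the non-master sets, so only the master/non-master split is imposed.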

BenB196 (Author) commented Jan 23, 2025

> This behaviour has been in place for ~6 years

Yep, and in the ~4 years I've been using ECK, this is also the first time I've really experienced this issue, so definitely rare 😄

> What I don't fully understand in your scenario is why upgrading one of the master nodes would have an effect on the index version on the data nodes. Or was the index allocated on the already-upgraded master? Was the new master node elected master?

Unfortunately, I didn't capture which node was elected master at the time; all I looked at was the allocation issue, which indicated that the index version was 8.17.1 and that there were no available nodes on 8.17.1 left to allocate the replica shard to. (I'm not sure if the index version is decided by the master, or by the node that initially creates the index's primary shard.)

> Regarding workarounds in such a situation, I wonder if you could have selectively disabled the predicate that acts as a safeguard and stops the upgrade on yellow-health clusters. That way you would have had at least a semi-automatic upgrade (with some additional risk of unavailability).

This most likely would have worked (I didn't realize these were a thing). ECK was hung up on `if_yellow_only_restart_upgrading_nodes_with_unassigned_replicas`, so disabling that predicate would most likely have allowed the upgrade to proceed.
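For anyone hitting the same deadlock: ECK supports an annotation on the Elasticsearch resource to selectively disable upgrade predicates. A hedged sketch (the cluster name and spec below are placeholders; verify the annotation name against the ECK troubleshooting docs for your version before relying on it):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster  # placeholder cluster name
  annotations:
    # Disables only the named safeguard; remove the annotation after the
    # upgrade completes to restore the default protection.
    eck.k8s.elastic.co/disable-upgrade-predicates: "if_yellow_only_restart_upgrading_nodes_with_unassigned_replicas"
spec:
  version: 8.17.1
  # ... rest of the cluster spec unchanged ...
```

Setting the value to `"*"` disables all predicates, which carries a correspondingly larger availability risk.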

> What we could do is delay the stateful set spec updates (maybe with a special case for version upgrades) per tier (master, data etc) and do the masters last in case of version upgrades.

I definitely think that, at a minimum, masters should have their spec updated last, as one of them getting upgraded early can prevent nodes from joining the cluster midway through an upgrade. But based on the guidance in https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elasticsearch.html, data tiers should probably also be done in order, as upgrading them out of order seems like it might impact ILM (and thus allocation) functionality.
