
[Bug] ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods #8429

Open
BenB196 opened this issue Jan 22, 2025 · 2 comments
Labels
>bug Something isn't working

Comments


BenB196 commented Jan 22, 2025

(ECK version 2.15.0)

Background:

I was recently upgrading a rather large Elasticsearch cluster from 8.16.2 to 8.17.1, but ran into an issue where one of the dedicated master pods was recreated partway through the upgrade process.

Issue:

The problem appears to be that when ECK receives an Elasticsearch version upgrade, it immediately updates the version on all StatefulSets and then performs the rolling restart. If a pod is killed or recreated partway through the process, there is no longer an "order of operations", and nodes can be upgraded in the wrong order.

Reproduction:

  1. Create an Elasticsearch cluster with dedicated masters
  2. Upgrade the Elasticsearch cluster
  3. Recreate one of the master pods while the upgrade is still working on non-master nodes
  4. Once the new master node gets created, create an index
  5. The index will get assigned the new Elasticsearch index version, and won't be allocatable on the lower version non-master nodes
  6. Observe that the upgrade managed via ECK deadlocks on a yellow state because of allocation issues from step 5.

Expectation:

ECK should only upgrade a StatefulSet's version when it's ready to perform the rolling restart of that StatefulSet, not far in advance in the upgrade process.

Workaround:

To work around the deadlock, I had to manually (and carefully) delete and recreate each of the remaining non-master pods so they would pick up the new version.

@BenB196 BenB196 changed the title ECK Should only Update Elasticsearch Stateful set version when attempting to upgrade the stateful node. ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the stateful node Jan 22, 2025
@botelastic botelastic bot added the triage label Jan 22, 2025
@BenB196 BenB196 changed the title ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the stateful node ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods Jan 22, 2025
@BenB196 BenB196 changed the title ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods [Bug] ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods Jan 22, 2025
@pebrc pebrc added the >bug Something isn't working label Jan 23, 2025
@botelastic botelastic bot removed the triage label Jan 23, 2025
pebrc (Collaborator) commented Jan 23, 2025

This behaviour has been in place for ~6 years 😄. But that does not mean it is not worth revisiting.

What I don't fully understand in your scenario is why upgrading one of the master nodes would have an effect on the index version on the data nodes. Or was the index allocated on the already-upgraded master? Was the new master node elected master?

Regarding workarounds in such a situation, I wonder if you could have selectively disabled the predicate that acts as a safeguard and stops the upgrade on yellow-health clusters. That way you would have had at least a semi-automatic upgrade (with some additional risk of unavailability).

For a fix we need to take a closer look at the upgrade logic. The thing I am not sure about is why we chose to upgrade all stateful sets at once to begin with. I can't think of a reason other than simplicity. Also the current code structure separates the spec update from the actual deletion of the pods, with the predicate system that makes sure everything happens in order being part of the latter. What we could do is delay the stateful set spec updates (maybe with a special case for version upgrades) per tier (master, data etc) and do the masters last in case of version upgrades.
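To illustrate the last idea, a minimal sketch of ordering StatefulSet spec updates so that master sets go last. This is hypothetical Go, not ECK's actual reconciliation code; the `nodeSet` type and `orderForUpgrade` function are invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// nodeSet is a hypothetical simplification of an ECK nodeSet / StatefulSet:
// just a name and whether it carries the master role.
type nodeSet struct {
	Name     string
	IsMaster bool
}

// orderForUpgrade returns the sets in the order their specs would be
// updated during a version upgrade: all non-master sets first, master
// sets last. The sort is stable, so relative order within each group
// (e.g. data tiers) is preserved and could encode further tier ordering.
func orderForUpgrade(sets []nodeSet) []nodeSet {
	ordered := make([]nodeSet, len(sets))
	copy(ordered, sets)
	sort.SliceStable(ordered, func(i, j int) bool {
		return !ordered[i].IsMaster && ordered[j].IsMaster
	})
	return ordered
}

func main() {
	sets := []nodeSet{
		{Name: "masters", IsMaster: true},
		{Name: "hot", IsMaster: false},
		{Name: "warm", IsMaster: false},
	}
	for _, s := range orderForUpgrade(sets) {
		fmt.Println(s.Name) // prints: hot, warm, masters
	}
}
```

The stable sort matters here: it keeps the user-declared tier order intact among the non-master sets, so only the master/non-master split is imposed.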

BenB196 (Author) commented Jan 23, 2025

> This behaviour has been in place for ~6 years

Yep, and in the ~4 years I've been using ECK, this is also the first time I've really experienced this issue, so definitely rare 😄

> What I don't fully understand in your scenario is why upgrading one of the master nodes would have an effect on the index version on the data nodes. Or was the index allocated on the already-upgraded master? Was the new master node elected master?

Unfortunately, I didn't capture which node was elected master at the time; all I looked at was the allocation issue, which indicated that the index version was 8.17.1 and that there were no available nodes on 8.17.1 left to allocate the replica shard to. (I'm not sure if the index version is decided by the master, or by the node that initially creates the index's primary shard.)

> Regarding workarounds in such a situation, I wonder if you could have selectively disabled the predicate that acts as a safeguard and stops the upgrade on yellow-health clusters. That way you would have had at least a semi-automatic upgrade (with some additional risk of unavailability).

This most likely would have worked (I didn't realize these were a thing). ECK was hung up on `if_yellow_only_restart_upgrading_nodes_with_unassigned_replicas`, so disabling that predicate would most likely have allowed the upgrade to proceed.
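For anyone hitting the same deadlock: ECK supports an annotation on the Elasticsearch resource to selectively disable upgrade predicates. A hedged sketch (the cluster name and spec below are placeholders; verify the annotation name against the ECK troubleshooting docs for your version before relying on it):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster  # placeholder cluster name
  annotations:
    # Disables only the named safeguard; remove the annotation after the
    # upgrade completes to restore the default protection.
    eck.k8s.elastic.co/disable-upgrade-predicates: "if_yellow_only_restart_upgrading_nodes_with_unassigned_replicas"
spec:
  version: 8.17.1
  # ... rest of the cluster spec unchanged ...
```

Setting the value to `"*"` disables all predicates, which carries a correspondingly larger availability risk.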

> What we could do is delay the stateful set spec updates (maybe with a special case for version upgrades) per tier (master, data etc) and do the masters last in case of version upgrades.

I definitely think that, at a minimum, masters should have their spec updated last, as one of them getting upgraded early can prevent nodes from joining the cluster midway through an upgrade. But based on the guidance in https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elasticsearch.html, data tiers should probably also be done in order, as upgrading them out of order seems like it might impact ILM (and thus allocation) functionality.
