[Bug] ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods #8429
Comments
This behaviour has been in place for ~6 years 😄. But that does not mean it is not worth revisiting. What I don't fully understand in your scenario is why upgrading one of the master nodes would have an effect on the index version on the data nodes. Or was the index allocated on the already upgraded master? Was the newly upgraded node elected master?

Regarding workarounds in such a situation, I wonder if you could have selectively disabled the predicate that acts as a safeguard and stops the upgrade on yellow-health clusters. That would have given you at least a semi-automatic upgrade (with some additional risk of unavailability).

For a fix we need to take a closer look at the upgrade logic. The thing I am not sure about is why we chose to upgrade all StatefulSets at once to begin with; I can't think of a reason other than simplicity. Also, the current code structure separates the spec update from the actual deletion of the Pods, with the predicate system that makes sure everything happens in order being part of the latter. What we could do is delay the StatefulSet spec updates (maybe with a special case for version upgrades) per tier (master, data, etc.) and do the masters last in case of version upgrades.
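A rough sketch of that predicate workaround, assuming the cluster is named `my-cluster` (illustrative) and that the `eck.k8s.elastic.co/disable-upgrade-predicates` annotation is available in the operator version in use:

```sh
# Sketch only: disable upgrade predicates on the Elasticsearch resource so the
# operator keeps rolling Pods even while cluster health is yellow. "*" disables
# every predicate; a specific predicate name could be used instead if you know
# which one is blocking the upgrade.
kubectl annotate elasticsearch my-cluster \
  eck.k8s.elastic.co/disable-upgrade-predicates="*"

# Remove the annotation again once the upgrade has finished.
kubectl annotate elasticsearch my-cluster \
  eck.k8s.elastic.co/disable-upgrade-predicates-
```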
Yep, and in the ~4 years I've been using ECK, this is also the first time I've really experienced this issue, so definitely rare 😄
Unfortunately, I didn't capture which node was elected master at the time; all I looked at was the allocation issue, which indicated that the index version was
This most likely would have worked (I didn't realize these were a thing now). ECK was hung up on
I definitely think that, at a minimum, masters should have their spec updated last, as one of those getting upgraded early has the potential to prevent nodes from joining the cluster midway through an upgrade. But based on the guidance at https://www.elastic.co/guide/en/elastic-stack/current/upgrading-elasticsearch.html, the data tiers should probably also be done in order, as upgrading them out of order seems like it might impact ILM (and thus allocation) functionality.
(ECK version 2.15.0)
Background:
I was recently upgrading a rather large Elasticsearch cluster from 8.16.2 to 8.17.1, but ran into an issue where one of the dedicated master Pods was recreated partway through the upgrade process.
Issue
The problem appears to be that when ECK receives an Elasticsearch version upgrade, it immediately updates the version on all StatefulSets and only then performs the rolling restart. If a Pod gets killed/recreated partway through the process, there is no longer an "order of operations" enforced, and nodes can be upgraded in the wrong order.
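This is easy to see while an upgrade is in flight: the StatefulSet specs already reference the new image while most Pods are still running the old one. Illustrative commands, assuming a cluster named `my-cluster` and the standard ECK cluster-name label:

```sh
# Image each StatefulSet *wants* (updated by ECK as soon as the upgrade starts)
kubectl get statefulsets -l elasticsearch.k8s.elastic.co/cluster-name=my-cluster \
  -o custom-columns='NAME:.metadata.name,SPEC_IMAGE:.spec.template.spec.containers[*].image'

# Image each Pod is *actually* running (still the old one until its restart)
kubectl get pods -l elasticsearch.k8s.elastic.co/cluster-name=my-cluster \
  -o custom-columns='NAME:.metadata.name,POD_IMAGE:.spec.containers[*].image'
```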
Reproduction:
Expectation:
ECK should only update a StatefulSet's version when it is ready to perform the rolling restart of that StatefulSet, not so far ahead in the upgrade process.
Workaround:
To work around the deadlock, I had to manually (and carefully) delete/recreate each of the remaining non-master Pods so they would pick up the new version.
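Roughly, it came down to something like the following per Pod (Pod names are illustrative; cluster health should also be verified between deletions):

```sh
# Rough sketch of the manual workaround: delete the remaining non-master Pods one
# at a time so the StatefulSet controller recreates them with the new version,
# waiting for recovery before moving on.
for pod in my-cluster-es-data-2 my-cluster-es-data-1 my-cluster-es-data-0; do
  kubectl delete pod "$pod"
  sleep 30   # give the StatefulSet controller time to recreate the Pod
  kubectl wait --for=condition=Ready pod "$pod" --timeout=15m
  # Additionally confirm cluster health has recovered (e.g. GET _cluster/health)
  # before deleting the next Pod.
done
```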