
ETCD cluster unhealthy during kubeadm upgrade to v1.22.5 #2682

Closed
rsparulek opened this issue Apr 21, 2022 · 12 comments
Labels
  • area/etcd
  • kind/support: Categorizes issue or PR as a support question.
  • priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments


rsparulek commented Apr 21, 2022

What keywords did you search in kubeadm issues before filing this one?

This issue is similar to kubernetes/kubernetes#65580 raised some time back

Is this a BUG REPORT or FEATURE REQUEST? BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.22.5

Environment:

  • Kubernetes version (use kubectl version): 1.22.5
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 4.18.0-305.25.1.el8_4.x86_64
  • Container runtime (CRI) (e.g. containerd, cri-o): dockershim
  • Container networking plugin (CNI) (e.g. Calico, Cilium): Calico
  • Others:

What happened?

While performing a Kubernetes upgrade from version 1.21 to 1.22 using kubeadm, we hit a failure: the etcd pod failed to come up on one of the masters with the following trace:

2022-04-19 17:43:30.743893 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2022-04-19 17:43:30.744031 I | embed: listening for peers on 40.10.20.11:2380
2022-04-19 17:43:30.744186 I | embed: listening for metrics on http://127.0.0.1:2381/
2022-04-19 17:43:30.744707 I | rafthttp: peer 7d529180a772e4cb became active
2022-04-19 17:43:30.744739 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 reader)
2022-04-19 17:43:30.745450 I | rafthttp: peer d8f7b2579c1c73ee became active
2022-04-19 17:43:30.745475 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 reader)
2022-04-19 17:43:30.746760 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message writer)
2022-04-19 17:43:30.747210 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message reader)
2022-04-19 17:43:30.750965 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 writer)
2022-04-19 17:43:30.751100 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message writer)
raft2022/04/19 17:43:30 INFO: raft.node: 1ee2f89393d526b7 elected leader d8f7b2579c1c73ee at term 7
2022-04-19 17:43:30.752443 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 writer)
2022-04-19 17:43:30.755120 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message reader)
2022-04-19 17:43:30.761778 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
2022-04-19 17:43:30.761793 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.4.13 is lower than determined cluster version: 3.5).

What you expected to happen?

A successful Kubernetes upgrade to 1.22.5, or a healthy etcd cluster in case of failure.

How to reproduce it (as minimally and precisely as possible)?

This issue is not hit consistently; we saw it in 1 out of our 20 upgrade runs.

Anything else we need to know?

All etcd pods were up, running, and healthy before attempting the kubeadm upgrade; it is only during the kubeadm upgrade that the etcd pod on one of the masters failed to come up with the trace pasted above.

This is what the kubeadm upgrade failure trace looks like:

Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
[upgrade/etcd] Etcd was rolled back and is now available
error execution phase control-plane: couldn't complete the static pod upgrade: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

Before attempting the kubeadm upgrade, all etcd pods were in a healthy state:

kube-system    etcd-node-07                                1/1     Running   0          148m   40.10.20.7      node-07   <none>           <none>
kube-system    etcd-node-08                                1/1     Running   0          131m   40.10.20.8      node-08   <none>           <none>
kube-system    etcd-node-11                                1/1     Running   0          115m   40.10.20.11      node-11   <none>           <none>

After the kubeadm upgrade, the etcd pod on node-11 (etcd-node-11) failed to come up because the etcd cluster was already at version 3.5; the pod kept crashing with the docker logs pasted above in this bug.
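For reference, since the runtime here is dockershim, the crashing container's logs above were gathered straight from Docker on the affected master; a rough sketch of the commands (the name filter and <container-id> are just placeholders):

# list the etcd container (including exited/restarting instances) on the affected master
docker ps -a --filter "name=etcd" --format "{{.ID}}\t{{.Image}}\t{{.Status}}"

# dump its logs, using the ID printed by the previous command
docker logs <container-id>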

@rsparulek (Author)

@neolit123 and I have been discussing this issue on the kubeadm Slack channel.

@neolit123 (Member)

Slack discussion:
https://kubernetes.slack.com/archives/C2P1JHS2E/p1650391559244909

do you have proof that kubeadm upgrade node was run on the 3rd member?
why did its static pod manifest remain on 3.4.x?
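a quick way to check on the 3rd master (standard kubeadm paths; just a sketch):

# which etcd image does the static pod manifest currently reference?
grep 'image:' /etc/kubernetes/manifests/etcd.yaml

# kubeadm normally keeps pre-upgrade backups of the manifests under /etc/kubernetes/tmp/
ls /etc/kubernetes/tmp/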

@rsparulek (Author)

@neolit123 The trace below is from running kubeadm upgrade node on the 3rd master:

Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
[upgrade/etcd] Etcd was rolled back and is now available
error execution phase control-plane: couldn't complete the static pod upgrade: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

@neolit123 (Member)

what do you see in the etcd logs for that member if the etcd manifest is at 3.5?

@rsparulek (Author)

Also, who is responsible for updating the static manifest file? I thought kubeadm upgrade handles it?

@rsparulek (Author)

@neolit123 Will this step not set the manifest file back to 3.4.13?

[upgrade/etcd] Etcd was rolled back and is now available

This is from the kubeadm upgrade node failure log I pasted above.

@rsparulek (Author)

These were the docker logs I could gather from the crashing etcd container on the third master. Note that the pod was not even running on the third master after the failure; the container itself was being restarted continuously with the following trace:

2022-04-19 17:43:30.743893 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2022-04-19 17:43:30.744031 I | embed: listening for peers on 40.10.20.11:2380
2022-04-19 17:43:30.744186 I | embed: listening for metrics on http://127.0.0.1:2381/
2022-04-19 17:43:30.744707 I | rafthttp: peer 7d529180a772e4cb became active
2022-04-19 17:43:30.744739 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 reader)
2022-04-19 17:43:30.745450 I | rafthttp: peer d8f7b2579c1c73ee became active
2022-04-19 17:43:30.745475 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 reader)
2022-04-19 17:43:30.746760 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message writer)
2022-04-19 17:43:30.747210 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message reader)
2022-04-19 17:43:30.750965 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 writer)
2022-04-19 17:43:30.751100 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message writer)
raft2022/04/19 17:43:30 INFO: raft.node: 1ee2f89393d526b7 elected leader d8f7b2579c1c73ee at term 7
2022-04-19 17:43:30.752443 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 writer)
2022-04-19 17:43:30.755120 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message reader)
2022-04-19 17:43:30.761778 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
2022-04-19 17:43:30.761793 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.4.13 is lower than determined cluster version: 3.5).


neolit123 commented Apr 21, 2022

Will this step not set the manifest file back to 3.4.13?

yes, it rolls back etcd to the old manifest if the 3.5 pod fails to start.

Also, who is responsible for updating the static manifest file ? I thought kubeadm upgrade handles it ?

kubeadm upgrades etcd for you.

kubeadm upgrade works as follows:

  • you run upgrade apply on one CP node
  • you run upgrade node on the rest of the CP nodes (rough command sketch below)
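a rough sketch of that sequence on a 3-node control plane (illustrative only; run the remaining nodes one at a time):

# on the first control-plane node
kubeadm upgrade plan
kubeadm upgrade apply v1.22.5

# on each of the remaining control-plane nodes, one at a time
kubeadm upgrade node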

let's say you have 3 etcd members at 3.4

upgrade on first node will result in:
3.5, 3.4, 3.4

after upgrade on second node you will have:
3.5, 3.5, 3.4

at that point only one member is at 3.4, but the minimum version of etcd is still at 3.4 because that's how etcd works.

once you call upgrade on the third node you will get:
3.5, 3.5, 3.5

that would upgrade the etcd minimum version to 3.5 (NOTE: etcd does that internally, not kubeadm)
but if for some reason the etcd 3.5 pod on the last node fails after the minimum version is set to 3.5, kubeadm will still try to roll it back to 3.4.

this is what I suspect happened, but the question is why the 3.5 etcd pod failed in the first place after the minimum version was set.

2022-04-19 17:43:30.761778 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
2022-04-19 17:43:30.761793 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.4.13 is lower than determined cluster version: 3.5).

this sequence of messages does not make sense.
upgrading the min version to 3.5 means that all members are >= 3.5.
then the downgrade message indicates that the member is still at 3.4.

that sounds like an etcd bug.
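if you hit this again, capturing the per-member versions straight from etcd before retrying the upgrade would help. roughly something like this, assuming the default kubeadm cert layout and the pod name / IPs from your listing above (just a sketch):

# query all three members from one of the healthy etcd pods; the VERSION column
# shows what each member is actually running
kubectl -n kube-system exec etcd-node-07 -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  --endpoints https://40.10.20.7:2379,https://40.10.20.8:2379,https://40.10.20.11:2379 \
  endpoint status -w table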

neolit123 added the area/etcd, priority/awaiting-more-evidence, and triage/needs-information labels Apr 21, 2022
neolit123 added this to the v1.25 milestone Apr 21, 2022

neolit123 commented Apr 21, 2022

that would upgrade the etcd minimum version to 3.5 (NOTE: etcd does that internally, not kubeadm)
but if for some reason the etcd 3.5 pod on the last node fails after the minimum version is set to 3.5, kubeadm will still try to roll it back to 3.4.

if this happens the kubeadm rollback is incorrect, but we actually can't do much about it because the etcd minimum version is internal. as mentioned on Slack, you can manually update the 3rd node manifest to 3.5 and see if it starts, but if it still contains the errors related to downgrade and 3.4 that's very strange and sounds like an etcd database bug.


neolit123 commented Apr 21, 2022

what you can try is to follow the standard etcd recovery procedure and delete the 3rd member's data (a rough shell sketch follows the steps below):
https://etcd.io/docs/v3.3/op-guide/recovery/
https://www.techtarget.com/searchdatabackup/tutorial/How-to-restore-a-Kubernetes-cluster-from-an-etcd-snapshot

  • on the third host, temporarily move the /etc/kubernetes/manifests/etcd.yaml file to another folder.
  • set the container image to 3.5 in the file
  • delete the contents of /var/lib/etcd from the 3rd host
  • copy the contents of /var/lib/etcd from 2nd or 1st host to the 3rd host
  • restore /etc/kubernetes/manifests/etcd.yaml

see if the pod starts.
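roughly, in shell form (a sketch of the steps above, not a validated procedure; the image tag and <healthy-master> are placeholders, and take an etcd snapshot/backup first):

# 1) take the member down by moving its static pod manifest out of the manifests dir
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/etcd.yaml.bak

# 2) point the manifest at the 3.5 image (example tag; match what the other members run)
sed -i 's|image: .*etcd:.*|image: k8s.gcr.io/etcd:3.5.0-0|' /etc/kubernetes/etcd.yaml.bak

# 3) wipe this member's data dir
rm -rf /var/lib/etcd/*

# 4) copy the data dir from a healthy member
scp -r root@<healthy-master>:/var/lib/etcd/* /var/lib/etcd/

# 5) put the manifest back; the kubelet will recreate the etcd static pod
mv /etc/kubernetes/etcd.yaml.bak /etc/kubernetes/manifests/etcd.yaml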

not much else can be done here.

neolit123 added the kind/support label Apr 21, 2022
@rsparulek (Author)

Thanks @neolit123. We will make a note of these steps and try them out if we hit this issue again. As I said, this is a very rare issue.


neolit123 commented Apr 21, 2022

yeah, just comment here again if you see it.
the issue already archives the specifics of the problem.

it's much more rare than 1/20. in our CI we create multiple HA clusters every 2 hours.
kubeadm is also used in higher-level tools like CAPI (edit: although CAPI doesn't use kubeadm upgrade) and kubespray, and directly in a number of other projects / products.
