
ETCD cluster unhealthy during kubeadm upgrade to v1.22.5 #2682

Closed
rsparulek opened this issue Apr 21, 2022 · 12 comments
Labels
  • area/etcd
  • kind/support: Categorizes issue or PR as a support question.
  • priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.

Comments


rsparulek commented Apr 21, 2022

What keywords did you search in kubeadm issues before filing this one?

This issue is similar to kubernetes/kubernetes#65580 raised some time back

Is this a BUG REPORT or FEATURE REQUEST? BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.22.5

Environment:

  • Kubernetes version (use kubectl version): 1.22.5
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a): 4.18.0-305.25.1.el8_4.x86_64
  • Container runtime (CRI) (e.g. containerd, cri-o): dockershim
  • Container networking plugin (CNI) (e.g. Calico, Cilium): Calico
  • Others:

What happened?

While performing a Kubernetes upgrade from version 1.21 to 1.22 using kubeadm, we hit a failure: the etcd pod failed to come up on one of the masters with the following trace:

2022-04-19 17:43:30.743893 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2022-04-19 17:43:30.744031 I | embed: listening for peers on 40.10.20.11:2380
2022-04-19 17:43:30.744186 I | embed: listening for metrics on http://127.0.0.1:2381/
2022-04-19 17:43:30.744707 I | rafthttp: peer 7d529180a772e4cb became active
2022-04-19 17:43:30.744739 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 reader)
2022-04-19 17:43:30.745450 I | rafthttp: peer d8f7b2579c1c73ee became active
2022-04-19 17:43:30.745475 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 reader)
2022-04-19 17:43:30.746760 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message writer)
2022-04-19 17:43:30.747210 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message reader)
2022-04-19 17:43:30.750965 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 writer)
2022-04-19 17:43:30.751100 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message writer)
raft2022/04/19 17:43:30 INFO: raft.node: 1ee2f89393d526b7 elected leader d8f7b2579c1c73ee at term 7
2022-04-19 17:43:30.752443 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 writer)
2022-04-19 17:43:30.755120 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message reader)
2022-04-19 17:43:30.761778 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
2022-04-19 17:43:30.761793 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.4.13 is lower than determined cluster version: 3.5).

What you expected to happen?

A successful Kubernetes upgrade to 1.22.5, or a healthy etcd cluster in case of failure.

How to reproduce it (as minimally and precisely as possible)?

This issue is not hit consistently; we saw it in 1 out of our 20 upgrade runs.

Anything else we need to know?

All etcd pods were up, running, and healthy before attempting the kubeadm upgrade; it is only during the kubeadm upgrade that the etcd pod on one of the masters failed to come up with the trace pasted above.

This is what the kubeadm upgrade failure trace looks like:

Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
[upgrade/etcd] Etcd was rolled back and is now available
error execution phase control-plane: couldn't complete the static pod upgrade: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

Before attempting the kubeadm upgrade, all etcd pods were in a healthy state:

kube-system    etcd-node-07                                1/1     Running   0          148m   40.10.20.7      node-07   <none>           <none>
kube-system    etcd-node-08                                1/1     Running   0          131m   40.10.20.8      node-08   <none>           <none>
kube-system    etcd-node-11                                1/1     Running   0          115m   40.10.20.11      node-11   <none>           <none>

After the kubeadm upgrade, the etcd pod on node-11 (etcd-node-11) failed to come up because the etcd cluster was already at version 3.5; the pod kept crashing with the docker logs pasted above in this bug.
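For reference, since the runtime here is dockershim, the crashing container's logs above were gathered straight from Docker on the affected master; a rough sketch of the commands (the name filter and <container-id> are just placeholders):

# list the etcd container (including exited/restarting instances) on the affected master
docker ps -a --filter "name=etcd" --format "{{.ID}}\t{{.Image}}\t{{.Status}}"

# dump its logs, using the ID printed by the previous command
docker logs <container-id>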

@rsparulek (Author)

@neolit123 and I have been discussing this issue on the kubeadm Slack channel.

@neolit123 (Member)

Slack discussion:
https://kubernetes.slack.com/archives/C2P1JHS2E/p1650391559244909

do you have proof that kubeadm upgrade node was run on the 3rd member?
why did its static pod manifest remain on 3.4.x?
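a quick way to check on the 3rd master (standard kubeadm paths; just a sketch):

# which etcd image does the static pod manifest currently reference?
grep 'image:' /etc/kubernetes/manifests/etcd.yaml

# kubeadm normally keeps pre-upgrade backups of the manifests under /etc/kubernetes/tmp/
ls /etc/kubernetes/tmp/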

@rsparulek (Author)

@neolit123 The trace below is from running kubeadm upgrade node on the 3rd master:

Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
Static pod: etcd-node-11 hash: 1efca8552c87c1c981538fc79acd9ee8
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
[upgrade/etcd] Etcd was rolled back and is now available
error execution phase control-plane: couldn't complete the static pod upgrade: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

@neolit123 (Member)

what do you see in the etcd logs for that member if the etcd manifest is at 3.5?

@rsparulek (Author)

Also, who is responsible for updating the static manifest file? I thought kubeadm upgrade handles it?

@rsparulek (Author)

@neolit123 Will this step not set the manifest file back to 3.4.13?

[upgrade/etcd] Etcd was rolled back and is now available

This is from the kubeadm upgrade node failure log I pasted above.

@rsparulek (Author)

These were the docker logs I could gather from the crashing etcd container on the third master. Note that the pod was not even running on the third master after the failure; the container itself was being restarted continuously with the following trace:

2022-04-19 17:43:30.743893 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = 
2022-04-19 17:43:30.744031 I | embed: listening for peers on 40.10.20.11:2380
2022-04-19 17:43:30.744186 I | embed: listening for metrics on http://127.0.0.1:2381/
2022-04-19 17:43:30.744707 I | rafthttp: peer 7d529180a772e4cb became active
2022-04-19 17:43:30.744739 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 reader)
2022-04-19 17:43:30.745450 I | rafthttp: peer d8f7b2579c1c73ee became active
2022-04-19 17:43:30.745475 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 reader)
2022-04-19 17:43:30.746760 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message writer)
2022-04-19 17:43:30.747210 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream Message reader)
2022-04-19 17:43:30.750965 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream MsgApp v2 writer)
2022-04-19 17:43:30.751100 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message writer)
raft2022/04/19 17:43:30 INFO: raft.node: 1ee2f89393d526b7 elected leader d8f7b2579c1c73ee at term 7
2022-04-19 17:43:30.752443 I | rafthttp: established a TCP streaming connection with peer d8f7b2579c1c73ee (stream MsgApp v2 writer)
2022-04-19 17:43:30.755120 I | rafthttp: established a TCP streaming connection with peer 7d529180a772e4cb (stream Message reader)
2022-04-19 17:43:30.761778 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
2022-04-19 17:43:30.761793 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.4.13 is lower than determined cluster version: 3.5).


neolit123 commented Apr 21, 2022

Will this step not set the manifest file back to 3.4.13?

yes, it rolls back etcd to the old manifest if the 3.5 pod fails to start.

Also, who is responsible for updating the static manifest file ? I thought kubeadm upgrade handles it ?

kubeadm upgrades etcd for you.

kubeadm upgrade works as follows:

  • you run upgrade apply on one CP node
  • you run upgrade node on the rest of the CP nodes (rough command sketch below)
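a rough sketch of that sequence on a 3-node control plane (illustrative only; run the remaining nodes one at a time):

# on the first control-plane node
kubeadm upgrade plan
kubeadm upgrade apply v1.22.5

# on each of the remaining control-plane nodes, one at a time
kubeadm upgrade node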

let's say you have 3 etcd members at 3.4

upgrade on first node will result in:
3.5, 3.4, 3.4

after upgrade on second node you will have:
3.5, 3.5, 3.4

at that point only one member is at 3.4, but the minimum version of etcd is still at 3.4 because that's how etcd works.

once you call upgrade on the third node you will get:
3.5, 3.5, 3.5

that would upgrade the etcd minimum version to 3.5 (NOTE: etcd does that internally, not kubeadm)
but if for some reason the etcd 3.5 pod on the last node fails after the minimum version is set to 3.5, kubeadm will still try to roll it back to 3.4.

this is what I suspect happened, but the question is why the 3.5 etcd pod failed in the first place after the minimum version was set.

2022-04-19 17:43:30.761778 N | etcdserver/membership: updated the cluster version from 3.4 to 3.5
2022-04-19 17:43:30.761793 C | etcdserver/membership: cluster cannot be downgraded (current version: 3.4.13 is lower than determined cluster version: 3.5).

this sequence of messages does not make sense.
upgrading the min version to 3.5 means that all members are >= 3.5.
then the downgrade message indicates that the member is still at 3.4.

that sounds like an etcd bug.
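if you hit this again, capturing the per-member versions straight from etcd before retrying the upgrade would help. roughly something like this, assuming the default kubeadm cert layout and the pod name / IPs from your listing above (just a sketch):

# query all three members from one of the healthy etcd pods; the VERSION column
# shows what each member is actually running
kubectl -n kube-system exec etcd-node-07 -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  --endpoints https://40.10.20.7:2379,https://40.10.20.8:2379,https://40.10.20.11:2379 \
  endpoint status -w table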

neolit123 added the area/etcd, priority/awaiting-more-evidence, and triage/needs-information labels Apr 21, 2022
neolit123 added this to the v1.25 milestone Apr 21, 2022

neolit123 commented Apr 21, 2022

that would upgrade the etcd minimum version to 3.5 (NOTE: etcd does that internally, not kubeadm)
but if for some reason the etcd 3.5 pod on the last node fails after the minimum version is set to 3.5, kubeadm will still try to roll it back to 3.4.

if this happens the kubeadm rollback is incorrect, but we actually can't do much about it because the etcd minimum version is internal. as mentioned on Slack, you can manually update the 3rd node manifest to 3.5 and see if it starts, but if it still contains the errors related to downgrade and 3.4 that's very strange and sounds like an etcd database bug.


neolit123 commented Apr 21, 2022

what you can try is to follow the standard etcd recovery procedure and delete the 3rd member's data (a rough shell sketch follows the steps below):
https://etcd.io/docs/v3.3/op-guide/recovery/
https://www.techtarget.com/searchdatabackup/tutorial/How-to-restore-a-Kubernetes-cluster-from-an-etcd-snapshot

  • on the third host, temporarily move the /etc/kubernetes/manifests/etcd.yaml file to another folder.
  • set the container image to 3.5 in the file
  • delete the contents of /var/lib/etcd from the 3rd host
  • copy the contents of /var/lib/etcd from 2nd or 1st host to the 3rd host
  • restore /etc/kubernetes/manifests/etcd.yaml

see if the pod starts.
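roughly, in shell form (a sketch of the steps above, not a validated procedure; the image tag and <healthy-master> are placeholders, and take an etcd snapshot/backup first):

# 1) take the member down by moving its static pod manifest out of the manifests dir
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/etcd.yaml.bak

# 2) point the manifest at the 3.5 image (example tag; match what the other members run)
sed -i 's|image: .*etcd:.*|image: k8s.gcr.io/etcd:3.5.0-0|' /etc/kubernetes/etcd.yaml.bak

# 3) wipe this member's data dir
rm -rf /var/lib/etcd/*

# 4) copy the data dir from a healthy member
scp -r root@<healthy-master>:/var/lib/etcd/* /var/lib/etcd/

# 5) put the manifest back; the kubelet will recreate the etcd static pod
mv /etc/kubernetes/etcd.yaml.bak /etc/kubernetes/manifests/etcd.yaml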

not much else can be done here.

neolit123 added the kind/support label Apr 21, 2022
@rsparulek (Author)

Thanks @neolit123. We will make a note of these steps and try them out if we hit this issue again. As I said, this is a very rare issue.


neolit123 commented Apr 21, 2022

yeah, just comment here again if you see it.
the issue already archives the specifics of the problem.

it's much more rare than 1/20. in our CI we create multiple HA clusters every 2 hours.
kubeadm is also used in higher-level tools like CAPI (edit: although CAPI doesn't use kubeadm upgrade) and kubespray, and directly in a number of other projects / products.
