Failure to roll back etcd datadir in case of errors during kubeadm in-place upgrade from 1.10.5 to 1.11.0 #65580
Labels
area/kubeadm
kind/bug
sig/cluster-lifecycle
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
During the upgrade from v1.10.5 to v1.11.0, the etcd data rollback procedure of kubeadm does not delete /var/lib/etcd/member before copying the backup data into it. These are the last lines from the etcd container after the rollback procedure:

Manually removing /var/lib/etcd/member, then copying the backup data from kubeadm-backup-etcd-2018-06-28-12-29-22/etcd into /var/lib/etcd and deleting the etcd container, makes the next etcd container (with the old version) stay alive.

I believe the etcd upgrade failed in the first place because of a mistake I made: installing kubelet before executing kubeadm upgrade apply v1.11.0. Mostly this was because, during the installation of kubeadm, the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf was replaced and kubelet started to fail, leaving the node NotReady (maybe this is another bug). I had to copy /etc/systemd/system/kubelet.service.d/10-kubeadm.conf from a node and restart kubelet.
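The manual workaround described above (remove /var/lib/etcd/member, copy the backup in, delete the etcd container) can be sketched as a small shell function. This is a hypothetical helper for illustration, not kubeadm's own rollback code. It assumes the backup directory (the kubeadm-backup-etcd-<timestamp>/etcd directory mentioned above, whose parent path is not shown in this report) contains a member/ subdirectory, which is what copying it into /var/lib/etcd implies.

```shell
# Hypothetical helper (name and arguments are illustrative assumptions):
# restore an etcd data dir from a kubeadm backup, deleting the existing
# member/ directory first instead of copying the backup over it.
rollback_etcd_datadir() {
  local backup_dir="$1"   # assumed: the .../kubeadm-backup-etcd-<timestamp>/etcd directory
  local data_dir="$2"     # e.g. /var/lib/etcd

  # Refuse to touch the live data dir if the backup looks incomplete.
  if [ ! -d "$backup_dir/member" ]; then
    echo "no member/ directory in $backup_dir" >&2
    return 1
  fi

  # Delete what the failed upgrade left behind, so no files from it
  # survive next to the restored ones.
  rm -rf -- "$data_dir/member" || return 1
  mkdir -p -- "$data_dir" || return 1

  # Copy the backup in, preserving ownership, permissions and timestamps.
  cp -a -- "$backup_dir/member" "$data_dir/member" || return 1
}
```

For example, rollback_etcd_datadir <path-to>/kubeadm-backup-etcd-2018-06-28-12-29-22/etcd /var/lib/etcd, followed by deleting the etcd container so the kubelet starts a new one, as in the workaround above. Removing member/ before the copy is the point of the sketch: copying into an existing member/ directory may leave files written by the failed upgrade mixed with the restored data.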
This is the output of kubeadm:

What you expected to happen:
Successful rollback of the etcd data dir in case of failures during the upgrade.

How to reproduce it (as minimally and precisely as possible):
1. dpkg -i kubectl_1.11.0-00_amd64.deb kubeadm_1.11.0-00_amd64.deb
2. During the installation of kubeadm, the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf will be replaced and kubelet will start failing (maybe this is another bug).
3. Copy /etc/systemd/system/kubelet.service.d/10-kubeadm.conf from a node and restart kubelet.
4. dpkg -i kubelet_1.11.0-00_amd64.deb # _now_ I know this is a mistake, but it is important to trigger the "rollback etcd datadir" :)
5. kubeadm upgrade plan
6. kubeadm upgrade apply v1.11.0
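The step above that restores 10-kubeadm.conf by copying it from another node could be replaced by saving the local file before installing the kubeadm package. A minimal sketch, assuming the pre-upgrade drop-in is the one you want back; the function names and the backup location are hypothetical.

```shell
# Hypothetical helpers: keep a copy of the kubelet drop-in before the
# kubeadm package is installed, and put it back if the package replaced it.
save_dropin() {
  local dropin="$1" saved="$2"
  cp -p -- "$dropin" "$saved"
}

restore_dropin_if_changed() {
  local dropin="$1" saved="$2"
  # cmp -s exits 0 when the files are identical, non-zero otherwise
  # (including when the drop-in is missing).
  if cmp -s -- "$saved" "$dropin"; then
    return 0
  fi
  cp -p -- "$saved" "$dropin" || return 1
  echo "restored $dropin; run systemctl daemon-reload and restart kubelet" >&2
}
```

For example, call save_dropin /etc/systemd/system/kubelet.service.d/10-kubeadm.conf /root/10-kubeadm.conf.bak before dpkg -i kubeadm_1.11.0-00_amd64.deb, and restore_dropin_if_changed with the same arguments afterwards. The helper only prints the reminder instead of calling systemctl itself; systemd does not pick up a changed drop-in until systemctl daemon-reload is run.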
Anything else we need to know?:
This cluster already received the following upgrade paths:

I moved kube-proxy from iptables to IPVS when running 1.10.2.
I downgraded kubelet to v1.10.5, upgraded again with kubeadm upgrade apply v1.11.0, and it works; my cluster is healthy.

Environment:
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration: bare metal
Kernel (e.g. uname -a):