Before creating an issue, make sure you've checked the following:
You are running the latest released version of k0s
Make sure you've searched for existing issues, both open and closed
Make sure you've searched for PRs too, a fix might've been merged already
You're looking at the docs for the released version; "main" branch docs are usually ahead of released versions.
Platform
ALL
Version
1.30.0
Sysinfo
`k0s sysinfo`
What happened?
When k0s runs with controller+worker nodes, autopilot fails to update. Say you have three controller+worker nodes. When autopilot updates the first one, the node is very likely to lose its leadership, if it was the leader in the first place.
Now, when autopilot updates the k0s binary and restarts it (via systemd/openrc), the controller part does start properly. The worker part will NOT start properly, as it does not find the worker-config-default-1.30 configmap. This is because the controller does not apply the new version of the configmap, as it is not the leader. Hence we reach a "deadlock": autopilot cannot proceed because the first controller does not update successfully.
This doesn't happen 100% of the time: if one is lucky enough that autopilot first updates the controller that is the leader, and everything is fast enough that the node does not lose its leadership, the update will succeed.
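For illustration, the per-version worker profile ConfigMap that the worker part looks up at startup has roughly this shape (a sketch only; the data keys are abbreviated here, and the name encodes the worker profile plus the Kubernetes minor version):

```yaml
# Sketch of the worker profile ConfigMap the worker part waits for.
# It is created/updated only by the leading controller, which is why
# a non-leader controller restarting into a new version gets stuck.
apiVersion: v1
kind: ConfigMap
metadata:
  name: worker-config-default-1.30   # <profile>-<kube minor version>
  namespace: kube-system
data:
  # kubelet configuration and related settings (abbreviated)
  kubeletConfiguration: |
    ...
```

Since only the leader reconciles this ConfigMap, the updated non-leader controller's worker part has nothing matching the new version to fetch.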
Steps to reproduce
Install 3 nodes with controller+worker with version 1.29.4 (the version does not really matter AFAIK) and trigger an autopilot update.
Expected behavior
Autopilot successfully updates controller+worker nodes.
Actual behavior
Autopilot gets into a "deadlock" where the worker part on an updated controller does not start properly unless the updated node happens to also be the leader.
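For reference, an autopilot update of this kind is triggered with a Plan along these lines (a sketch; the plan id, download URL, sha256, and node names below are placeholders, not taken from the original report):

```yaml
apiVersion: autopilot.k0sproject.io/v1beta2
kind: Plan
metadata:
  name: autopilot
spec:
  id: id1234            # placeholder plan id
  timestamp: now
  commands:
    - k0supdate:
        version: v1.30.0
        platforms:
          linux-amd64:
            url: https://github.com/k0sproject/k0s/releases/download/v1.30.0+k0s.0/k0s-v1.30.0+k0s.0-amd64
            sha256: "<binary sha256>"   # placeholder
        targets:
          controllers:
            discovery:
              static:
                nodes:
                  - ctr0   # placeholder node names
                  - ctr1
                  - ctr2
```

Autopilot walks the controller targets one at a time and waits for each node to become healthy before moving on, which is why the stuck worker part on the first updated controller blocks the whole plan.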
Screenshots and logs
No response
Additional context
No response
I've noticed this error, which probably means that kubelet can't connect to kube-apiserver. It looks like kubelet won't be operational even if the worker-config configmap is present:
May 16 14:16:48 k0s-cluster-ctr2 k0s[1761293]: time="2024-05-16 14:16:48" level=info msg="E0516 14:16:48.793590 1761329 authentication.go:73] \"Unable to authenticate the request\" err=\"[invalid bearer token, service account token has been invalidated]\"" component=kube-apiserver stream=stderr
NB: This is not Autopilot doing something wrong; it is a general conceptual problem with updating HA clusters that use controller+worker nodes, which Autopilot runs into: the worker parts of newly added controller+worker nodes won't get ready, so Autopilot fails to proceed with the update.