This repository has been archived by the owner on Dec 8, 2023. It is now read-only.

K3OS single node cluster broken on-its-own #718

Open
kallex opened this issue Jun 11, 2021 · 8 comments

Comments

@kallex

kallex commented Jun 11, 2021

I'm currently running k3OS as a single-node cluster: kubehost and one master node running the default installation, on Hyper-V. This setup has been running without issues for quite a while.

Then today, after a normal Windows-mandated reboot, the cluster suddenly failed to come back up.

The errors I can dig out are:

My usually working port-forward now fails with:

error: error upgrading connection: error dialing backend: x509: certificate is valid for kubehost, localhost, not k3os-12063

kubectl get nodes =>

NAME         STATUS     ROLES    AGE    VERSION
k3os-12063   NotReady   master   493d   v1.19.5+k3s2
kubehost     Ready      master   40m    v1.19.5+k3s2
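
One way to confirm which names the kubelet serving certificate actually contains is to inspect it with openssl (a sketch; <node-ip> is a placeholder, and 10250 is the kubelet's default serving port):

  # print the Subject Alternative Names of the node's kubelet serving cert
  echo | openssl s_client -connect <node-ip>:10250 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'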

I cannot find any way to check what's failing in the server's "embedded" agent.
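
On k3OS the k3s server runs as an openrc service, so something like the following should show what the embedded agent is doing (service name and log path are assumptions based on a stock k3OS install):

  # check service state and tail the k3s log
  sudo rc-service k3s-service status
  sudo tail -n 100 /var/log/k3s-service.log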

I tried running a manual upgrade on the cluster to see if it would help. I could get the Kubernetes Dashboard to run afterwards (now it's erroring too), but the agent still stayed NotReady.

If someone could point me in the right direction, that would be great.

@kallex
Author

kallex commented Jun 15, 2021

I removed the NotReady master node, and the dashboard works now (the remaining kubehost master seems to behave better).

Now I'm trying to find proper documentation on how to (re-)attach the installed agent node, which runs on the same host as the master.
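
For what it's worth, the stock k3s way to (re-)join an agent is to point it at the server with the cluster token; on k3OS this can be pinned in /k3os/system/config.yaml. A sketch, assuming the default k3s token path and that kubehost:6443 is the server:

  # on the server: read the cluster join token
  sudo cat /var/lib/rancher/k3s/server/node-token

  # on the agent, in /k3os/system/config.yaml:
  k3os:
    server_url: https://kubehost:6443
    token: <token-from-above>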

@kz159

kz159 commented Jun 18, 2021

How did you remove the NotReady node?
Mine is stuck with:

Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Fri, 18 Jun 2021 22:45:02 +0000   Fri, 18 Jun 2021 22:45:02 +0000   FlannelIsUp         Flannel is running on this node
  MemoryPressure       Unknown   Fri, 18 Jun 2021 23:00:44 +0000   Fri, 18 Jun 2021 23:04:30 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Fri, 18 Jun 2021 23:00:44 +0000   Fri, 18 Jun 2021 23:04:30 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Fri, 18 Jun 2021 23:00:44 +0000   Fri, 18 Jun 2021 23:04:30 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Fri, 18 Jun 2021 23:00:44 +0000   Fri, 18 Jun 2021 23:04:30 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
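
For reference, output like this comes from describing the node; when every condition flips to Unknown at the same LastTransitionTime, it generally just means the kubelet stopped responding, so the k3s agent log on the node itself is the next place to look:

  kubectl describe node <node-name>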

@kallex
Copy link
Author

kallex commented Jun 19, 2021

NOTE! It might not improve things; it might make them worse! But I don't have anything important on my cluster, so I'm more or less experimenting. I also have data on PVs, which I expect to survive the node removal (though it might not, because it was a master node).

I was preparing to reinstall the node anyway (hoping that reinstalling would bring the workloads back onto it), so I just deleted it:

kubectl delete node node-name-here
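
If the node were still partly alive, a gentler variant would be to drain it first (a sketch; on v1.19 the flag is --delete-local-data, renamed to --delete-emptydir-data in newer kubectl):

  # evict workloads first, then remove the node object
  kubectl drain node-name-here --ignore-daemonsets --delete-local-data
  kubectl delete node node-name-here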

@thanhtoan1196

Same here.

[screenshot attached]

@dweomer
Contributor

dweomer commented Sep 1, 2021

I haven't worked much with Hyper-V, but it looks as if k3OS failed to detect a hostname override via config (assuming that you have something like hostname: kubehost in your /k3os/system/config.yaml).
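
If that is the cause, pinning the hostname explicitly should make it survive reboots. A minimal sketch of the relevant line in /k3os/system/config.yaml (assuming the rest of the file stays as it is):

  # /k3os/system/config.yaml
  hostname: kubehost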

@patrik-upspot

I have the same problem when I restart my vServer from netcup (https://www.netcup.de/vserver/vps.php -> VPS 4000 G9).
I have a ticket -> #734
After typing "sudo k3os config", I can delete the wrong node and the old one comes back up correctly.

@rickard-von-essen

I can confirm this too.

After the first reboot the hostname changes from k3os to k3os-21898, and K8s thinks there are two master nodes, one of them unavailable. I didn't have any hostname specified in my config.yaml.

@rickard-von-essen

The hostname seems to come from this line: boot#L132
