Rolling update puts nodes into "not ready" #4946
In our case manually restarting the kubelet helped. Do you have the logs of an affected node?
Not from the current test runs (the cluster has been deleted and recreated a couple of times), but I can recreate this, keep the logs, and attach them to this issue. Nevertheless, I would be interested in how to prevent this from happening. We need to make some changes to a production cluster, where restarting a kubelet seems rather inappropriate.
Understandable. We also started seeing this after upgrading to Kubernetes 1.8 (1.8.10 currently), and I'm currently debugging what could cause it. It looks like in our case the kubelet tries to connect to an old API server IP, so either it is somehow caching the DNS resolution for too long (the TTL should only allow 60s) or the record wasn't updated correctly.
Thanks for pointing this out. I will check whether this is the same for us first thing in the morning. My TZ is CEST.
Thanks for reporting & sorry about the problem. Was this with a gossip DNS (.k8s.local) or a "real" Route53 DNS name?
In our case a "real" Route53 record. Also, restarting the kubelet almost immediately fixed the issue, while just waiting took up to 15 minutes for the node to be marked as Ready again. We are running kops 1.9.0-beta.2.
Real Route53 DNS name.
@johanneswuerbach it seems to be the same for us: the kubelet trying to connect to an old API server IP. Trying to verify this now.
Could you check whether the internal master DNS contains the IPs of the new masters or is still returning an old one?
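One way to check is to resolve the internal API name and compare the result against the current master IPs. A minimal sketch in Python, assuming a hypothetical cluster domain and master IPs (replace both with your own values):

```python
# Sketch: compare what the resolver currently returns for the internal API name
# against the IPs of the new masters. Hostname and IPs are hypothetical examples.
import socket

API_HOST = "api.internal.cluster.example.com"              # hypothetical internal API name
NEW_MASTER_IPS = {"10.0.1.10", "10.0.2.10", "10.0.3.10"}   # hypothetical new master IPs

def resolve_a_records(host: str) -> set:
    """Return the IPv4 addresses the system resolver currently returns for `host`."""
    infos = socket.getaddrinfo(host, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}

resolved = resolve_a_records(API_HOST)
stale = resolved - NEW_MASTER_IPS
print("resolved:", sorted(resolved))
if stale:
    print("record still contains IPs that are not new masters:", sorted(stale))
else:
    print("record only contains the new master IPs")
```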
We also hit this again on another node:
The IP is the IP of the node itself.
That is exactly the error message I see. It starts to appear when the old IP address is removed from the A record for api.internal.xxx and the new IP address of the new master is added, sometimes after the first master, sometimes after the second.
It's probably due to kubernetes/kubernetes#41916 (comment), where the kubelet caches the IP of the old master nodes. That's why a restart fixes it.
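For illustration, this is roughly what that looks like from a client's point of view; a minimal sketch, not kubelet code, and the hostname is a hypothetical placeholder. A long-lived client resolves the name once when it connects and then keeps reusing that TCP connection, so it never notices that the A record has moved to the new masters:

```python
# Sketch: a long-lived HTTPS client pins the IP it resolved at connect time.
# Until the connection is torn down (e.g. by restarting the process), it keeps
# talking to that IP even after the DNS record points at new masters.
# The hostname below is a hypothetical placeholder.
import http.client
import socket

API_HOST = "api.internal.cluster.example.com"  # hypothetical

# Resolution happens once, when the connection is established.
conn = http.client.HTTPSConnection(API_HOST, 443, timeout=5)
conn.connect()
pinned_ip = conn.sock.getpeername()[0]

# ... rolling update replaces a master, the Route53 record is updated ...

# A fresh resolution now returns the new master IPs,
fresh_ips = {info[4][0] for info in socket.getaddrinfo(API_HOST, 443, proto=socket.IPPROTO_TCP)}
# but the existing connection is still pinned to the IP resolved earlier:
print("connection still points at", pinned_ip, "while DNS now returns", fresh_ips)
```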
I had the same issue today with 1.9.0-beta-2.
I think the best practice is to set up an internal ELB that fronts the masters and have the API URL point to that, the same way it's done for the external API. Is that possible with kops right now?
I don't think so. The load balancer type for the API in the spec refers to the client (kubectl) AFAIK; at least in a quick test I still got DNS round-robin entries for the API that the kubelet used. I also think an ELB for the kubelet to connect to would be the "right" way to go. At least that is how kubeadm does HA nowadays (even if still manually). The ELB would detect that a master is gone through the health check, break the connection and force the kubelet to reconnect, wouldn't it? What would be needed? How much work would it be? Are there any pointers to start? I wouldn't mind giving it a try, but I would need instructions.
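To illustrate the reconnect behaviour such an LB would force (a minimal sketch with a hypothetical endpoint, not kops or kubelet code): clients keep talking to one stable DNS name; when the backend master behind it fails its health check, the LB drops the connection and the client simply reconnects to the same name and lands on a healthy master.

```python
# Sketch: a client polling a single stable LB endpoint. When the LB resets the
# connection (because the backend master became unhealthy), the client just
# reconnects to the same name and is routed to a healthy master.
# The hostname is a hypothetical placeholder.
import http.client
import time

LB_ENDPOINT = "api.internal.cluster.example.com"  # hypothetical LB DNS name

def watch_api_health() -> None:
    while True:
        conn = http.client.HTTPSConnection(LB_ENDPOINT, 443, timeout=10)
        try:
            while True:
                conn.request("GET", "/healthz")
                resp = conn.getresponse()
                print("healthz:", resp.status)
                resp.read()  # drain the body so the connection can be reused
                time.sleep(10)
        except (http.client.HTTPException, OSError):
            # The LB (or network) broke the connection -- reconnect and carry on.
            conn.close()
            time.sleep(1)

watch_api_health()
```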
Just did a rolling update from 1.9.0-beta-2 to 1.9.0 and hit the same issue: all of my nodes go from Ready to not ready. @chrislovecnm @justinsb have you tried a master rolling update with 1.9.0 on AWS by chance? The first time I ever noticed this issue was with 1.9.0-beta-2, and all nodes going into Not Ready takes down every service in the cluster.
I can confirm it lasts for 15 minutes or until the kubelet is restarted.
15m is what is expected for the issue with kubelet caching IPs: kubernetes/kubernetes#41916 (comment)
Just realised that the same problem affects the kube-proxy, btw.
Our current hypothesis for a workaround is to create new temporary nodes and lock them to only one of the masters by overriding the DNS name of the API server in /etc/hosts, then migrate all the pods to these new temp nodes by draining the old nodes. This frees up two of the masters for a rolling update without causing interruptions due to the old nodes becoming "not ready". Once those two masters are done, the old nodes can be locked to the two new masters and the pods moved back to them, freeing up master 3 for a rolling update. Finally the temp nodes can be deleted. Cumbersome and ugly... but it works. Nevertheless, we should consider adding the LB for the node-to-master communication, as it is also the recommended way of doing HA with kubeadm nowadays, for example.
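The /etc/hosts pinning step of that workaround, as a minimal sketch (the hostname and master IP are hypothetical placeholders; run as root on the node, and remove the entry again once the rolling update is finished):

```python
# Sketch: pin the internal API DNS name to a single master by appending an
# /etc/hosts entry on a node. Hostname and IP are hypothetical placeholders.
API_HOST = "api.internal.cluster.example.com"   # hypothetical internal API name
PINNED_MASTER_IP = "10.0.1.10"                  # hypothetical IP of the master to pin to

def pin_api_host(hosts_file: str = "/etc/hosts") -> None:
    """Append a pin entry unless the host is already present in the file."""
    with open(hosts_file, "r+", encoding="utf-8") as f:
        if any(API_HOST in line for line in f):
            return  # already pinned
        f.write(f"\n{PINNED_MASTER_IP} {API_HOST}\n")

if __name__ == "__main__":
    pin_api_host()
```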
Ran into this today with a beta environment deploy -- thankfully nothing broke in our production env, but certainly not a good sign... Any detailed fixes for this?
Just to add another occasion where this can happen: when "updating" from kops 1.8 to kops 1.9 and performing the required rolling update. Since all masters are restarted/recreated first, the nodes can become not ready if kubelet/kube-proxy was talking to the corresponding, restarting master.
We are also being hit by this and it's causing our own APIs to have downtime when the masters come back up from a termination. I do think that putting an ELB on the internal API endpoint would help in this case as well.
Looks like a fix has been merged and a PR is open to backport it to 1.9.
Approved and cherry-picked as well.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Kops version 1.8.0
Kubernetes version 1.8.6
AWS (3 masters and 3 nodes)
kops edit followed by kops update and kops rolling-update. kops edit to add configuration flags for the apiserver (dex related). Also tried kops rolling-update --instance-group <master...> to only update one master at a time.
Nodes become "not ready" in an unpredictable way. Sometimes no node is affected. Sometimes one node becomes "not ready" and recovers after a few minutes. Sometimes all nodes are "not ready" for a longer period, up to 15 minutes, while the masters report ready. During this time the workload on the cluster is not accessible.
Expected: nothing unusual, just a non-breaking rolling update that does not affect the nodes or the workload.
Starting config: https://gist.github.com/recollir/9e9b4b0b426ef77014083f1839c123d6
Added via kops edit before the rolling-update: https://gist.github.com/recollir/da9fd8a123b58f555f2e4321093e9d46
https://gist.github.com/recollir/5b19d543adaa50b1889aabafeb77b847
A couple of times I observed that after the rolling update the ELB for the API server was missing attached AZs.