Weave restarting after a few hours #2768
Interesting chatter in dmesg:
Question
I think I have seen this error from kubelet before, and it is not actually an error:
This is an odd message
Seeing double-remove errors, where CNI is trying to remove the network after it has already been removed. Have not seen these errors before:
Specifically, weave-npc is restarting, and I have zero logs from weave-npc on 243. Kernel and ethtool:
I only have logging in one weave-npc pod, and I am getting:
Another question, re kubernetes/kops#1171: should we be using nonMasqueradeCIDR for the weave subnet block?
Hi Chris, the errors below basically mean that Weave cannot reach Kubernetes, so let's proceed step by step, working backwards from this information.
Given the above was logged by Weave's
Steps:
Detailed steps:
In any case, let's first get your reply regarding the above.
None from ip-172-20-79-243; it can curl the API server. Checking the other node that is cordoned. The other node does not have any communication problems with the master either.
It does not time out now, and we had another weave pod restart last night. What are the diagnostic commands for weave-net?
I did not mention the k8s version:
One of the commands to debug weave is:
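The command itself did not survive the copy here; as a reference sketch, the status endpoint used later in this thread can be queried from the host, assuming the default setup that exposes it on localhost port 6784:

```
# Router status: peers, connections, IPAM state
curl -s http://127.0.0.1:6784/status

# Per-peer connection detail
curl -s http://127.0.0.1:6784/status/connections
```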
Two options:
So yesterday at 17:47 MT
And this morning at 7:54:29 MT
And we lost another node, ip-172-20-81-117.us-west-2.compute.internal, which is running weave-net-y6mru. I am able to get the logs for it. Also, @bboreham, this is a shot in the dark, but could this be impacting kubernetes/kops#1171?
kops#1171 is about reconfiguring the IP address range? I can't see why that would be relevant. Weave peers all need to be configured with the same IP range, but there is a specific error message if they are mismatched, and you're not getting that message.
With help, I am getting more logs:
Any idea of some keywords that I can search for?
@bboreham, you asked about hairpin:
Nothing is standing out to me in the logs... hopefully we can dig around in Kibana when we talk.
"Stack dump" or just "dump" would indicate Weave Net had crashed. Or just the end of the log before a restart. In the kubelet logs I would be interested in what happened around the time the errors start (06:57), or in any errors after the first minute or so, when you can get some noise due to things not being ready yet.
On ip-172-20-81-117: I tried setting some kernel params to see if they would have any positive effect on this issue (pods are failing suddenly on this cluster):
I restarted all of the weave pods on the cluster. Pods worked again for about 5 minutes, then began to fail again. The number of sockets in either TIME_WAIT or CLOSE_WAIT was never high (<64 in total), but after the kernel params were tuned, TIME_WAIT went to 0 and only 37 sockets remained in CLOSE_WAIT. The number of sockets in some WAIT state on the other nodes in the cluster was much higher:
I curl'd the weave status endpoint and got the following results:

root@ip-172-20-81-117:~# curl http://127.0.0.1:6784/status
PeerDiscovery: enabled
DefaultSubnet: 10.32.0.0/12

Why can weave talk to itself???

I set the same kernel params on each node in the cluster:

after 1 min: ip-172-20-100-20.us-west-2.compute.internal: 296
after 3 min: ip-172-20-100-20.us-west-2.compute.internal: 288
after 5 min: ip-172-20-100-20.us-west-2.compute.internal: 290

I deleted all of the weave pods again. Overall, the process did NOT fix the stability issues on the original node, nor did the output from /status/connections change on the original node. Maybe there were more steps to get weave to restart completely that I didn't know to take? I'd love to hear that the data from this experiment led someone in some direction, even if it just proves that socket depletion isn't the culprit (but it sure feels like it to me).
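For reproducibility, here is a sketch of how per-state socket counts like the ones above can be gathered; the exact commands used are not shown in the thread, so treat this as an assumption:

```
# Count sockets per TCP state on a node
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn

# Or count just the two wait states discussed above
netstat -tan | grep -cE 'TIME_WAIT|CLOSE_WAIT'
```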
Assuming you mean "why am I seeing this one line above": the reason is that "does this IP address connect back to myself" is a question that cannot, in general, be resolved without trying it (given NAT, public/private IPs, etc.), so we don't bother to optimise out that case; we just try the address and blacklist it if it turns out to be us.
In the netstat logs I was sent, far and away the most sockets in use (thousands) were between kube-apiserver and kubelet on port 10250. I did not determine whether this is abnormal. However, those dumps were from mostly-working nodes; it would be interesting to set up a netstat dump every five minutes on all nodes and then look at the last one before a failure.
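One possible way to set that up, as a sketch (the output path, schedule, and use of ss rather than netstat are assumptions):

```
# Snapshot socket state every five minutes via a cron.d entry
# (note: % must be escaped inside crontab entries, and cron's PATH is minimal,
#  so use the full path to ss if needed on your distro)
mkdir -p /var/log/netstat-dumps
echo '*/5 * * * * root ss -tanp > /var/log/netstat-dumps/$(date +\%Y\%m\%dT\%H\%M).txt' > /etc/cron.d/netstat-dump
```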
Or if you are scraping node metrics from |
To update this: the current problem we are diagnosing is a delay in making connections from the host to the pod network.
The problem does not seem to be between pods, but between kubelet and the pods. Now, I asked about the nonMasqueradeCIDR address space previously. Does that impact this routing? How is the route / bridge configured from the host to the pod network? How can we compare a working system to this broken system?
From a packet capture you supplied, I can see two interfaces have the same IP address - 10.37.0.0 - which is disrupting the conversations. Can you please run inside the weave-kube pod:
to list all the veths connected to the 'weave' bridge.
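The command itself appears to have been lost from this comment; as a sketch, two equivalent ways to list the veths attached to the bridge (run inside the weave container or on the host, since the bridge lives in the host network namespace):

```
# List interfaces enslaved to the 'weave' bridge (iproute2)
ip link show master weave

# Or, if bridge-utils is installed
brctl show weave
```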
It's a straightforward Linux bridge named weave. The bridge itself is then given an IP address in the same subnet as the pods, which creates an entry in the host's route table for that subnet (10.32.0.0/12).
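To see that layout on a node, something like the following works; a sketch, using the default 10.32.0.0/12 range mentioned above:

```
# The bridge carries an address from the pod range...
ip addr show dev weave

# ...which is what puts the pod subnet route into the host's table
ip route show | grep weave
# expect a line like: 10.32.0.0/12 dev weave proto kernel scope link src <bridge address>
```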
To update the issue: for some reason, multiple IP addresses existed on the weave bridge.
The workaround is to remove the duplicate interfaces. We need to document more on the fix and mitigation.
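A sketch of what that cleanup can look like; the /12 prefix is an assumption based on the default subnet, and the address must be matched against what ip addr actually reports before deleting anything:

```
# List the addresses currently assigned to the weave bridge
ip -o addr show dev weave

# Remove the duplicate address (10.37.0.0 was the one seen in the packet capture above);
# the prefix length must match what 'ip addr' shows
ip addr del 10.37.0.0/12 dev weave
```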
@bboreham do you want a separate issue opened for this bug?
Once we identify specific bugs they should be separate; for now it's fine to keep all the comments in one place.
@bboreham any idea how the duplicate IP was introduced, and what is a good mitigation process?
It has taken 12 or so hours for the weave pods to start failing, so you may want to let a cluster run overnight. Also, this seems to occur only on clusters that have some load on them.
Chris supplied this event notification log:
This led to #2794, which is my current favourite explanation for "restarting after a few hours": after a DaemonSet pod is evicted, it is immediately scheduled back to the same node.
Interesting. Is the restarting actually that bad? In "plain" weave that would mostly go unnoticed by applications, but perhaps not in a k8s setting? |
I believe you mean that when fastdp is in use, already-connected pods would not experience any effect from Weave Net restarting. If Kubernetes took some time to restart Weave Net (which is plausible if the machine is very loaded), then new pods could not connect and existing pods could not start new connections with remote pods. However, the eviction deletes Weave's persistence file, which would otherwise avoid the race at #2784. Once you hit that race, the node is unusable until rebooted.
Even without fastdp, most running apps will just see a blip in throughput.
Right. That would be bad.
I see. One would hope that with the right app architecture, health-checking, etc. policies, this shouldn't cause significant disruption. But we have now kicked off a chain of events with unpredictable outcomes.
ouch
@bboreham during the original issue we were not seeing evictions at all; we were seeing weave stopping itself due to the inability to connect to the API server. The eviction issue is a new issue due to over-allocating resources.
Closing this as it should be fixed in 1.9.1.
@chrislovecnm we are using weave 1.9.4 and we are facing the exact same issue.
May I ask why you are running that specific version? I am curious because we are seeing a lot of 1.9.4 out there and have been wondering for a while why that is the case. As for the problem you are seeing, I believe the answer is in @bboreham's #2768 (comment).
@rade we installed We initially thought eviction could be the reason, as our logs were getting spammed. But the issue was happening even after we controlled the log spamming (we faced only disk evictions). The issue has reduced after we have
That's odd. kops should be installing Weave Net 1.9.8 as of kops 1.6.2. Are you running an older version?
The issue @bboreham mentioned is #2794, which is resolved in Weave Net 1.9.6. I suggest you use the latest 1.6.x kops, which, as per the above, should install Weave Net 1.9.8. However, note that the fix actually isn't in Weave Net itself but only in the documentation, namely we added a section on pod eviction. So I suggest you check that the YAMLs for the Weave Net DaemonSet deployed in your cluster do actually set resource limits as described in the docs.
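One quick way to check that on a running cluster, as a sketch (assuming the stock manifest's names, i.e. a weave-net DaemonSet in the kube-system namespace):

```
# Show the resource requests/limits configured on the Weave Net containers
kubectl -n kube-system get ds weave-net -o yaml | grep -A6 'resources:'
```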
Sounds like #3073.
Hi all
We are seeing weave-npc restart on some nodes after about 10-12 hrs. We are running 1.8.2; this is an error message I was able to dig up:
kube-dns has been failing with UDP network connection errors, and a Zookeeper member was in CrashLoop because of the network. We have cordoned the nodes off and would like to diagnose the problem.
What steps should we go through to diagnose the problem?
What weave commands can we run on the nodes?
What tools besides tcpdump do you usually use?
Working on collecting more information on this issue.
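For later readers, a first pass at the questions above, based on the commands that come up in this thread; a sketch, with the pod name placeholder and the label selector as assumptions:

```
# Which weave pods exist and how often they have restarted
kubectl -n kube-system get pods -l name=weave-net -o wide

# Logs from both containers, including the run before the last restart
kubectl -n kube-system logs <weave-pod> -c weave --previous
kubectl -n kube-system logs <weave-pod> -c weave-npc --previous

# Router status and per-peer connections, run on the affected node itself
curl -s http://127.0.0.1:6784/status
curl -s http://127.0.0.1:6784/status/connections
```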