Node deletion does not clear up the IPs #3372
Comments
I'm also seeing this issue with weave 2.4.0, kops 1.9.1, k8s 1.9.9
@alok87 if you have a fix in mind, please go ahead and raise a PR. Otherwise I am happy to pick this up and get it fixed for the next release.
@murali-reddy Cool, I can take this up in a few days.
@alok87 I don't think there is any impact. They are not exactly unreachable. Once a node goes down, depending on the nature of the deployment, pods will get rescheduled to other nodes and the application will continue to work. Once a new node joins, there is a readjustment (unused/unreachable IP address ranges are reclaimed) so that 100% of the subnet is usable for pods. For example, with a kops-provisioned cluster whose auto-scaling group is set to a minimum of 3 instances, this is the IPAM status I see once I delete a node and after the node is re-provisioned.
Do you see anything problematic?
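(For readers following along: a minimal sketch of how to inspect the same IPAM status from a weave-kube install, assuming the standard DaemonSet in kube-system with the name=weave-net label.)

```sh
# Pick any weave-net pod (assumes the standard weave-kube DaemonSet labels).
POD=$(kubectl get pods -n kube-system -l name=weave-net -o jsonpath='{.items[0].metadata.name}')

# Show that peer's IPAM view: how much of the range each peer owns,
# and which peers it currently considers unreachable.
kubectl exec -n kube-system "$POD" -c weave -- /home/weave/weave --local status ipam
```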
From our experience, there is an impact. If pods have already been allocated IPs from some subnet block and the node responsible for that subnet then goes down, those pods are no longer able to communicate with others.
@redi-vinogradov this should not happen. Weave Net implements communication at Layer 2 - MAC to MAC, not via subnets. Please open another issue giving details of your install and log files from the time you experienced a communication problem. |
@murali-reddy What about this: #2797?
If the clean-up does run when a new node starts, this is the same as #3171.
@bboreham Sure, I will get back with more info if I find a node provisioned with the same IP whose IPs are not reclaimed. But aren't such nodes lying around in the IPAM ring a problem? Isn't it a problem if other peers cannot connect to so many dead nodes in their peer-to-peer connection lists?
No solid evidence yet, but what we have observed is that when we cross 50 nodes and there are many unreachable nodes, the overhead of the Kubernetes network tends to increase by around 20 ms. I will come back with real evidence for this observation.
No.
You showed one node, not "so many".
I know of no mechanism to connect unreachable nodes to packet latency. I'll close this for now; please re-open or open a new issue when you have evidence of a problem.
Re-opening because we don't actually seem to have an issue that duplicates this (#3171 is a PR).
Sorry, I don't know where you got "with same IP" from; it doesn't form part of the story that I recognize. Reclaim happens when any Weave Net pod starts. |
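(One concrete, hedged way to exercise that reclaim-on-start path, assuming the standard weave-kube DaemonSet in kube-system: delete any weave-net pod and let the DaemonSet controller recreate it.)

```sh
# Restart one Weave Net pod; the DaemonSet recreates it, and the
# reclaim logic runs as the new pod starts up.
POD=$(kubectl get pods -n kube-system -l name=weave-net -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod -n kube-system "$POD"
```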
@bboreham OK, I meant the same thing.
#3386 is a specific case which matches the title of this issue. |
We were facing 4-5 hours of increased latency in the Kubernetes network and spent a couple of hours fighting it.
It looks like deleted nodes, if not removed from Weave, result in network latency. I'm not really sure whether they should or not, but removing them did work for us.
How did you come to this conclusion? Is this easily reproducible, and at what scale? In general, dealing with deleted nodes is a control-plane aspect of Weave; I don't see any reason why it should have any impact on the data plane.
@murali-reddy We have faced this issue of request queueing increasing multiple times. As soon as we removed the dead nodes from the Weave network, latency dropped back to the old values.
Fixed by #3399 (the originally described issue is fixed, not any other symptoms mentioned in comments).
When a node is deleted/terminated, whether by the autoscaler or manually, its pod IPs should be cleared up. But this is not happening.
Related - #2797 (comment)
Versions:
1.9.8
1.9.2
weaveworks/weave-kube:2.3.0
kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
What you expected to happen?
I expected the dead node's IPs to get cleared.
Expected IPs to get cleared for 10.0.21.172, which got deleted (scaled down and terminated). We have to manually clean the node's IPs using:
curl -H "Accept: application/json" -X DELETE 'http://localhost:6784/peer/ip-10-0-21-72.ap-southeast-1.compute.internal'
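(A hedged sketch of how that manual clean-up can be wrapped in a loop. The endpoint is the one from the command above; the list of dead node names is assumed to be built by hand, e.g. by comparing `kubectl get nodes` with the unreachable peers shown in the IPAM status, and the value below is only a hypothetical example.)

```sh
# Hypothetical list of terminated nodes still present as Weave peers.
DEAD_NODES="ip-10-0-21-72.ap-southeast-1.compute.internal"

for node in $DEAD_NODES; do
  # Ask the local peer to reclaim the address ranges owned by the dead peer
  # (the HTTP call from the manual clean-up shown above).
  curl -H "Accept: application/json" -X DELETE "http://localhost:6784/peer/${node}"
done
```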
What happened?
The nodes were deleted, but their IPs still showed as not cleared.
How to reproduce it?
Delete a node
Check whether the IPs get cleared for it or not
Anything else we need to know?
CloudProvider:
aws
@bboreham I would like to contribute to fixing the problem here. Let me know if I can take this up.