
Node deletion does not clear up the IPs #3372

Closed
alok87 opened this issue Aug 8, 2018 · 19 comments

@alok87
Contributor

alok87 commented Aug 8, 2018

When a node is deleted/terminated by the autoscaler, or terminated manually, its pod IPs should be cleared up. But that is not happening.

Related - #2797 (comment)

Versions:

  • kubernetes version: 1.9.8
  • provisioned using kops: 1.9.2
  • weave version which kops provisioned: weaveworks/weave-kube:2.3.0
  • Node AMI is the default Debian AMI kops provides (1.9 image): kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  • Kernel version:
admin@ip-10-0-21-54:~$ uname -a
Linux ip-10-0-21-54 4.4.78-k8s #1 SMP Fri Jul 28 01:28:39 UTC 2017 x86_64 GNU/Linux
  • Kubectl version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T19:01:12Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-14T06:36:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

What did you expect to happen?

I expected the dead nodes' IPs to get cleared:

$ kubectl get nodes
NAME                                             STATUS    ROLES     AGE       VERSION
ip-10-0-20-119.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-20-155.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-20-172.ap-southeast-1.compute.internal   Ready     master    1d        v1.9.8
ip-10-0-20-186.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-20-203.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-20-207.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-20-67.ap-southeast-1.compute.internal    Ready     node      1d        v1.9.8
ip-10-0-21-119.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-21-120.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-21-142.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-21-165.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-21-242.ap-southeast-1.compute.internal   Ready     master    1d        v1.9.8
ip-10-0-21-252.ap-southeast-1.compute.internal   Ready     node      1d        v1.9.8
ip-10-0-21-59.ap-southeast-1.compute.internal    Ready     node      1d        v1.9.8
ip-10-0-40-135.ap-southeast-1.compute.internal   Ready     master    1d        v1.9.8
admin@ip-10-0-21-252:~$ curl -s 'http://127.0.0.1:6784/status/ipam' | grep 'unreachable\!$'
12:26:5c:08:f3:b4(ip-10-0-21-72.ap-southeast-1.compute.internal)   131072 IPs (06.2% of total) - unreachable!

I expected the IPs to get cleared for 10.0.21.72, which got deleted (scaled down and terminated).
We have to manually clean the node's IPs using curl -H "Accept: application/json" -X DELETE 'http://localhost:6784/peer/ip-10-0-21-72.ap-southeast-1.compute.internal'

What happened?

The nodes were deleted, but their IPs still showed up as not cleared.

How to reproduce it?

  1. Delete a node.
  2. Check whether its IPs get cleared.

Anything else we need to know?

CloudProvider: aws

@bboreham I would like to contribute to fixing the problem here. Let me know if I can take this up.

@redi-vinogradov

I'm also seeing this issue with weave 2.4.0, kops 1.9.1, k8s 1.9.9

@murali-reddy murali-reddy added this to the 2.5 milestone Aug 9, 2018
@murali-reddy
Contributor

@alok87 If you have a fix in mind, please go ahead and raise a PR. Otherwise I am happy to pick this up and get it fixed for the next release.

@alok87
Contributor Author

alok87 commented Aug 9, 2018

@murali-reddy Cool, I can take this up in a few days.
One question - what impact do these unreachable IPs have on requests being routed in a cluster with this issue, where in total 86% of the nodes' IPs have become unreachable?

@murali-reddy
Contributor

murali-reddy commented Aug 9, 2018

@alok87 I don't think there is any impact.

They are not exactly unreachable. Once a node goes down, depending on the nature of the deployment, pods will get rescheduled to other nodes. Applications will continue to work.

Once a new node joins, there will be a readjustment (the unused/unreachable IP address range is reclaimed) so that 100% of the subnet is usable for pods.

For example, with a kops-provisioned cluster whose auto-scale group is set to a minimum of 3 instances, this is the ipam status I see once I delete a node, and then after the node is re-provisioned:

admin@ip-172-20-51-168:~$ curl http://127.0.0.1:6784/status/ipam
9e:b9:85:2c:70:1b(ip-172-20-51-168.us-west-2.compute.internal)   524288 IPs (25.0% of total) (5 active)
b6:d8:57:4e:85:3d(ip-172-20-57-116.us-west-2.compute.internal)   786432 IPs (37.5% of total)
d2:3c:82:07:18:da(ip-172-20-81-221.us-west-2.compute.internal)   524288 IPs (25.0% of total) - unreachable!
da:97:1b:c4:96:6b(ip-172-20-43-182.us-west-2.compute.internal)   262144 IPs (12.5% of total)
admin@ip-172-20-51-168:~$ curl http://127.0.0.1:6784/status/ipam
9e:b9:85:2c:70:1b(ip-172-20-51-168.us-west-2.compute.internal)   524288 IPs (25.0% of total) (5 active)
b6:d8:57:4e:85:3d(ip-172-20-57-116.us-west-2.compute.internal)   786432 IPs (37.5% of total)
52:97:88:4a:50:36(ip-172-20-65-39.us-west-2.compute.internal)   524288 IPs (25.0% of total)
da:97:1b:c4:96:6b(ip-172-20-43-182.us-west-2.compute.internal)   262144 IPs (12.5% of total)

Do you see anything problematic?

@redi-vinogradov

From our experience, there is an impact. In the case where pods have already been allocated IPs from some subnet block and the node responsible for that subnet then goes down, those pods are not able to communicate with others anymore.
For now, we are using a shell script which just deletes the unreachable nodes, so that their subnet range gets associated with the node which executed the delete command.
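
For reference, a minimal sketch of such a cleanup script, assuming the Weave router's HTTP API is listening on 127.0.0.1:6784 as in the outputs above; peer names are parsed from the ipam status output, so adjust to your install:

#!/bin/sh
# Remove every peer the local Weave router reports as unreachable.
# Assumes the router's HTTP API is on 127.0.0.1:6784 (the default used elsewhere in this thread).
for peer in $(curl -s http://127.0.0.1:6784/status/ipam | awk '/unreachable!$/ {print $1}' | sed 's/.*(\(.*\))/\1/'); do
  echo "removing dead peer: $peer"
  curl -s -H "Accept: application/json" -X DELETE "http://127.0.0.1:6784/peer/$peer"
done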

@bboreham
Contributor

bboreham commented Aug 9, 2018

@redi-vinogradov this should not happen. Weave Net implements communication at Layer 2 - MAC to MAC, not via subnets.

Please open another issue giving details of your install and log files from the time you experienced a communication problem.

@alok87
Contributor Author

alok87 commented Aug 9, 2018

@murali-reddy What about this #2797

In a situation such as a regularly expanding and contracting auto-scale group, the IPAM ring will eventually become clogged with peers that have gone away.
  1. Won't this be a problem for the routing of requests?
  2. I have never really understood what the role of Weave is in request routing. Does it come into the path of the request when requests are routed using iptables?

@bboreham
Contributor

If the clean-up does run when a new node starts, this is the same as #3171.
If it doesn't, please post the logs of the weave container on that new node.

@alok87
Contributor Author

alok87 commented Aug 12, 2018

@bboreham Sure, I will get back with more info if I find a node provisioned with the same IP whose IPs are not reclaimed.

But isn't it a problem to have such nodes lying in the IPAM ring? Isn't it a problem if other peers cannot connect to so many dead nodes in their peer-to-peer connection list?

/home/weave # ./weave --local status ipam | grep unrea | grep 21.5
9a:0d:e4:d6:dd:a7(ip-10-0-21-54.ap-southeast-1.compute.internal)    49152 IPs (02.3% of total) - unreachable!

/home/weave # ./weave --local status connections | grep failed | grep 21.5
-> 10.0.21.167:6783      failed      dial tcp4 :0->10.0.21.167:6783: getsockopt: connection timed out, retry: 2018-08-12 19:21:58.008282969 +0000 UTC

No solid evidence yet, but what we have observed is that when we cross 50 nodes and there are many unreachable nodes, the overhead of the Kubernetes network tends to increase by around 20ms.
It reduces when we clear this pool of IPs from the dead nodes, and also when the number of nodes goes down.

I will come back with real evidence for this observation.
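
For anyone trying to reproduce the same observation, a quick (hedged) way to count how many peers the local router currently considers unreachable, using the same ipam endpoint shown above:

# Count the peers the local router reports as unreachable
curl -s http://127.0.0.1:6784/status/ipam | grep -c 'unreachable!$'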

@bboreham
Contributor

isn't it a problem to have such nodes lying in the IPAM ring

No.

Isn't it a problem if other peers cannot connect to so many dead nodes in their peer-to-peer connection list?

You showed one node, not "so many".
Not being able to connect is expected, in a distributed system. Weave Net can cope, so long as it doesn't run out of free IP addresses entirely.

the overhead of the Kubernetes network tends to increase by around 20ms

I know of no mechanism to connect unreachable nodes to packet latency.

I'll close this for now; please re-open or open a new issue when you have evidence of a problem.

@alok87
Contributor Author

alok87 commented Aug 21, 2018

The root cause of our latency problem was that many nodes switched back to sleeve mode and never returned to fastdp - #1737
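
In case anyone else wants to check for the same thing, connections that have fallen back to sleeve show up in the status output used earlier in this thread (run inside the weave container):

# List connections that are using sleeve instead of fastdp
./weave --local status connections | grep sleeve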

As @bboreham said, node deletion does not clear IPs but reclaims them when a new node comes with same IP, and it has no impact on performance.

Thank you all

@bboreham
Contributor

Re-opening because we don't actually seem to have an issue that duplicates this (#3171 is a PR)

@bboreham
Contributor

bboreham commented Aug 28, 2018

reclaims them when a new node comes with same IP

Sorry, I don't know where you got "with same IP" from; it doesn't form part of the story that I recognize. Reclaim happens when any Weave Net pod starts.
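
A hedged way to see this in practice: restart one Weave Net pod and re-check ipam once it is running again. The label selector name=weave-net below is taken from the standard weave-kube DaemonSet and may differ in other installs.

# Restart one Weave Net pod; the DaemonSet recreates it and reclaim runs when the new pod starts
kubectl -n kube-system get pods -l name=weave-net -o wide
kubectl -n kube-system delete pod <one-of-the-weave-net-pods>
# Once the replacement pod is Running, previously unreachable ranges should no longer appear:
curl -s http://127.0.0.1:6784/status/ipam | grep 'unreachable!$'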

@alok87
Contributor Author

alok87 commented Aug 29, 2018

@bboreham OK - that is what I meant.

@bboreham
Contributor

#3386 is a specific case which matches the title of this issue.

@bboreham bboreham added the bug label Sep 11, 2018
@alok87
Contributor Author

alok87 commented Oct 19, 2018

We were facing 4-5 hours of increased latency in the Kubernetes network and spent a couple of hours fighting it.

  • We upgraded weave to 2.4.1 from 2.3.0 - did not help.
  • We moved our pods to different nodes/new nodes - did not help.
  • Could not find any weave pod using a sleeve connection - all were fastdp.
  • We moved the pod to a different cluster - request queuing dropped (confirming the problem was in the current cluster).
  • We moved the pod back to the current cluster and removed the unreachable dead nodes by using curl -H "Accept: application/json" -X DELETE 'http://localhost:6784/peer/<IP>' - Request queuing dropped in the current cluster

It looks like deleted nodes, if not removed from Weave, result in network latency. Not really sure whether it should or shouldn't, but removing them did work for us.

@murali-reddy
Contributor

It looks like deleted nodes, if not removed from Weave, result in network latency

How did you come to this conclusion? Is this something easily reproducible, and at what scale?

In general, dealing with deleted nodes is a control-plane aspect of Weave; I don't see any reason why it should have any impact on the data plane.

@alok87
Contributor Author

alok87 commented Oct 22, 2018

@murali-reddy We have faced this issue of request queuing increasing multiple times. As soon as we removed the dead nodes from the Weave network, latency dropped to the old values.

@bboreham
Contributor

bboreham commented Nov 1, 2018

Fixed by #3399

(the originally described issue is fixed, not any other symptoms mentioned in comments)
