AKS Advanced Networking model leads to frequent port exhaustion issues #637
Comments
@strtdusty, could this have something to do with the maximum-pods-per-node limit for advanced networking? I noticed that you are listing ~30 pods, which is the limit. (I haven't tried this myself, but I know of this limit because we want to move to AKS with advanced networking and are a bit skeptical of the 30-pod limit. Update: I now see that this can be increased, so I guess it's not a problem.)
@bremnes If it is due to the 30-pod limit, then we are going to be in real trouble when we rebuild the cluster with a 100-pod limit. But yes, my understanding of the SNAT issue is that it gets worse the more pods we load onto a node. If my reading is correct, a 101-node cluster gets 256 SNAT ports per VM/node. Assuming ~8 outbound connections per pod and 100 pods per node (~800 connections per node), I would need at least 4 public IPs (800 / 256 ≈ 3.1, rounded up). Does this seem correct/expected? https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#pat Also note that you cannot change the pod limit on an existing cluster.
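The public-IP estimate above can be reproduced with a short sketch. It assumes the preallocation table from the linked Load Balancer doc as it read at the time (1,024 ports per VM for pools of 1 to 50 VMs, halving at each larger tier), and, like the comment, it assumes each additional public IP adds another full block of ports per node. The function names are hypothetical helpers for illustration, not an Azure API.

```python
import math

# Assumed preallocated SNAT ports per VM per frontend IP, keyed by the
# upper bound of the backend pool size (values as listed in the linked doc).
PREALLOCATED_PORTS = [
    (50, 1024),
    (100, 512),
    (200, 256),
    (400, 128),
    (800, 64),
    (1000, 32),
]


def snat_ports_per_node(pool_size: int) -> int:
    """Return the SNAT ports one VM gets from a single frontend IP."""
    for upper, ports in PREALLOCATED_PORTS:
        if pool_size <= upper:
            return ports
    raise ValueError("pool size is beyond the tabulated range")


def public_ips_needed(pool_size: int, pods_per_node: int, conns_per_pod: int) -> int:
    """Assumption: every extra public IP adds one more full block of ports per node."""
    connections_per_node = pods_per_node * conns_per_pod
    return math.ceil(connections_per_node / snat_ports_per_node(pool_size))


print(snat_ports_per_node(101))        # 256
print(public_ips_needed(101, 100, 8))  # 4
```

This counts only concurrent connections; it ignores how long a released port stays unavailable, so treat the result as a lower bound rather than a sizing guarantee.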
I wanted to add that, although it doesn't seem to be documented, the maximum value for the maximum-pods-per-node setting with advanced networking seems to be around 250. This is because as soon as a node is added to the cluster, every possible pod IP (based on the pods-per-node setting) is taken from the VNet subnet and added to the node's network interface, thus hitting the Azure limit on the maximum number of IPs per network interface. That's probably one good reason why the setting cannot be changed after cluster creation.
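The preallocation described above can be sketched as simple IP arithmetic. This assumes each node holds its own primary IP plus one preallocated IP per possible pod, uses the subnet-sizing formula from the AKS Azure CNI planning docs, (nodes + 1) + (nodes + 1) × max pods, where the extra node covers an upgrade surge, and takes 256 private IPs per NIC as an assumed Azure limit. The helper names are hypothetical.

```python
# Assumption: Azure's per-NIC private IP limit; the real AKS cap may be lower.
NIC_PRIVATE_IP_LIMIT = 256


def ips_per_node(max_pods: int) -> int:
    """Node primary IP plus one preallocated IP for every possible pod."""
    return max_pods + 1


def subnet_ips_required(nodes: int, max_pods: int) -> int:
    """AKS planning formula; the extra node accounts for an upgrade surge node."""
    return (nodes + 1) + (nodes + 1) * max_pods


def fits_on_one_nic(max_pods: int) -> bool:
    """True if every IP a node preallocates fits on its single NIC."""
    return ips_per_node(max_pods) <= NIC_PRIVATE_IP_LIMIT


print(subnet_ips_required(6, 30))  # 217
print(fits_on_one_nic(250))        # True
print(fits_on_one_nic(300))        # False
```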
It is documented, and the limit is 110. However, I think that is poor reasoning for limiting pods per node. Who is to say that I cannot efficiently use all of my allocated IPs across 100 nodes rather than the 145 nodes a VNet would allow with a 110-pod limit? The only limit is 16,000 (previously 4,000) IPs per VNet; how I allocate those across nodes should not be dictated. Our use case doesn't demand higher density, but there are probably people out there who do need it. I agree this is probably why you can't change the density after creation. I would hope that once node pools are available, you will be able to set different densities on different pools.
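The 145-node figure above is the VNet IP budget divided by the pods-per-node setting. A minimal sketch, with a hypothetical `max_nodes` helper; the optional flag also counts each node's own primary IP, which the comment's simple division leaves out.

```python
def max_nodes(ip_budget: int, max_pods: int, count_node_ip: bool = False) -> int:
    """How many nodes fit in an IP budget when each node reserves max_pods IPs.

    With count_node_ip=True, each node also consumes one IP for itself.
    """
    per_node = max_pods + (1 if count_node_ip else 0)
    return ip_budget // per_node


print(max_nodes(16_000, 110))                      # 145
print(max_nodes(16_000, 110, count_node_ip=True))  # 144
```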
@strtdusty That's actually incorrect, in my experience. That document talks about the default values, not the actual limits. I have a cluster I created a few weeks ago with And Azure CNI:
You are right, @tomasr; I was looking at the default limits.
Closing this issue as old/stale. If this issue still comes up, please confirm you are running the latest AKS release. If you are on the latest release and the issue can be re-created outside of your specific cluster, please open a new GitHub issue. If you are only seeing this behavior on clusters with a unique configuration (such as custom DNS/VNet/etc.), please open an Azure technical support ticket.
What happened:
We experience latency on egress network connections and refused ingress connections due to port exhaustion issues on our single public IP.
What you expected to happen:
Communication to be unhindered by port exhaustion issues.
How to reproduce it (as minimally and precisely as possible):
Use advanced networking with egress through a single public IP (PIP). We currently run a 6-node cluster with ~30 pods per node. Each node has ~8 outbound connections (to services such as Service Bus, Azure Storage, the management API, etc.).
Anything else we need to know?:
We use AKS with advanced networking, with egress traffic going through a single Basic load balancer/PIP. This model allows us to evaluate all traffic using a next-gen firewall. I know that AKS recently made some changes around azureproxy to help limit traffic to the master nodes, but this has only slightly helped the issue.
Environment:
kubectl version: 1.10.2