Performance degradation for high levels of in-cluster kube-apiserver traffic #620
Comments
Were any changes made to azureproxy? How can customers check if they were upgraded? |
The azureproxy deployment has disappeared from two of my clusters and all kube-svc-redirect pods are in CrashLoop... is this the effect of the upgrade? |
see #626 |
Any updates on this issue? |
Can you please add some color around the issue you are still chasing? I opened #637 today and am wondering if it is related. |
Quick update, thanks for your patience. The configuration changes for […]. As part of the rollout, there were some circumstances where incorrect limits/requests were set on […].

What changed
While we don't yet drop a release version in a customer-visible spot, you can check your […] to confirm the update (one possible way to check is sketched after this comment).

What this does
DaemonSet change: […] Traffic destined for the K8s control plane will now remain local to the host originating the traffic, spreading the "query load" across a larger number of Azure VMs. Note that this moves the goalposts rather than completely fixing the problem; the workaround performs better the more cluster members there are.

Connection timeouts: Previous idle connection timeouts were too conservative and impacted watches. The symptoms would appear as connections being aborted or closed before a watch timeout; under most circumstances, the controller/informer loop would re-initialize the watch and carry on. Timeouts are now set to a minimum of 10 minutes.

Going forward: We've seen a large decrease in connection-related issues across the fleet, but we aren't yet out of the woods. Engineering continues to work on long-term networking updates to fully address high-volume in-cluster K8s workloads. |
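A minimal verification sketch, assuming the components keep the names mentioned earlier in this thread (a kube-svc-redirect DaemonSet with an azureproxy container in kube-system); the label selector below is a guess rather than a documented AKS identifier, so adjust names to whatever `kubectl -n kube-system get ds` actually shows:

```sh
# Is the DaemonSet present and fully rolled out?
kubectl -n kube-system get daemonset kube-svc-redirect -o wide

# What requests/limits are actually set on its containers?
kubectl -n kube-system get daemonset kube-svc-redirect \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'

# Are the pods healthy rather than crash-looping? (label selector is an assumption)
kubectl -n kube-system get pods -l component=kube-svc-redirect
```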
@slack thanks a lot for the update. Understanding what is going on with these changes is really important for us. |
Nice. Finally :) Guys, could you please confirm that restarting AKS nodes is the way to get this patch? |
We restarted our nodes to get the patch (verified that […]). […] Could you please assist with this? |
Istio recently made some changes to reduce the number of watches set on the API server (istio/istio#7675 (comment)), and this has had positive effects on stability in at least one of our AKS clusters. Is this relevant to how you continue to tune AKS, or was the impact of the number of watches already known? In any case, would it be possible to tune AKS to at least handle a somewhat higher number of watches? |
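For anyone trying to gauge watch pressure on their own cluster: the API server exposes a gauge of registered watchers on its metrics endpoint. A rough sketch, assuming RBAC allows reading the raw /metrics path and keeping in mind that metric names vary by Kubernetes version:

```sh
# List the resource kinds with the most registered watches on the API server.
kubectl get --raw /metrics \
  | grep '^apiserver_registered_watchers' \
  | sort -k2 -rn \
  | head -n 20
```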
Where is the source code for svcredirect and azureproxy? |
I have an AKS cluster in the westeurope region with 58 nodes. My Prometheus server's scrapes are timing out regularly, mostly for the cadvisor & node metric targets. (We have seen similar timeouts in various API-related operations as well.) These kinds of problems arose somewhere during the growth of this cluster from 14 nodes to 44 nodes. Since Prometheus queries the API server to scrape node & cadvisor metrics, the traffic caused by Prometheus increased almost fivefold. Could this be a reason for the API performance degradation? Scrape conf samples: […] |
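The scrape config samples above were not preserved, but as a rough way to reason about the load: when cadvisor and node metrics are pulled through the API server proxy, every scrape is one extra API-server request per node per scrape interval. A back-of-the-envelope sketch (the Prometheus URL and port below are assumptions):

```sh
# How many targets is Prometheus actively scraping?
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets | length'

# Rough API-server request rate from proxied node/cadvisor scrapes:
#   requests/sec ≈ (nodes × proxied jobs) / scrape_interval_seconds
# e.g. 58 nodes × 2 jobs (cadvisor + node) every 30s ≈ ~3.9 req/s,
# which grows linearly with node count.
```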
I wonder if you are hitting SNAT port exhaustion? If you are using Advanced Networking for your cluster and UDR routing to next-hop to an NVA firewall, you could be maxing out SNAT ports, maybe? The Azure Load Balancer does have a metric for SNAT port usage, assuming your NVAs are behind one. |
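If SNAT exhaustion is the suspect, the load balancer metrics can be pulled with the Azure CLI. A sketch assuming a Standard SKU load balancer in front of the NVAs; the resource ID is a placeholder, and which SNAT-related metrics are available depends on the SKU:

```sh
az monitor metrics list \
  --resource "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/loadBalancers/<lb-name>" \
  --metric SnatConnectionCount \
  --interval PT5M
```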
This issue has been automatically marked as stale because it has not had activity in 90 days. It will be closed if no further activity occurs. Thank you! |
This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. @slack, feel free to comment again within the next 7 days to reopen it, or open a new issue after that time if you still have a question/issue or suggestion. |
The AKS team is aware of performance issues when in-cluster components (like Tiller or Istio) generate a large amount of traffic to the Kubernetes API. Symptoms include slow in-cluster API responses, a slow Kubernetes dashboard, long-running API watches timing out, or an inability to establish an outbound connection to the AKS cluster's external API endpoint (a quick way to spot-check in-cluster API latency is sketched at the end of this description).
In the short term, we have made a few changes to the AKS infrastructure that are expected to help with, but not eliminate, these timeouts. The updated configuration will begin a global rollout in the coming weeks. Customers who create new clusters or upgrade existing clusters will automatically receive this updated deployment.
In parallel, engineering is working on a long-term fix. As we make progress, we will update this GitHub issue.
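One way to observe the "slow in-cluster API responses" symptom directly is to time a request to the API server from inside a pod. A minimal sketch using the standard in-pod service-account paths; whether /healthz is readable depends on the cluster's RBAC configuration:

```sh
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# Print the HTTP status and total request time against the in-cluster endpoint.
curl --cacert "$CACERT" -H "Authorization: Bearer $TOKEN" \
     -s -o /dev/null -w 'healthz: HTTP %{http_code} in %{time_total}s\n' \
     https://kubernetes.default.svc/healthz
```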