[westeurope] Intermittent Ability to Communicate with API Server #577
Comments
I don't know if it is related, but here is an example log output from tiller running inside a k8s cluster (west EU as well):

kubectl executes the queries completely fine, but I suspect that connectivity inside the cluster to 10.0.0.1 is not working properly all the time. I've seen similar issues with other containers that query the API server directly.
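As a rough sanity check, something like the following probe could be run from a pod to tell connection failures apart from application-level errors. This is a minimal sketch in Go; the in-cluster address, polling interval, and skipped TLS verification are assumptions for an ad-hoc test, not part of the original report.

```go
// Ad-hoc probe (sketch): repeatedly hit the API server's /healthz endpoint
// from inside a pod and log whether the connection itself fails.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// TLS verification is skipped only because this is a throwaway probe.
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	for {
		resp, err := client.Get("https://10.0.0.1/healthz")
		if err != nil {
			// A transport-level error points at networking, not the workload.
			fmt.Printf("%s probe failed: %v\n", time.Now().Format(time.RFC3339), err)
		} else {
			fmt.Printf("%s status: %s\n", time.Now().Format(time.RFC3339), resp.Status)
			resp.Body.Close()
		}
		time.Sleep(5 * time.Second)
	}
}
```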
Same here, but in North Europe; I posted another message on another issue: #581
AKS is basically unusable for me: for example, Jenkins builds fail around 50% of the time when a step that involves API communication doesn't get a response. The exact same setup works on GKE. I haven't had time to investigate other configurations; my impression is there are some ghosts in the Azure data centre networking machine. I need some reliable way to run Kubernetes on Azure. I have had similar issues with acs-engine (although I haven't tried it recently). Tools like Kubicorn dropped support for Azure because it was too difficult, so it looks like my only option is something like kubeadm or Kubernetes the Hard Way. https://github.com/ivanfioravanti/kubernetes-the-hard-way-on-azure
Can you send details like subscription ID, resource group, resource name, and region to [email protected] for us to take a look?
Hi @weinong, I have a customer case related to this issue. I will send you an e-mail with the details.
I just sent our subscription/cluster info. We are running in West US. I see the issue with the dashboard service, nginx-ingress, helm, and prometheus, all of which are having issues (bad gateway) connecting to the API server.
Hi, your issue is similar to #522.
I made a new cluster; counting watch occurrences in the cert-manager logs with `kubectl logs cert-manager-59fb9b6779-mbz42 | grep -o watch | wc -l` returns 143.
We have the same problem with watches from `cache.NewInformer`. AKS might have trouble with the case where the resync period is 0 (or longer than the timeout between the pod and the API server). I mention the resync period because I'm curious whether the new timeout of 10 minutes will be sufficient for this case.
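For reference, the resync period being discussed is the third argument to `cache.NewInformer` in client-go. Below is a minimal sketch of how it is wired up; the pod ListWatch, namespace, and 5-minute period are illustrative choices, not taken from the affected workloads.

```go
// Sketch: an informer with a non-zero resync period, so it periodically
// relists instead of relying solely on a single long-lived watch connection.
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Watch pods in the default namespace (illustrative choice of resource).
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "pods", metav1.NamespaceDefault, fields.Everything())

	// A resync period of 0 means "never resync"; here it is set below the
	// timeouts discussed above so there is periodic traffic to the API server.
	_, controller := cache.NewInformer(lw, &corev1.Pod{}, 5*time.Minute,
		cache.ResourceEventHandlerFuncs{})

	stop := make(chan struct{})
	controller.Run(stop)
}
```

Whether a shorter resync period actually avoids the dropped-connection behaviour here is only a guess; the sketch is mainly meant to show where the 0 value comes from.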
@andrew-dinunzio thanks for that context, it looks like that may be the issue. I've had helm pre-install hooks that lasted 45 minutes, so I guess the new timeout wouldn't work for those. I'm not sure how long the watches are with cert-manager.
Reading the comment here, I'm hoping that means there won't be any period longer than 3 minutes with no traffic between pods and the API server, which would be a good thing.
Thanks. Well, at least this seems like a fixable configuration issue that will hopefully be resolved soon.
Closing this issue as old/stale. If this issue still comes up, please confirm you are running the latest AKS release. If you are on the latest release and the issue can be reproduced outside of your specific cluster, please open a new GitHub issue. If you are only seeing this behavior on clusters with a unique configuration (such as custom DNS/VNet/etc.), please open an Azure technical support ticket.
What happened:

Communication with the API server is very patchy. For example, a `helm install` will work one moment, but then the next minute a `502 bad gateway` is returned from `nginx-ingress` (version `quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.15.0`). Another example is:

What you expected to happen:
API server requests work as normal.
How to reproduce it (as minimally and precisely as possible):
Run 1000 requests against the API server and see if they all succeed.
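One way to script that repro, as a sketch assuming a recent client-go running inside the cluster (the namespace and request type are arbitrary):

```go
// Sketch: fire 1000 list requests at the API server and count the failures.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	failures := 0
	for i := 0; i < 1000; i++ {
		_, err := clientset.CoreV1().Pods(metav1.NamespaceDefault).List(
			context.TODO(), metav1.ListOptions{})
		if err != nil {
			failures++
			fmt.Printf("request %d failed: %v\n", i, err)
		}
	}
	fmt.Printf("%d of 1000 requests failed\n", failures)
}
```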
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Cluster size: 2 nodes (Standard_DS14_v2)
- Backend services: Prometheus, Grafana, Elasticsearch, Jenkins
- Kubernetes spec: