TLS Timeout #581
Yep, same here. Sometimes the master API is really slow or basically unavailable for 5 or 10 minutes, sometimes longer... are you having any trouble with master scaling on your side, @Azure? I've been experiencing this for 2 weeks, by the way. It's very important that your master servers are 100% available, since a lot of services use the k8s API for different things... for the moment I can say that AKS is not really performing well on that specific point.
No, I have no issue scaling the cluster up or down. But I don't know how to check the status of the master node, can you guide me?
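(For anyone else wondering how to do that, here is a rough sketch of commands that can help check control-plane health. This assumes kubectl is already configured against the AKS cluster; it is only illustrative, not an official diagnostic procedure.)

```sh
# Show which API server endpoint kubectl is talking to
kubectl cluster-info

# Quick health probe of the API server itself
kubectl get --raw /healthz

# Health of the scheduler, controller-manager and etcd as reported by the API server
kubectl get componentstatuses

# Time a simple request, with an explicit per-request timeout, to see how slowly
# the control plane is responding
time kubectl get nodes --request-timeout=30s
```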
@talhairfanbentley No, I was asking the Azure team if they have some trouble with their k8s masters on AKS, because apparently there is some :) The purpose of AKS is basically to host the masters and manage them for us; that's the main difference between ACS and AKS, it's an integrated Azure service.
well, hope they fix it soon.
Same ... that's a big issue for AKS.
Can you guys send subscriptionID, resource group, resource, and region to [email protected] for us to take a look? thanks!
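(In case it helps anyone collect that information, a rough sketch using the Azure CLI; `my-rg` and `my-aks-cluster` are placeholder names.)

```sh
# Subscription ID of the currently selected subscription
az account show --query id -o tsv

# Region and full resource ID of the AKS cluster (the ID also contains the resource group)
az aks show --resource-group my-rg --name my-aks-cluster \
  --query "{location:location, id:id}" -o table
```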
+1 Same diagnosis. But I am not surprised, you know
@weinong isn't the subscriptionID supposed to be kept secret? How can I share that with you?
If you send that information directly to [email protected] via a secure email client, your information will be kept securely within Microsoft.
@jskulavik thanks, I'll talk to my manager first.
Noticed normal behavior over the last 2 days on one AKS cluster; others with the same k8s versions keep lagging on the k8s API (resource deletion/creation takes longer). Hopefully someone from the AKS dev team will provide some feedback/status here.
It's been 2 weeks now. I still face timeout issues sometimes.
A pod that used to be deleted and recreated within 1-5 seconds on self-managed Kubernetes (via ACS engine) now takes ~14 minutes on AKS GA. @jskulavik Could you please provide any feedback on how stable the AKS schedulers are? We are about to roll back to the ACS solution from AKS GA.
Hi @novitoll, As AKS is a managed service, there are significantly more resources provisioned on your behalf during cluster creation than with a pure ACS-Engine cluster creation. The benefit of the AKS provisioning workflow is that you receive the managed service, where Azure manages the Kubernetes infrastructure on your behalf. This is most likely the reason you are seeing the difference in provisioning time.
@jskulavik I understand, I'd used an ACS-Engine cluster before AKS went GA. But this is more of a complaint we've been suffering with since migrating to AKS. So I'm offering my help (I could not find the source code of AKS, but apparently it's just performance) and asking for the latency issue in AKS to be fixed. If we have a Deployment with N replicas, then it seems that during a RollingUpdate each replica takes ~6 minutes to be recreated. So imagine a 3-replica deployment: waiting for 18 minutes :) Insane. With ACS, when we managed our masters ourselves, there was zero latency, but we faced other, bigger issues. Please let me know if I can help (fix some Go code, etc.), but we need to sort this issue out.
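(A rough sketch of how one might measure where a rollout actually spends its time; `my-app` is a placeholder deployment name and this is just one way to look at it.)

```sh
# Watch the rollout and time how long it takes end to end
time kubectl rollout status deployment/my-app

# Recent events usually show whether the delay comes from scheduling, image pulls,
# failing probes, or the API server itself responding slowly
kubectl get events --sort-by=.lastTimestamp | tail -n 20
```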
Thank you @novitoll. We are constantly working to improve AKS, not only from a performance perspective, but across the board. This type of feedback is a great way to help. The more feedback like this that we receive, the better. So thank you again and please continue this feedback. I assure you, we're working hard to address your concerns.
We are experiencing the same issue today in the West Europe region.
Hi @Kemyke, You can submit a support request via portal.azure.com linking to this issue. Thank you.
Hoping for these major issues to be solved soon.
Linking in #605 as it sounds related
For a while now I've been facing issues with Helm deployments, and to me they sound related to this issue. Re-opened my support request and linked in this issue 👍
|
@subesokun yes, it seems to be the same.
@talhairfanbentley Oh ok. I'm facing these issues in the EASTUS and WESTEUROPE regions, mostly during nightly deployments. For each nightly build a new AKS cluster gets created via IaC and some apps are deployed into it. Sometimes the AKS provisioning fails (well, that's another issue), but most of the time the deployment of the apps fails with the errors mentioned above.
Are you trying to deploy them through the command line or with YAML files?
YAML files / Helm charts via Helm
Try to increase the timeout
That timeout is Helm-specific (1800s = 30 min for the complete deployment), and usually my deployments take about 5 minutes since I'm creating an LB service. But the timeout I'm actually hitting is at a lower level (a timeout at the connection level), so my deployments sometimes fail after only 2 minutes when a connection times out.
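(For reference, a sketch of the two different timeouts being discussed here. The release and chart names are placeholders, and the `--timeout` value in seconds is the Helm 2 style; Helm 3 expects a duration such as `30m`.)

```sh
# Helm-level timeout: how long Helm waits for the release's resources to become ready
helm upgrade --install my-release ./my-chart --wait --timeout 1800

# Connection-level timeout: how long a single request to the API server may take.
# If the TLS handshake itself times out, the call fails long before any Helm timeout.
kubectl get pods --request-timeout=30s
```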
Hmm, strange. Well, we can wait for Microsoft to look into this.
Maybe you should follow this topic:
Every time I scale-in or scale-out an AKS cluster, all kubectl commands start reporting TLS timeouts. Also, if an Azure VM fails, the same thing occurs. This happens with Kubernetes versions 1.10.* and 1.11.*.
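(A minimal sketch of that repro, assuming the Azure CLI and kubectl are configured; `my-rg` and `my-aks-cluster` are placeholders.)

```sh
# Scale the cluster's node count
az aks scale --resource-group my-rg --name my-aks-cluster --node-count 5

# Immediately afterwards, calls against the API server fail with something like
#   Unable to connect to the server: net/http: TLS handshake timeout
kubectl get nodes
```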
@agolomoodysaada are you sure that it only occurs on these specific versions?
Yes, I'm sure. That's why I wrote that comment. I should mention this is on EastUS and EastUS2. So perhaps WestUS is on a different AKS release?
Well, I had this issue in EastUS and EastUS2. I changed my region to CentralUS and everything is working fine for now.
Hello,
Hey guys, I might have a fix for you all. I created my cluster in Central US. Also, from what I gathered from my findings, this timeout thing had something to do with the API server at the master node. So try to reduce the frequency of calls to your API server.
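(One concrete way to reduce the call rate, sketched below: replace tight polling loops with a single watch or a blocking wait. Purely illustrative; `my-app` is a placeholder.)

```sh
# Polling every second opens a new API server request (and TLS handshake) each time:
#   watch -n 1 kubectl get pods
# A single watch keeps one long-lived connection open instead:
kubectl get pods --watch

# Scripts that poll for readiness can use a blocking wait (needs a reasonably recent kubectl):
kubectl wait --for=condition=available deployment/my-app --timeout=300s
```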
|
Closing this issue as old/stale. The error reported (TLS timeout) can be caused by many different things: API server/etcd overload, custom NSG rules blocking needed ports, and more (custom firewalls, URL whitelisting, etc.). If you are only seeing this behavior on clusters with a unique configuration (such as custom DNS/VNet/etc.), please open an Azure technical support ticket.
I'm randomly getting TLS timeouts that fix themselves after a while. But in the meantime Kubernetes gets unresponsive, like no response is returned from the CLI. My cluster is in West US.