Unable to connect to the server: net/http: TLS handshake timeout #14
Comments
Update 11/03: I'm now able to create clusters successfully in uswest2; however, I'm still getting TLS handshake errors:
Are we still in the realm of capacity issues or is there another underlying issue here? This should work, right? David |
Sometimes I should look before I write :) I see the problem. The proxy is trying to connect to 10.240.0.4, which is the private IP of one of the agents and won't (and shouldn't) be reachable from the Internet. I'm guessing this is the underlying issue here. |
+1 Originally this worked fine; I noticed the issue today when I deleted the cluster and tried to recreate it. |
I get this regardless of using West US 2 or UK West: |
Looks like we are good now, thanks for all the work! QQ: I cannot connect to my cluster with the Cabin app using a token. The app shows the cluster as running, but I can't see any of the nodes, namespaces, etc. It looks like the auth fails at some point. Thoughts? |
I'm having the same problem in West US 2 at the moment: $ kubectl get pods --all-namespaces |
same issue here on West US 2 |
The AKS cluster is in West US 2. I have the same issue with kubectl get nodes and az aks browse --resource-group xxxx-rg --name xxxx |
11/9: I'm still getting issues and have reverted back to an unmanaged cluster using ACS with Kubernetes as the orchestrator. I look forward to when AKS becomes a little more stable. |
I am having these same issues! |
@dsandersAzure I did the same; I created the cluster using ACS!! |
AKS is still in preview. For now it seems West US 2 is not available, but ukwest is OK. We can create AKS clusters in ukwest now.
|
I believe the capacity issues in ukwest are ongoing; hoping AKS will expand to other locations in Europe soon. I had a 1.7.7 cluster in ukwest that broke a couple of days ago. I attempted to recreate it today, but it is still in a bad state.
|
So, provisioning in ukwest gives me a cluster with crashing pods; provisioning in westus2 doesn't work at all:
|
Hi, same here today. I created an AKS 1.8.1 cluster in westeurope and it was OK, but one hour later I upgraded to 1.8.2 and have had the problem since. With kubectl 1.8.0 and 1.8.4 I get the same error. After that I can't create a new AKS cluster in the westeurope location; the CLI returns this error:
|
Having the same issue. I have two clusters, one in East US and the other in Central US. |
I'm having the same issue after downscaling my cluster in East US! |
Hi everyone, having the same issue today in westeurope. And when I try to create a new cluster in this location, it gives an error: |
Still an issue. Any resolution? This is my third running cluster I have lost the ability to communicate with, in East US. Doing an upgrade or scaling up the nodes does not work properly - a complete deal breaker when considering AKS. Either of these commands results in the TLS handshake timeout. Command to create:
Perfectly healthy. Command to scale up:
az aks scale `
--name AKS-Cluster-VoterDemo `
--resource-group RG-EastUS-AKS-VoterDemo `
--node-count 3
Result: |
I encountered the same TLS handshake timeout connection issue after I manually scaled the node count from 1 to 2! My cluster is in Central US. What's wrong? |
Thanks for your patience through our preview. We've had a few bugs in the scale and upgrade paths that prevented the api-server from passing its health check after an upgrade and/or scale. A number of bug fixes in this area went out over the last few weeks that have made upgrades more reliable. Last week, for clusters in East US, we had an operational issue that impacted a number of older customer clusters between 12/11 13:00 PST and 12/12 16:01 PST. Health and liveness of the api-server is now much better. If you haven't upgraded recently, I'd recommend issuing |
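(The command being recommended is cut off above. A minimal sketch of an AKS upgrade via the Azure CLI; the resource group, cluster name, and target version below are placeholders, not values posted by @slack:)
# Illustrative only; names and version are placeholders.
az aks get-upgrades --resource-group <my-resource-group> --name <my-aks-cluster> --output table
az aks upgrade --resource-group <my-resource-group> --name <my-aks-cluster> --kubernetes-version 1.8.2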
@slack thank you, it works ;) |
@slack Confirmed: upgrading the cluster to 1.8.2 gets kubectl connecting again |
@slack Having the same problem still after upgrading to 1.8.2 in westeurope. Is there a problem in that region? |
After downgrading to 2.0.23 I was able to install the cluster, but after downloading the credentials I also have the same problem in westeurope with kubectl get nodes. Doing an az aks upgrade to 1.8.2 failed for me too, incidentally. |
Running into the same issue; cluster in West Europe; upgrade to 1.8.2 fails with: Deployment failed. Correlation ID: 858d3cf0-0d4e-417d-a2ee-22f627892e51. Operation failed with status: 200. Details: Resource state Failed |
I am getting the TLS handshake error at 2:30 PM EST in East US:
|
Also, for me kubectl/API calls from my laptop do not work; they work from Azure Cloud Shell only. |
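(When calls work from Cloud Shell but not from a laptop, a first check is that both are pointed at the same cluster with fresh credentials. A sketch of that check; the resource group and cluster name are placeholders:)
kubectl config current-context          # which context the laptop is using
kubectl cluster-info                    # which API server endpoint that context resolves to
az aks get-credentials --resource-group <my-resource-group> --name <my-aks-cluster>   # refresh local credentials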
I discovered the cause of my issue. In the portal my AKS cluster is still listed as "Creating...". It's been like that for several days now. I tried a different region, with the default VM size, and that worked. It still took a long time to go from "Creating..." to normal, but it did get there eventually. Then all the subsequent commands worked. |
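(The same provisioning state can be watched from the CLI instead of the portal. A sketch; resource group and cluster name are placeholders:)
# Illustrative only: wait for provisioningState to report Succeeded before running kubectl.
az aks show --resource-group <my-resource-group> --name <my-aks-cluster> --query provisioningState --output tsv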
The solution for me was to scale the cluster nodes up by 1 (temporarily) and then, once the new node launched, connect. I was then successful and could scale the cluster down to the original size. Full background can be found over here: |
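(A sketch of that scale-up-then-back-down workaround with the Azure CLI; the resource group, cluster name, and node counts are placeholders, not values from this comment:)
# Illustrative only: bump the node count by one, wait for the new node to be Ready, then scale back.
az aks scale --resource-group <my-resource-group> --name <my-aks-cluster> --node-count 4
kubectl get nodes    # wait until the extra node reports Ready, then reconnect
az aks scale --resource-group <my-resource-group> --name <my-aks-cluster> --node-count 3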
Same problem here. Sometimes I just cannot use kubectl. |
@emanuelecasadio AKS is now GA. Make sure you have either upgraded or have the necessary patches installed. |
I am still facing this issue while running the "kubectl get nodes" command. I have tried the following but with no luck :(
|
@SnehaJosephSTS - we had to re-create our cluster after AKS went GA. Haven't had the issue since then. Upgrade for us did not work, nor did scaling. |
I am getting the error this morning while trying to get nodes on a new cluster in eastus. |
I am getting the same issue in eastus. I enabled RBAC with the AKS create command: az aks create --resource-group my-AKS-resource-group --name my-AKS-Cluster --node-count 3 --generate-ssh-keys --enable-rbac
|
There are many reasons behind the TLS handshake timeout error. For clusters created before AKS GA, we highly recommend customers create a new cluster and redeploy the system there. We also recommend customers upgrade clusters to stay on the latest, or one version before the latest, supported Kubernetes version. Also make sure your cluster is not overloaded, meaning you haven't maxed out usable CPU and memory on the agent nodes. We've seen many times that when someone scales a cluster down from X nodes to 1 (X being 5 or above), interruption of the connection to the control plane can happen: they might be running a lot of pods on the cluster, and now all of them will be evicted and redeployed onto the only node left. And if the node VM is very small, that can leave pods with no place to schedule, including some mission-critical pods (add-ons in kube-system). If after all this diagnosis you still suffer from this issue, please don't hesitate to send email to [email protected] |
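(To check whether the agent nodes are overcommitted before scaling down, something like the following can help. A sketch: the node name is a placeholder, and kubectl top requires a metrics add-on such as metrics-server or heapster to be installed:)
kubectl top nodes                             # rough CPU/memory usage per node, if metrics are available
kubectl describe node <node-name>             # "Allocated resources" compares requests against allocatable
kubectl get pods --all-namespaces -o wide     # shows where pods, including kube-system add-ons, are scheduled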
Isn't that a very big issue? I've had many clusters break irreparably in this way. Thankfully I'm not dealing with a production workload, but imagine if I was. I'd be livid. Is it possible to somehow get the scheduler to prioritise the system pods over the workload pods? |
After lots of back and forth with Azure support, we arrived at this workaround. I have yet to try it, as they fixed the issue on their end. However, it might help someone else facing this. Anyway, here's their message:
P.S. We should NOT close this issue, as the bug still occurs from time to time. This is not considered an acceptable workaround. It's a mitigation for those whose clusters are stuck and cannot access logs, exec, or Helm deployments. We still need a permanent fix designed for failure of either tunnelfront or tunnelend. It would be nice if you could also explain what tunnelfront and tunnelend are and how they work. Why are we, consumers of AKS, responsible for maintaining Azure's buggy workloads? |
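(The support message itself is not included above. For context, tunnelfront typically runs as a pod in kube-system on AKS clusters, so one hedged way to inspect it and force a restart looks like the lines below; the pod name is a placeholder, deleting it relies on its controller recreating it, and this is a mitigation of the kind discussed here, not an official procedure:)
kubectl -n kube-system get pods | grep tunnel            # find the tunnelfront pod and check its status
kubectl -n kube-system logs <tunnelfront-pod-name>       # look for tunnel errors
kubectl -n kube-system delete pod <tunnelfront-pod-name> # its controller should recreate it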
Created a new cluster after GA and now, all of a sudden, I'm getting a bunch of TLS handshake timeouts from AKS. This does not give the feeling that AKS is anything near GA. |
Yeah, we run into this frequently; AKS master node availability is terrible. Constantly going down, timing out requests (nginx-ingress, even some of our applications that talk to k8s)... We don't run into any of these issues with GKE or kops environments. Not sure if this is anywhere near GA. EDIT: As I wrote this, our cluster has been unavailable for the last 20+ minutes saying "TLS handshake timeout". 😒 |
I set up a cluster with one node and I wanted to investigate differences from the GCP deployment, essentially do a dry run (our production deployment is on Google Cloud's Kubernetes, but we're doing an Azure deployment for a client). However, it seems like all
az version 2.0.46 |
I just had this happen to myself. It appeared out of the blue, then went away a few hours later, after I restarted my nodes a few times, as well as killed most of my deployments. I'm not sure if that's what fixed it, or if whatever was truly causing the issue just went away. Some notes from my investigation:
|
Brand new cluster today... been online for just a few hours and TLS handshake timeouts. 👎 |
Any update on this issue? We're still experiencing it |
We've been hitting this for a year, and the explanation earlier was that we were using a preview version of the AKS cluster. Now we've moved to a new cluster (supposedly after GA) and are still seeing it. I think it's worth bumping the priority, as the issue has been around for a long while and is affecting a lot of folks. |
I've found the solution!!! |
and that is? @adamsem |
Migrate to AWS :) |
Hi everyone; AKS has rolled out a lot of enhancements and improvements to mitigate this, including auto-detection of hung/blocked API servers, kubelets, and proxies. One of the final components is to scale up the master components to meet the overall workload against the master APIs. This GitHub issue contains a lot of cluster-specific reports; as we cannot safely request the data for your accounts to do deeper introspection here on GitHub, I'd ask that you please file Azure technical support issues for diagnosis (these support issues get routed to our back-end on-call team as needed for resolution). Additionally, the errors displayed can also correlate with underlying service updates in some cases (especially if you are seeing them randomly, for a limited amount of time). This will be helped by the auto-scaling (increased master count) being worked on. For issues that come up after I close this one, please file new GitHub issues that include instructions for reproduction on any AKS cluster (i.e. general, not tied to your app or cluster). This will help support and engineering debug. |
kubectl get pods --insecure-skip-tls-verify=true gives the error below |
Hi, when I create an AKS cluster, I'm receiving a timeout on the TLS handshake. The cluster creates okay with the following commands:
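(The create commands themselves are not captured in this report. Purely for illustration, a reconstruction from the JSON response below, not the exact commands that were run, would look roughly like:)
# Illustrative reconstruction from the JSON output; the exact commands used are not shown above.
az aks create --resource-group dsK8S --name dsK8SCluster --location westus2 --dns-name-prefix dasanderk8 --kubernetes-version 1.8.1 --node-count 2 --node-vm-size Standard_A2 --generate-ssh-keys
az aks get-credentials --resource-group dsK8S --name dsK8SCluster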
The response from the create command is a JSON object:
{
"id": "/subscriptions/OBFUSCATED/resourcegroups/dsK8S/providers/Microsoft.ContainerService/managedClusters/dsK8SCluster",
"location": "westus2",
"name": "dsK8SCluster",
"properties": {
"accessProfiles": {
"clusterAdmin": {
"kubeConfig": "OBFUSCATED"
},
"clusterUser": {
"kubeConfig": "OBFUSCATED"
}
},
"agentPoolProfiles": [
{
"count": 2,
"dnsPrefix": null,
"fqdn": null,
"name": "agentpool1",
"osDiskSizeGb": null,
"osType": "Linux",
"ports": null,
"storageProfile": "ManagedDisks",
"vmSize": "Standard_A2",
"vnetSubnetId": null
}
],
"dnsPrefix": "dasanderk8",
"fqdn": "dasanderk8-d55f0987.hcp.westus2.azmk8s.io",
"kubernetesVersion": "1.8.1",
"linuxProfile": {
"adminUsername": "azureuser",
"ssh": {
"publicKeys": [
{
"keyData": "OBFUSCATED"
}
]
}
},
"provisioningState": "Succeeded",
"servicePrincipalProfile": {
"clientId": "OBFUSCATED",
"keyVaultSecretRef": null,
"secret": null
}
},
"resourceGroup": "dsK8S",
"tags": null,
"type": "Microsoft.ContainerService/ManagedClusters"
}
I've now torn down this cluster but this has happened three times today.
Any help?
David