AKS starting a stopped cluster fails #2171
Comments
Hi FunkyFabe, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
Triage required from @Azure/aks-pm |
I have the exact same issue. If I stop and then start the cluster without waiting long between the two actions (a few seconds), it starts again successfully. I assume that the stop action at the cluster level eventually stops everything, including the managed apiserver on the Azure side; the DNS error on the managed apiserver is evidence of this. When we then start the cluster again, the nodes come up faster than the managed side on Azure, so when they try to contact the apiserver they cannot reach it, and the cluster enters a failed state. |
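One way to probe this hypothesis is to check whether the API server hostname resolves right after the start is requested. The following is only a sketch under assumptions, not a fix: `wait_for_dns` and `RESOLVE` are hypothetical helper names, `getent hosts` is assumed to be available (a Linux host), and the hostname would have to be taken from the cluster resource, for example via the `fqdn` field of `az aks show`.

```shell
# Sketch: poll until a hostname resolves, or give up after N attempts.
# wait_for_dns and RESOLVE are illustrative names, not part of any Azure tooling.
RESOLVE="${RESOLVE:-getent hosts}"   # resolver command; assumes a Linux host

wait_for_dns() {
  host="$1"
  attempts="${2:-30}"   # number of lookups before giving up
  delay="${3:-10}"      # seconds to wait between lookups
  i=1
  while [ "$i" -le "$attempts" ]; do
    if $RESOLVE "$host" >/dev/null 2>&1; then
      echo "resolved after $i attempt(s)"
      return 0
    fi
    # Sleep only if another attempt will follow.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "still unresolved after $attempts attempt(s)" >&2
  return 1
}

# Possible usage (assumes a public cluster whose FQDN is exposed as `fqdn`):
# FQDN=$(az aks show -g <resource-group> -n <cluster> --query fqdn -o tsv)
# wait_for_dns "$FQDN" 30 10
```

If the name stays unresolved for several minutes after `az aks start` is issued, that would be consistent with the nodes coming up before the managed side is reachable, though it would not prove the cause.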
Got the same issue as well: not a private cluster, no NSG, no UDR, no custom DNS. Deployment failed. Correlation ID: d9347991-56a4-445d-9fb6-719f5382f7d1. Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information. Details: VMSSAgentPoolReconciler retry failed: { |
@jbouzekri in which region is your cluster deployed? |
@karyjac : In francecentral |
@jbouzekri Same here. I guess this is a bug, maybe DNS replication, but I can't say for sure. |
Action required from @Azure/aks-pm |
Issue needing attention of @Azure/aks-leads |
I have the same issue, but I can only reproduce it with k8s v1.20.5. With k8s v1.19.9 it works fine. Please investigate this issue; it is really bad behaviour and the bug should be found as fast as possible. |
Issue needing attention of @Azure/aks-leads |
@NickKeller please investigate |
Triage required from @Azure/aks-pm |
Action required from @Azure/aks-pm |
Issue needing attention of @Azure/aks-leads |
Are you still seeing this issue? cc @qpetraroia |
I am still seeing this issue: twice in the space of a week, on brand new private clusters. DNS setup is correct and verified. |
Hi folks, we are reviewing this again and will provide an update soon. |
We are still facing this issue. Many of our clusters fail with |
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed if no further activity occurs within 7 days of this comment. @alvinli222 |
What happened:
Starting a stopped AKS cluster with
az aks start -g test-bench-exam-scheduler -n aks-test-bench
fails. (
az aks stop -g test-bench-exam-scheduler -n aks-test-bench
was used to stop the cluster before.)
What you expected to happen:
The cluster should start properly.
How to reproduce it (as minimally and precisely as possible):
Create a new AKS cluster. Then stop it and start it again with the CLI commands.
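The steps above could be scripted roughly as follows. This is a sketch only: the resource group and cluster name are taken from the commands in this report, while the `AZ` variable, the `repro` function, and the `--node-count 1 --generate-ssh-keys` options on `az aks create` are assumptions added for illustration (the original creation options are not given). By default the script is a dry run that only prints the commands; setting `AZ=az` would run them for real.

```shell
# Sketch: reproduce the stop/start sequence from this report.
# AZ and repro() are illustrative helpers, not part of the original issue.
RG="test-bench-exam-scheduler"
CLUSTER="aks-test-bench"
AZ="${AZ:-echo az}"   # dry run by default; use AZ=az to execute the commands

repro() {
  # 1. Create a new cluster (assumes the resource group already exists).
  $AZ aks create -g "$RG" -n "$CLUSTER" --node-count 1 --generate-ssh-keys || return 1
  # 2. Stop the cluster.
  $AZ aks stop -g "$RG" -n "$CLUSTER" || return 1
  # 3. Start it again; this is the step reported to fail.
  $AZ aks start -g "$RG" -n "$CLUSTER"
}

# With the default AZ this prints the three commands without touching Azure.
repro
```

Since an earlier comment reports that a quick stop followed by a start succeeds, it may be worth waiting some minutes between steps 2 and 3 when trying to trigger the failure.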
Anything else we need to know?:
Starting the cluster throws the following failure:
Environment:
Resource JSON of cluster