AKS upgrade fails if you have node-role.kubernetes.io node labels #1835
Comments
Hi ohorvath, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
|
+1, we cannot even update node labels on a node pool before we upgrade to address this problem. https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools
|
+1 We also saw this recently on a 1.15 to 1.16 upgrade. We are unable to change our node labels to be compliant with 1.16, so our upgrade completely fails. We kind of need this ability so that:
|
+1. Ideally just upgrading from 1.15 to 1.16 should Just Work. As it is the portal just hangs for over an hour until it fails with no meaningful information. The ability to update node labels after creation is also critically important. Not only setting the labels, but propagating the change to the existing nodes should be part of the update. |
Triage required from @Azure/aks-pm |
We're going to take a look at this and try to reproduce it. It seems node-role.kubernetes.io is a common label that got deprecated. |
Action required from @paulgmiller. |
Yes, that's deprecated, but we can't remove it because node labels cannot be changed after AKS cluster creation. We could try to manipulate them programmatically node by node, or move the workloads to a new node pool, but that's a pain for existing environments. So all of those clusters are stuck on 1.15 forever, and I'm sure similar issues will happen in the future. A solution like a node label operator would be beneficial for controlling node labels with tags or similar. |
So we reproduced that if you have a nodepool you created with
Since we don't have a way to update labels (--labels is not supported in az aks nodepool update), the short-term fix is to create a new node pool and cordon/drain/delete the old one. Longer term we need to decide if we should just
Interested in the feedback of those who've commented already. |
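The short-term workaround described above (create a new pool, then cordon, drain, and delete the old one) can be sketched as an ordered list of CLI invocations. This is only an illustration, not an official procedure: the function name `migration_commands` and all resource names are hypothetical, and selecting nodes by an `agentpool=<pool name>` label is an assumption about how AKS labels its nodes. The sketch only builds the commands; it does not run them.

```python
from typing import Dict, List


def migration_commands(
    resource_group: str,
    cluster: str,
    old_pool: str,
    new_pool: str,
    labels: Dict[str, str],
) -> List[List[str]]:
    """Return the CLI invocations, in order, for replacing a node pool."""
    # Assumption: AKS nodes carry an `agentpool=<pool name>` label.
    selector = f"agentpool={old_pool}"
    common = ["--resource-group", resource_group, "--cluster-name", cluster]

    # Create the replacement pool, attaching labels only if any were given.
    add = ["az", "aks", "nodepool", "add", *common, "--name", new_pool]
    if labels:
        add += ["--labels", *(f"{k}={v}" for k, v in sorted(labels.items()))]

    return [
        add,
        # Stop new pods from being scheduled onto the old pool.
        ["kubectl", "cordon", "--selector", selector],
        # Evict running pods from the old pool (DaemonSet pods stay).
        ["kubectl", "drain", "--selector", selector, "--ignore-daemonsets"],
        # Remove the old pool once it is empty.
        ["az", "aks", "nodepool", "delete", *common, "--name", old_pool],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for argv in migration_commands(
        "my-rg", "my-aks", "oldpool", "newpool", {"workload": "batch"}
    ):
        print(" ".join(argv))
```

Any nodeSelector that targeted the old node-role.kubernetes.io label would also need to be pointed at the new label before draining, otherwise the evicted pods would have nowhere to land.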
I would personally like to see the ability for us to change the labels on node pools. This will allow us to prevent any problems during upgrade and also to continue to target certain nodes with our nodeSelectors. |
I'd like to manipulate node labels through API calls or CLI commands before or during upgrade. Like so: az aks nodepool upgrade --removelabel XYZ --addlabel XYZ |
Having the existing node labels automatically removed doesn't buy us much since our charts used them as nodeSelector targets. We'd definitely need the ability to adjust the existing nodepool labels. |
The current plan is to block upgrades and creates on >=1.16 for node pools with these bad labels, starting next week (2a). We want to let you update labels, but that needs more work, and the first priority is to keep people from accidentally destroying their agent pools. |
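A pre-flight check along the lines of that plan might look like the sketch below. It is not the AKS implementation: the names `restricted_labels` and `can_upgrade` are hypothetical, and it only looks for label keys in the node-role.kubernetes.io namespace discussed in this issue, not every label a 1.16 kubelet might reject.

```python
from typing import Dict, List

# The label namespace reported in this issue as breaking 1.16 upgrades.
RESTRICTED_PREFIX = "node-role.kubernetes.io/"


def restricted_labels(labels: Dict[str, str]) -> List[str]:
    """Return the label keys that fall in the restricted namespace."""
    return sorted(key for key in labels if key.startswith(RESTRICTED_PREFIX))


def can_upgrade(labels: Dict[str, str], target_version: str) -> bool:
    """Allow the upgrade unless the target is >= 1.16 and bad labels exist."""
    major, minor = (int(part) for part in target_version.split(".")[:2])
    if (major, minor) < (1, 16):
        return True
    return not restricted_labels(labels)


if __name__ == "__main__":
    # Hypothetical node pool labels, for illustration only.
    pool_labels = {"node-role.kubernetes.io/worker": "", "workload": "batch"}
    print(restricted_labels(pool_labels))
    print(can_upgrade(pool_labels, "1.16.0"))
```

Running a check like this before starting the upgrade would surface the problem up front instead of after the existing nodes have been drained.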
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment. |
This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. ohorvath, feel free to comment again in the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion. |
What happened:
If you have an AKS 1.15 nodepool labeled with "node-role.kubernetes.io/something", the upgrade process will fail. It terminates all your pods and sets your nodes to NotReady, but it can never bring in new nodes to replace the old ones. I think it tries to bring in the new nodes with the old, unsupported labels.
What you expected to happen:
The upgrade should work by provisioning new nodes with the new labels, or a feature gate should be set to support the old ones. At the very least, we should have a way to replace the node labels after creation so we can fix this manually before starting the upgrade.
How to reproduce it (as minimally and precisely as possible):
Create an AKS cluster with a nodepool that has a node-role.kubernetes.io label. Start the upgrade process from the portal.
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version): 1.15