Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AKS: Incorrect vm_size default for version >= v2.14.0 #8127

Closed
pst opened this issue Aug 14, 2020 · 8 comments
Closed

AKS: Incorrect vm_size default for version >= v2.14.0 #8127

pst opened this issue Aug 14, 2020 · 8 comments

Comments

@pst
Copy link

pst commented Aug 14, 2020

Starting with release v2.14.0 creating an AKS cluster without specifying a vm_size for the default_node_pool defaults to a sku that does not meet the min. requirements of more than 2 cores and 4GB memory.

Error: creating Managed Kubernetes Cluster "kbstacctest-ops-westeurope" (Resource Group "terraform-kubestack-testing"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="SystemPoolSkuTooLow" Message="System node pool must use VM sku with more than 2 cores and 4GB memory. Nodepool name: default."

Either the vm_size attribute should be mandatory and not have a default. Or the default should work. Weirdly enough, it seems no change has been made to this part of the schema.

@tombuildsstuff tombuildsstuff added service/kubernetes-cluster upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR question and removed upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR labels Aug 14, 2020
@tombuildsstuff
Copy link
Contributor

hi @pst

Thanks for opening this issue.

The vm_size field within the default_node_pool block of the azurerm_kubernetes_cluster resource is Required rather than Optional & Defaulted - so users are required to specify a value here (meaning that Terraform doesn't have a default VM Sku assigned here). As such unfortunately whilst this is a breaking behavioural change within the AKS Service, since this field is Required (and thus has to be specified by users) - you'd need to update your Terraform Configuration to use a larger value here.

Since this should be fixed by updating the Terraform Configuration being used here - I'm going to close this issue for the moment - but please let us know if updating the vm_size doesn't work for you and we'll take another look.

Thanks!

cc @JeffreyRichter here's another breaking behavioural change in an existing API

@pst
Copy link
Author

pst commented Aug 14, 2020

I understand the schema defines it as required. But it does so for both v2.13.0 and v2.14.0. Yet with identical TF configuration v2.13.0 creates a cluster successfully while v2.14.0 fails with above error during apply.

If the attribute is required, Terraform should error during validate or plan already. It does not, it starts to apply and then fails mid-way.

Do you see my point now @tombuildsstuff?

@tombuildsstuff
Copy link
Contributor

@pst unfortunately this is a breaking behavioural change in the AKS API - not something specific to Terraform. The difference in Provider behaviour here is likely due to the version of the AKS API being used changing between versions 2.13 and 2.14 of the Azure Provider.

Where the AKS API has a history of shipping breaking behavioural changes to existing API versions - we're only able to apply best-effort validation for this resource rather than more granular as we'd like. The API itself doesn't expose this information via an API endpoint - and so from our side we're forced to lean on the AKS API's validation here - rather than validating against a dictionary of all of the valid SKU's for each VM type at plan time (which could change at any time behind the scenes, as is the case in this breaking behavioural change to the Azure API).

As such whilst this breaking behavioural change is unfortunate - by the nature of this being an API change there's not much that Terraform could do ahead of time to prevent this.

Thanks!

@pst
Copy link
Author

pst commented Aug 14, 2020

I've deleted my last comment. I made a mistake, I did provide a value through a lookup() somewhere else in my config.

But that a breaking change like this between two minor versions is just immediately accepted like this in one of the first class Terraform providers is still disappointing at best.

@tombuildsstuff
Copy link
Contributor

@pst this isn't a breaking change in Terraform - this is a breaking change by Microsoft in the Azure/AKS API - I'd suggest filing an issue with Microsoft on the AKS repository - but unfortunately we're at the mercy of the API here.

@pst
Copy link
Author

pst commented Aug 15, 2020

This TF provider changed the API version it is using under the hood between the two releases. If the API you migrated to has breaking changes, you have breaking changes too now. This should have been reflected in the semver.

Mistakes happen and this is nothing personal, but I strongly disagree with your view on this.

If that Frank Turner concert will ever happen I owe you a beer. I'll let this rest now.

@alexeldeib
Copy link
Contributor

👋 I came here from the linked issue and was a bit concerned AKS would be making behaviorally breaking changes as described. Looking at the commits, @pst seems correct -- changing to 2020-03-01 means you started using system and user pools, which enforced extra validation.

I don't see that in the AKS changelog anywhere, although it's doc'd on the relevant site: https://docs.microsoft.com/en-us/azure/aks/use-system-pools#system-and-user-node-pools Could probably call that out as a behavior change in the release notes, though..

@ghost
Copy link

ghost commented Sep 13, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Sep 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants