Non-default node_group name breaks node group version upgrade #862

Closed
1 task done
mfilocha opened this issue May 6, 2020 · 4 comments · Fixed by #868

Comments

@mfilocha

mfilocha commented May 6, 2020

Non-default node_group name breaks node group version upgrade

I'm submitting a...

  • bug report

What is the current behavior?

node_groups allows defining a custom name for a node group. This works as expected; however, when you try to upgrade the Kubernetes version of that group via the version keyword, Terraform fails with "ResourceInUseException: NodeGroup already exists with name ... and cluster name ..."

If this is a bug, how to reproduce?

node_groups = {
  main_group = {
    name    = "group-name"
    version = var.kubernetes_version
    # ...
  }
}

What's the expected behavior?

Terraform performs the node group version upgrade without recreating the node group.

Are you able to fix this problem and submit a PR?

A possible workaround is to not define a custom node group name. This should be stated in the documentation.

I can create a PR to add a note to the documentation until the issue is fixed.
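
For reference, a minimal sketch of the workaround, with the custom name simply omitted; the scaling and instance attributes below are illustrative placeholders rather than values from my configuration:

node_groups = {
  main_group = {
    # No custom "name" here: letting the module generate the node group name
    # avoids the ResourceInUseException when the group has to be replaced.
    version          = var.kubernetes_version
    desired_capacity = 2            # illustrative values
    max_capacity     = 3
    min_capacity     = 1
    instance_type    = "m5.large"
  }
}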

Environment details

  • Affected module version: v11.1.0
  • Terraform version: v0.12.20
@wwentland
Contributor

wwentland commented May 6, 2020

The issue you are seeing here is probably related to #843 in that the version upgrade causes a change to the cluster resource. Do you see output similar to the following?

 # module.node_groups.aws_eks_node_group.workers["foo-0123456789abcd"] must be replaced
+/- resource "aws_eks_node_group" "workers" {
      [...]
      ~ cluster_name    = "foo-cluster" -> (known after apply) # forces replacement
      [...]

If so, the behaviour you are seeing is probably due to the way the managed node group (MNG) module has been stitched into the main module. In particular, the main module contains the following code snippet:

data "null_data_source" "node_groups" {
  count = var.create_eks ? 1 : 0

  inputs = {
    cluster_name = aws_eks_cluster.this[0].name
    [...]
  }
}

module "node_groups" {
  [...]
  cluster_name           = coalescelist(data.null_data_source.node_groups[*].outputs["cluster_name"], [""])[0]
  [...]
}

The intention of this is to force the creation of a number of resources before the node groups are created, with the unfortunate side effect of turning cluster_name into a value that is computed at apply time whenever the underlying cluster resource has pending changes.

@dpiddockcmp elaborated on this behaviour in #843 (comment)

@dpiddockcmp
Contributor

This is two issues clashing together. Changing the Kubernetes version shouldn't require recreation of the managed node groups, so yes, it is the explanation in #843 that's causing half the issue.

There's also the problem with the optional name parameter introduced in #739. That prevents successful recreation of the node group, since the name is already in use and node group names must be unique within a cluster.
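
To illustrate that uniqueness constraint, here is a rough sketch (not necessarily how this module implements it) of a generated name that changes whenever a replacement is forced; var.cluster_name, var.node_role_arn, var.subnet_ids and var.kubernetes_version are assumed inputs:

resource "random_pet" "node_group" {
  # A new pet name is generated whenever a keeper changes, so a replacement
  # node group gets a fresh name instead of reusing the old one.
  keepers = {
    cluster_name = var.cluster_name
  }
}

resource "aws_eks_node_group" "example" {
  cluster_name    = var.cluster_name
  node_group_name = "example-${random_pet.node_group.id}" # generated, unique per replacement
  node_role_arn   = var.node_role_arn                     # assumed inputs
  subnet_ids      = var.subnet_ids
  version         = var.kubernetes_version

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  lifecycle {
    # With create_before_destroy, the new group exists alongside the old one
    # for a while, which is exactly when a fixed custom name collides.
    create_before_destroy = true
  }
}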

@wwentland
Copy link
Contributor

wwentland commented May 6, 2020

The name change introduced in #739 was meant to ensure that node groups are updated in place rather than recreated with a new name, although that may itself have been a symptom of the same underlying issue.

Node groups are behaving as intended with the following change:

diff --git a/node_groups.tf b/node_groups.tf
index 5c2b92e..54acf42 100644
--- a/node_groups.tf
+++ b/node_groups.tf
@@ -19,7 +19,7 @@ data "null_data_source" "node_groups" {
 module "node_groups" {
   source                 = "./modules/node_groups"
   create_eks             = var.create_eks
-  cluster_name           = coalescelist(data.null_data_source.node_groups[*].outputs["cluster_name"], [""])[0]
+  cluster_name           = var.cluster_name
   default_iam_role_arn   = coalescelist(aws_iam_role.workers[*].arn, [""])[0]
   workers_group_defaults = local.workers_group_defaults
   tags                   = var.tags

But that would mean that the desired resource creation order is not enforced any more.

I am investigating using the node_groups module as a standalone module to get around these issues and to make its configuration more explicit.
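
Roughly, such standalone usage could look like the following; the registry source path, the input names and all values are assumptions based on the snippets above, not a confirmed interface:

module "node_groups" {
  # Hypothetical standalone use of the node_groups submodule with an explicit,
  # plan-time-known cluster name instead of one routed through null_data_source.
  source = "terraform-aws-modules/eks/aws//modules/node_groups"

  create_eks             = true
  cluster_name           = var.cluster_name        # known at plan time
  default_iam_role_arn   = var.worker_iam_role_arn # assumed input
  workers_group_defaults = local.workers_group_defaults
  tags                   = var.tags

  node_groups = {
    main_group = {
      version = var.kubernetes_version
      # ...
    }
  }
}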

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Nov 26, 2022