Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_instance change to user_data no longer results in a destroy and re-create #23315

Closed
jaredfholgate opened this issue Feb 22, 2022 · 17 comments · Fixed by #23604
Closed

aws_instance change to user_data no longer results in a destroy and re-create #23315

jaredfholgate opened this issue Feb 22, 2022 · 17 comments · Fixed by #23604
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.
Milestone

Comments

@jaredfholgate
Copy link

jaredfholgate commented Feb 22, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Affected Resource(s)

  • aws_instance

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

resource "aws_instance" "tfe" {
  count                = 2
  ami                  = data.aws_ami.ubuntu.id
  instance_type        = "m5.xlarge"
  key_name             = aws_key_pair.main.key_name
  availability_zone    = data.aws_availability_zones.available.names[count.index]
  iam_instance_profile = aws_iam_instance_profile.tfe.id

  network_interface {
    network_interface_id = aws_network_interface.tfe[count.index].id
    device_index         = 0
  }

  root_block_device {
    volume_size = 100
    tags = {
      Name = "${var.friendly_name_prefix}-tfe-server-ebs-root-${count.index + 1}"
    }
  }

  user_data = (count.index == 0 && var.install_type != "tfe_manual" && var.install_type != "apache_hello_world") ? "${local.final_tfe_script}${local.tfe_script_setup_admin_user}" : local.final_tfe_script

  tags = {
    Name = "${var.friendly_name_prefix}-tfe-server-${count.index + 1}"
  }
}

Debug Output

Panic Output

Expected Behavior

Prior to version 4.0.0 a change to the user_data argument resulted in the resource being destroyed and recreated.

Actual Behavior

The resource is no longer destroyed and recreated when there is a change to user_data.

Steps to Reproduce

  1. terraform apply

Important Factoids

The docs appear to have been updated to include this line 'Updates to this field will trigger a stop/start of the EC2 instance'.

However there does not appear to be anything in the release notes about this being a breaking change, so thought I should flag it in case it is unintentional. This change causes significant impact on users that expect the previous behaviour. This is particularly the case for Terraform Cloud and Enterprise users with VCS integration who are unable to use the -replace command line flag.

Given that user_data scripts are only run the first time an EC2 instance is deployed, it seemed logical that a change would result in a recreation of the resource, otherwise user_data changes will not be applied which may cause confusion.

References

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ec2 Issues and PRs that pertain to the ec2 service. labels Feb 22, 2022
@anGie44
Copy link
Contributor

anGie44 commented Feb 22, 2022

@runesl
Copy link

runesl commented Feb 24, 2022

This new lifecycle behaviour also applies to changes to userdata_base64. Here it is even worse, because the stop-start cycle seems to fail consistently when using zip'ed base64 userdata:

Error: error updating instance (i-05d631dea38538058) user data base64: illegal base64 data at input byte 12

Workaround: Force a recreate.

@anGie44
Copy link
Contributor

anGie44 commented Feb 24, 2022

Error: error updating instance (i-05d631dea38538058) user data base64: illegal base64 data at input byte 12

This is likely due to this line 😞

userData, err := base64.URLEncoding.DecodeString(d.Get("user_data_base64").(string))
if err != nil {
return fmt.Errorf("error updating instance (%s) user data base64: %w", d.Id(), err)

#23305 may be the correct approach

@anGie44
Copy link
Contributor

anGie44 commented Feb 24, 2022

@runesl are you able to provide the file content that is causing the error you are seeing?

@gdavison
Copy link
Contributor

Instance user data is most commonly used to run scripts at startup, but there are other cases where it can be used. User data can be retrieved at any time using IMDS (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-add-user-data.html)

So we'll need to handle both cases.

@vitved
Copy link

vitved commented Feb 25, 2022

@runesl are you able to provide the file content that is causing the error you are seeing?

This happens when we (colleague here) base64encode a couple of template files and then add the to the cloud-init file:

locals {
  1_config = base64encode(templatefile("${path.module}/1.yml.tpl", {
  }))
  2_config = base64encode(templatefile("${path.module}/2.conf.tmpl", {
  }))

  cloudinit_config = templatefile("${path.module}/cloud_init.cfg", {
    1_conf         = local.1_config,
    2_conf         = local.2_config
  })
}

data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  # Main cloud-config configuration file.
  part {
    filename     = "init.cfg"
    content_type = "text/cloud-config"
    content      = local.cloudinit_config
  }
}

@MattTDickinson
Copy link

MattTDickinson commented Feb 25, 2022

This change has has significant impact for our automated deployments using terraform within CI/CD. We relied on terraform replacing the instance when it saw a userdata change, so that the new userdata could run when the new instance was launched.
Is there a workaround where we can force terraform to replace the instance if it sees a user data change?

@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Feb 25, 2022
@simon-priestman-sky
Copy link

we have had to pin our cicd pipes to a specific aws provider which behaves as we expect. Wr risk being forced to update due to security concerns. Can we have a resolution to this issue please? Can terraform aws once again terminate and destroy EC2s when user data hash has changed?

@leonsmith
Copy link
Contributor

It's nice that you can now update user_data and it not recreate an instance (I can see some use cases for this) but if you relied on the previous behaviour of recreating the instance on user data change there is currently no workaround besides down pinning the provider.

Would be nice to be able to specify the behaviour or have some lifecycle workaround like the following hashicorp/terraform#8099

@rfarro82
Copy link

rfarro82 commented Mar 8, 2022

We have the same issue of now pinning the provider so that our CI/CD pipelines redeploys our infrastructure. Can we get a fix for this?

@v4de
Copy link

v4de commented Mar 8, 2022

Looks like it reboots instances now, we are trying to roll back to another instance provider currently. #18043

We see in our instance logs that it was rebooted, and the user data was modified, however none of the user data was run on reboot. We rely on this to update and install our packages on our instances. Is there a flag to force new?

@jaredfholgate
Copy link
Author

We are planning to work on a solution for this. We plan to add a new argument called user_data_replace_on_change. If set to true, then a change to user_data or user_data_base64 will trigger a destroy and recreate of the resource as per the previous behaviour.

The new argument will default to false meaning that users wanting to keep the previous behaviour will need to proactively set this argument.

I will update this issue once I have a better idea of a release date.

@korenlev
Copy link

korenlev commented Mar 9, 2022

This change has has significant impact for our automated deployments using terraform within CI/CD. We relied on terraform replacing the instance when it saw a userdata change, so that the new userdata could run when the new instance was launched. Is there a workaround where we can force terraform to replace the instance if it sees a user data change?

"-replace" was always there to force an ec2 instance replacement

@vitved
Copy link

vitved commented Mar 11, 2022

This change has has significant impact for our automated deployments using terraform within CI/CD. We relied on terraform replacing the instance when it saw a userdata change, so that the new userdata could run when the new instance was launched. Is there a workaround where we can force terraform to replace the instance if it sees a user data change?

"-replace" was always there to force an ec2 instance replacement

-replace also replaces resources that aren't changed so there's a subtle difference. And for those using something like Atlantis in the pipeline it's a no go.

@github-actions
Copy link

This functionality has been released in v4.7.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@alewando
Copy link
Contributor

I agree with @jaredfholgate; this is an unexpected and breaking change. It's not just that it changes existing behavior, it's also that it seems incorrect for the average (cloud-init) use case where (as the OP points out) user-data is often a script that only gets run when the instance is launched and not when it reboots. So in these cases, with the new (reboot-only) behavior, changing the user data won't even result in that script being re-executed, which seems counter-intuitive to say the least. I'm not saying they don't exist, but I'd be willing to bet that a majority of EC2 instances (including those based on AWS' own amazon-linux) are using cloud-init (or similar) mechanisms to handle user-data (and, more specifically, for instance-launch scripts) and not treating it as some sort of generic data blob.

There are definitely valid reasons for allowing user-data changes to not recreate the instance, and it should be supported by terraform. It just doesn't feel like that warrants a change to the default behavior given the average (maybe wrong on this) use case and long history, as people (and tools) have come to expect it. I see there is a proposal to add an user_data_replace_on_change attribute, but that would be "opt-out". Significant changes to default behavior should be opt-in in my opinion.

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.
Projects
None yet
Development

Successfully merging a pull request may close this issue.