
4.0 - Issue with EC2 Instance Metadata running inside Container #23110

Closed
kylegoch opened this issue Feb 10, 2022 · 36 comments · Fixed by #23191
Assignees
Labels
authentication Pertains to authentication; to the provider itself or otherwise. provider Pertains to the provider itself, rather than any interaction with AWS. regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement. upstream Addresses functionality related to the cloud provider.
Milestone

Comments

@kylegoch

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

v1.1.2

Affected Resource(s)

The provider itself

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

provider "aws" {
  region = "us-east-2"

  assume_role {
    role_arn = "<redacted>"
  }
}

Debug Output

Panic Output

Expected Behavior

Terraform should plan and run using the EC2 metadata.

Actual Behavior

│ Error: error configuring Terraform AWS Provider: no valid credential sources for Terraform AWS Provider found.
│ 
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│ 
│ Error: no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded
│ 
│ 
│   with provider["registry.terraform.io/hashicorp/aws"],
│   on configuration.tf line 11, in provider "aws":
│   11: provider "aws" {

Steps to Reproduce

  1. terraform plan

Important Factoids

Today when switching to v4.0, we discovered we could no longer run Terraform on EC2 instances that use the AWS Instance Metadata service. Running v4.0 locally works fine. But running the same terraform on an EC2 instance (such as for CICD) results in the error shown above.

Rolling back to 3.74.1 fixes the issue and all works as planned.

The instances in question are running both v1 and v2 of the Instance Metadata service.

@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Feb 10, 2022
@ewbankkit ewbankkit added the provider Pertains to the provider itself, rather than any interaction with AWS. label Feb 10, 2022
@YakDriver YakDriver added upstream Addresses functionality related to the cloud provider. regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement. and removed needs-triage Waiting for first response or review from a maintainer. labels Feb 10, 2022
@ntman4real

@YakDriver so what is the fix? Whole lot of things are breaking at the moment. Should have pinned provider ver. but didn't.

FYI same issue as OP, works locally but when using assumed role via CICD EC2 runners, the issue exists.

@gdavison
Contributor

Thanks for reporting this, @kylegoch. Could you attach the debug log please? You can enable debug logging by setting the environment variable TF_LOG=DEBUG. For more information on logging, see https://www.terraform.io/internals/debugging

@gdavison gdavison self-assigned this Feb 10, 2022
@ntman4real

here is my debug log @gdavison

debug.txt

@opalmer

opalmer commented Feb 11, 2022

I was working on producing a debug log too and ran across something interesting. It does work in my case, but only outside of the Docker container that runs the Terraform job. The container runs on a host where the instance metadata service, just like @kylegoch's, is exposed, and the container has access to it via a bridge network. Here's an example of how that container is spun up:

sudo docker run --name test-container --network=bridge -ti alpine:latest /bin/sh

I wrote a small test program that I think reproduces the behavior I'm seeing too. On the host it runs fine but inside the container it produces:


Please see https://registry.terraform.io/providers/hashicorp/aws
for more information about providing credentials.

Error: no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded


goroutine 1 [running]:
main.main()
	/Users/opalmer/go/src/github.com/terraform-providers/terraform-provider-aws/main.go:56 +0x595

Output from docker info and the test program itself is attached. Host network wise we're not doing anything special with iptables that could be causing this behavior. I'll continue digging on my side to see if there's anything else I can track down that might be helpful.

@opalmer

opalmer commented Feb 11, 2022

... and of course it would be helpful if I actually included that program I mentioned haha.

debug.tar.gz

@breser

breser commented Feb 11, 2022

This is happening because v4.0.0 is using IMDSv2. IMDSv2 requires a PUT to retrieve the token. There is a setting that limits the number of hops that the response to that PUT will go before being dropped by the network, httpPutResponseHopLimit, the default for this setting is 1.

This means that if you are more than one network hop away from the IMDS, you will get these errors. The most common reason is that you're running in a Docker container.

The solution is to increase the hop count to at least 2. Running the following command will fix this:
aws ec2 modify-instance-metadata-options --instance-id "$INSTANCE_ID" --http-put-response-hop-limit 2 --http-endpoint enabled

This change is very vaguely mentioned in the release notes for v4.0.0:

provider: Updates AWS authentication to use AWS SDK for Go v2 https://aws.github.io/aws-sdk-go-v2/docs/ (#20587)

This doesn't seem to impact the AWS CLI when running in a Docker container with the hop count set to 1 on a host that allows v1 and v2. I suspect it either falls back to v1 or tries v1 first, and so never fails to get the response from the PUT. But that's probably not the Terraform provider's issue so much as the AWS Go SDK's.

This is likely to impact a lot of people because many build systems use docker containers these days. I'd strongly recommend working to get the Go SDK to do something intelligent here.

@gdavison
Contributor

The Provider is using the AWS SDK for Go v2 for authentication. According to AWS documentation,

The AWS SDKs use IMDSv2 calls by default. If the IMDSv2 call receives no response, the SDK retries the call and, if still unsuccessful, uses IMDSv1. This can result in a delay. In a container environment, if the hop limit is 1, the IMDSv2 response does not return because going to the container is considered an additional network hop. To avoid the process of falling back to IMDSv1 and the resultant delay, in a container environment we recommend that you set the hop limit to 2.

The AWS SDK for Go v1 also tries IMDSv2 first, so it's not clear why it worked with earlier versions of the provider and fails with v4.0.

We can update our documentation and try to return a more helpful message.

@opalmer

opalmer commented Feb 11, 2022

@breser good call on that second hop, forgot about that with IMDS!

@gdavison, I'm going to echo what @breser suggests and work on figuring out how to make this fall back correctly. The test code I attached reproduced the issue but the following code also works out of the box inside a docker container:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/config"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*10)
	defer cancel()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}

	creds, err := cfg.Credentials.Retrieve(ctx)
	if err != nil {
		panic(err)
	}

	fmt.Println(creds)
}

This is using the same version of the AWS Go SDK v2 that the terraform provider is using. In fact, the above is what I dropped directly into main.go after cloning down this project and checking out the v4.0.0 tag.

If the default configuration from the AWS SDK works around this issue, then I believe the provider should as well. I suspect it's specifically something to do with how the AWS config is generated in github.com/hashicorp/aws-sdk-go-base. Another reason to fix this: if you do something unusual with your network, the number of hops can change, easily breaking Terraform and requiring another instance-level modification. If someone has specifically disabled the fallback on their host, that's one thing; but if it's enabled and available to Terraform, the provider should take advantage of it after trying the better option (IMDSv2) first.

@Grummfy

Grummfy commented Feb 11, 2022

Tip for whoever is blocked by this: roll back to v3, see #20433 and use

terraform {
  required_providers {
    aws = {
      version = "~> 3.0"
    }
  }
}

@FernandoMiguel
Contributor

Anyone using an ASG needs

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 2
    instance_metadata_tags      = "enabled"
  }

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#metadata-options
if you are using the community ASG module, https://github.com/terraform-aws-modules/terraform-aws-autoscaling#input_metadata_options

@rexsuecia

rexsuecia commented Feb 11, 2022

I ran into the same; my conclusion was that provider 4.0.0 no longer respects the credentials set in the environment (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).

I run a number of projects on GitLab CI/CD and have used the same approach for years, but today it broke. I have the secrets stored in group-level environment variables.

When it does not find the credentials in the environment, it looks for "profiles" in ~/.aws/credentials etc. When that fails, it tries the metadata service (which I do not have, since I do not run on EC2; I think GitLab runs on GCP) and hence fails miserably.

My workaround was to put the creds in a file like:
mkdir -p ~/.aws && echo -e "[profile-name]\naws_access_key_id = $AWS_ACCESS_KEY_ID\naws_secret_access_key = $AWS_SECRET_ACCESS_KEY\n" > ~/.aws/credentials

And finally, carefully clean out these credentials: rm -rf ~/.aws/credentials

Far from optimal but doing so it works, for me.

FWIW

@YakDriver
Member

@rexsuecia Thank you for reporting this additional aspect. The provider should still respect the access and secret key env vars. Will you create a separate issue for that so we can track it?

@rexsuecia

@YakDriver I wish I had the time to do that, but the issue submission process is so cumbersome I simply cannot in the near term. I have 20+ projects that need patching to handle this (and the other interesting breaking changes in v4 ;-) ), so I am more than busy this weekend, and I bet you will have released a fixed 4.0.1 before I have even created a reproducible gist.

@aissarmurad

Tip for whoever is blocked by this: roll back to v3, see #20433 and use

terraform {
  required_providers {
    aws = {
      version = "~> 3.0"
    }
  }
}

Most of the time, the workaround will be

terraform {
  required_providers {
     aws = {
       version = "~> 3"
     }
  }
}

@FernandoMiguel
Contributor

Tip for whoever is blocked by this: roll back to v3, see #20433 and use

terraform {
  required_providers {
    aws = {
      version = "~> 3.0"
    }
  }
}

Most of the time, the workaround will be

terraform {
  required_providers {
     aws = {
       version = "~> 3"
     }
  }
}

That will not work.
~> 3 means 3 or pretty much anything above it, because the major version is the rightmost component and is allowed to increment.
You need ~> 3.0 so it allows any release within major version 3.

@aissarmurad

@FernandoMiguel according to the Terraform documentation

~>: Allows only the rightmost version component to increment. For example, to allow new patch releases within a specific minor release, use the full version number: ~> 1.0.4 will allow installation of 1.0.5 and 1.0.10 but not 1.1.0. This is usually called the pessimistic constraint operator.

Reference
https://www.terraform.io/language/expressions/version-constraints

@FernandoMiguel
Contributor

@FernandoMiguel according to the Terraform documentation

~>: Allows only the rightmost version component to increment. For example, to allow new patch releases within a specific minor release, use the full version number: ~> 1.0.4 will allow installation of 1.0.5 and 1.0.10 but not 1.1.0. This is usually called the pessimistic constraint operator.

Reference

https://www.terraform.io/language/expressions/version-constraints

That's exactly what I said 😉
What do you think happens if you pin 3 only without a dot zero?

@gdavison
Contributor

Thanks for your patience, everyone. We're investigating what has changed between v4.0 and previous versions that causes this to fail inside containers now.

If you have other authentication issues that are not related to using the EC2 Instance Metadata Service from inside a Container, please open a new issue so that they can be tracked separately.

@gdavison gdavison changed the title 4.0 - Issue with EC2 Instance Metadata 4.0 - Issue with EC2 Instance Metadata running inside Container Feb 11, 2022
@gdavison
Contributor

@opalmer thanks for your investigation. When the instance is configured to use either IMDSv1 or IMDSv2, the sample code succeeds, but when IMDSv2 is required, the sample code fails with

panic: no EC2 IMDS role found, operation error ec2imds: GetMetadata, canceled, context deadline exceeded

@opalmer and @kylegoch, can you paste the output of aws ec2 describe-instances --instance-ids <instance id> | jq '.Reservations[0].Instances[0].MetadataOptions'

@gdavison
Contributor

I've just tried the provider v3 authentication flow in a container with both IMDSv1 and IMDSv2, which succeeds, and requiring IMDSv2, which fails.

@breser

breser commented Feb 12, 2022

This was happening for me with machines that had IMDSv1 and IMDSv2 enabled (taken from a AWS Config snapshot that I pulled down trying to investigate this issue yesterday):
"metadataOptions": { "state": "applied", "httpTokens": "optional", "httpPutResponseHopLimit": 1, "httpEndpoint": "enabled" },

@gdavison
Contributor

@breser was Terraform running in a container? Can you share the contents of your provider configuration block, please?

provider "aws" {
  ...
}

@breser

breser commented Feb 12, 2022

Yes, running in a container:

provider "aws" {
  region  = var.region
  assume_role {
    role_arn = "arn:aws:iam::${var.account_id}:role/RoleName"
  }
}

Using terraform 0.13.7 (yes I know it's old).

Provider versions from the init output:

Initializing provider plugins...
- Finding hashicorp/template versions matching ">= 2.1.2"...
- Finding hashicorp/aws versions matching ">= 2.55.0"...
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
- Installing hashicorp/aws v4.0.0...
- Installed hashicorp/aws v4.0.0 (signed by HashiCorp)

@FernandoMiguel
Contributor

Thanks. Totally agree this will impact a lot of people. Especially those who terminate build servers to save costs on the weekends like me. I will have to find a way to run the aws cli command every time the instance starts

Why?
It's a simple metadata option you pass to the launch config of that VM. It's not even a cloud-init change. So, super simple.

@FernandoMiguel
Contributor

Thanks. Totally agree this will impact a lot of people. Especially those who terminate build servers to save costs on the weekends like me. I will have to find a way to run the aws cli command every time the instance starts

Why? It's a simple metadata option you pass to the launch config of that VM. It's not even a cloud-init change. So, super simple.

Yes, but I have multiple Service Catalog templates that spin up Runners for multiple projects. I will have to find a way to add that option to the template

Welcome to managing infra with code.
What would you do if you had to add an extra EBS volume?

@dr-travis

The following solution works for me.

Change the paths to aws config file and credential file from:

provider "aws" {
  region = "us-east-2"
  shared_config_files=["~/.aws/config"] # Or $HOME/.aws/config
  shared_credentials_files = ["~/.aws/credentials"] # Or $HOME/.aws/credentials
  profile = "default"
}

to

provider "aws" {
  region = "us-east-2"
  shared_config_files=["/Users/me/.aws/config"]
  shared_credentials_files = ["/Users/me/.aws/credentials"]
  profile = "default"
}

@willthames

willthames commented Feb 14, 2022

Thanks. Totally agree this will impact a lot of people. Especially those who terminate build servers to save costs on the weekends like me. I will have to find a way to run the aws cli command every time the instance starts

Why? It's a simple metadata option you pass to the launch config of that VM. It's not even a cloud-init change. So, super simple.

Yes, but I have multiple Service Catalog templates that spin up Runners for multiple projects. I will have to find a way to add that option to the template

Welcome to managing infra with code. What would you do if you had to add an extra EBS volume?

This reply seems unnecessarily dismissive.

In our case, our agents are managed by terraform cloud - we're paying hashicorp good money to avoid having to manage terraform workers - and we don't have the level of access to be able to configure metadata settings.

Edit: oops, the issue is occurring on agents running on our infrastructure, which I do have control of.

@FernandoMiguel
Contributor

FernandoMiguel commented Feb 14, 2022 via email

@mccartney

For people using the EC2 plugin in Jenkins who configure Jenkins as YAML code, this line (shown as the last one) helps:

          - description: "my worker"
            type: Z1d6xlarge
[...]
            associatePublicIp: true
            metadataHopsLimit: 2

@chris-peterson
Contributor

surprised to see all the mentions of "fixing up existing instances" with various CLI incantations.

the bread and butter of terraform is immutable infrastructure.

IMO, the right sustainable fix is to modify metadata_options in your terraform source(s). this will vary slightly based on how you are creating instances, but the various mechanisms support similar functionality:

The field to pay special attention to is http_put_response_hop_limit which should be changed from its default (1) to 2 (for most cases)

In my case, we were using launch configurations, adding the following to the aws_launch_configuration that creates our infrastructure builders got things back to ✅

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "optional"
    http_put_response_hop_limit = 2
  }

@fabionovais

The following solution works for me.

Change the paths to aws config file and credential file from:

provider "aws" {
  region = "us-east-2"
  shared_config_files=["~/.aws/config"] # Or $HOME/.aws/config
  shared_credentials_files = ["~/.aws/credentials"] # Or $HOME/.aws/credentials
  profile = "default"
}

to

provider "aws" {
  region = "us-east-2"
  shared_config_files=["/Users/me/.aws/config"]
  shared_credentials_files = ["/Users/me/.aws/credentials"]
  profile = "default"
}

thanks @dr-travis, this solution was OK for me

@Akupsmee

The following solution works for me.

Change the paths to aws config file and credential file from:

provider "aws" {
  region = "us-east-2"
  shared_config_files=["~/.aws/config"] # Or $HOME/.aws/config
  shared_credentials_files = ["~/.aws/credentials"] # Or $HOME/.aws/credentials
  profile = "default"
}

to

provider "aws" {
  region = "us-east-2"
  shared_config_files=["/Users/me/.aws/config"]
  shared_credentials_files = ["/Users/me/.aws/credentials"]
  profile = "default"
}

worked for me after running the "aws configure" command

@github-actions

This functionality has been released in v4.1.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@software-engr-full-stack

I don't know if this will help. I've been using Terraform Cloud as my back end. I changed the workspace execution mode from "remote" to "local" and it worked. I didn't change any versions. I'm using whatever version terraform init installed which was hashicorp/aws v4.6.0 as of this writing.

@github-actions

github-actions bot commented May 7, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 7, 2022