4.0 - Issue with EC2 Instance Metadata running inside Container #23110
@YakDriver so what is the fix? A whole lot of things are breaking at the moment. Should have pinned the provider version, but didn't. FYI, same issue as OP: works locally, but the issue exists when using an assumed role via CI/CD EC2 runners.
Thanks for reporting this, @kylegoch. Could you attach the debug log, please? You can enable debug logging by setting the environment variable shown in the sketch below.
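The variable name was cut off above. Terraform's standard switch for this is TF_LOG, with TF_LOG_PATH to write the log to a file; a typical invocation (paths are placeholders):

```sh
# Capture a debug-level Terraform log to a file while planning.
TF_LOG=DEBUG TF_LOG_PATH=./debug.log terraform plan
```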
I was working on producing a debug log too and ran across something interesting. It does work in my case, but specifically outside of the Docker container running the Terraform job. The container is running on a host where the instance metadata service, just like @kylegoch's, is exposed, and the container has access to it via a bridge network. Here's an example of how that container is spun up:
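The invocation itself wasn't captured in the thread. A hypothetical example of what such a container launch could look like, assuming the official hashicorp/terraform image and the default bridge network (image tag, mounts, and command are illustrative only):

```sh
# Hypothetical: run Terraform from a container on the bridge network;
# calls to IMDS from inside cross one extra network hop.
docker run --rm -it \
  --network bridge \
  -v "$PWD":/workspace -w /workspace \
  hashicorp/terraform:1.1.2 plan
```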
I wrote a small test program that I think reproduces the behavior I'm seeing too. On the host it runs fine but inside the container it produces:
Output from
... and of course it would be helpful if I actually included that program I mentioned haha.
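Since the original program never made it into the thread, here is a minimal sketch of the kind of test that exercises IMDS through the SDK's dedicated client; this is a reconstruction under assumptions, not the author's code:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"

	"github.com/aws/aws-sdk-go-v2/feature/ec2/imds"
)

func main() {
	// The IMDS client first negotiates an IMDSv2 token with a PUT;
	// inside a container, the PUT response can be dropped when the
	// instance's hop limit is 1.
	client := imds.New(imds.Options{})

	out, err := client.GetMetadata(context.TODO(), &imds.GetMetadataInput{
		Path: "instance-id",
	})
	if err != nil {
		log.Fatal(err) // this is where the container-only failure shows up
	}
	defer out.Content.Close()

	body, err := io.ReadAll(out.Content)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}
```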
This is happening because v4.0.0 uses IMDSv2. IMDSv2 requires a PUT to retrieve a session token, and there is a setting that limits the number of network hops the response to that PUT can traverse before being dropped. This means that if you are more than one network hop away from the IMDS, you will get these errors. The most common reason is that you're running in a Docker container. The solution is to increase the hop limit to at least 2; a command along the lines of the sketch below will fix this. This change is only very vaguely mentioned in the release notes for v4.0.0.
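The command itself is missing above; the standard AWS CLI call for raising the hop limit on an existing instance looks like this (instance ID is a placeholder):

```sh
# Let IMDSv2 responses survive a second hop, e.g. a container's
# bridge network, without disabling the metadata endpoint.
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-put-response-hop-limit 2 \
  --http-endpoint enabled
```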
This doesn't seem to impact the AWS CLI when running in a Docker container with the hop count set to 1 on a host that allows v1 and v2. I suspect that's because it either falls back to v1, or tries v1 first and so never fails to get the response from the PUT. But that's probably not the Terraform provider's issue so much as the AWS Go SDK's. This is likely to impact a lot of people, because many build systems use Docker containers these days. I'd strongly recommend working to get the Go SDK to do something intelligent here.
The provider is using the AWS SDK for Go v2 for authentication. According to AWS documentation, a container counts as an additional network hop, so with the default hop limit of 1 the IMDSv2 response never makes it back to a containerized process.
The AWS SDK for Go v1 also tries IMDSv2 first, so it's not clear why it worked with earlier versions of the provider and fails with v4.0. We can update our documentation and try to return a more helpful message.
@breser good call on that second hop, forgot about that with IMDS! @gdavison, I'm going to echo what @breser suggests and work on figuring out how to make this fall back correctly. The test code I attached reproduces the issue, but the following code also works out of the box inside a Docker container:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/config"
)

func main() {
	// Bound the whole credential lookup at ten seconds.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*10)
	defer cancel()

	// Build the default config: env vars, shared files, then IMDS.
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}

	// Force credential resolution so any IMDS failure surfaces here.
	creds, err := cfg.Credentials.Retrieve(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(creds)
}
```

This uses the same version of the AWS Go SDK v2 that the Terraform provider is using. In fact, the above is what I dropped directly into main.go after cloning this project and checking out the v4.0.0 tag. If the default configuration from the AWS SDK works around this issue, then the provider should be able to as well. I suspect it's specifically something to do with how the AWS config is being generated in github.com/hashicorp/aws-sdk-go-base. Another reason to fix this: if you do something unusual with your network, the number of hops could easily change, breaking Terraform and requiring another instance-level modification. If someone has specifically disabled the fallback on their host, that's one thing, but if it's enabled and available to Terraform, then the provider should take advantage of it after trying the better option (IMDSv2) first.
Tip for whoever is blocked by this: roll back to v3, see #20433, and use the workaround described there.
Anyone using ASGs: you need to set metadata options on your launch template, as shown in the sketch below.
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/launch_template#metadata-options
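A sketch of what that looks like on an aws_launch_template, per the linked docs; the resource name and surrounding arguments are placeholders:

```hcl
resource "aws_launch_template" "ci_runner" {
  # ... image_id, instance_type, etc.

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "optional" # "required" also works once the hop limit is raised
    http_put_response_hop_limit = 2          # lets IMDSv2 responses reach containers
  }
}
```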
I ran into the same; my conclusion is that provider 4.0.0 no longer respects the credentials set in the environment. I run a number of projects on GitLab CI/CD and have used the same approach for years, but today it broke. I have the secrets stored in environment variables on the group account. When it does not find the credentials in the environment, it looks for "profiles" instead. My workaround was to put the creds in a file (see the sketch below), and finally carefully clean out these credentials afterwards. Far from optimal, but doing so works for me. FWIW.
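The file contents were not captured above; the standard shared credentials file layout is shown below, using AWS's documented example keys as placeholders:

```ini
# ~/.aws/credentials
[default]
aws_access_key_id     = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```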
@rexsuecia Thank you for reporting this additional aspect. The provider should still respect the access and secret key env vars. Will you create a separate issue for that so we can track it?
@YakDriver I wish I had the time to do that, but the issue submission process is so cumbersome that I simply cannot in the near term. I have 20+ projects that need patching to handle this (and the other interesting breaking changes in v4 ;-) ), so I am more than busy this weekend, and I bet you guys will have released a fixed 4.0.1 before I have even created a reproducible gist.
Most of the time, the workaround will be
That will not work.
@FernandoMiguel according to the Terraform documentation
Reference
That's exactly what I said 😉
Thanks for your patience, everyone. We're investigating what has changed between v4.0 and previous versions that causes this to fail inside containers now. If you have other authentication issues that are not related to using the EC2 Instance Metadata Service from inside a container, please open a new issue so that they can be tracked separately.
@opalmer thanks for your investigation. When the instance is configured to use either IMDSv1 or IMDSv2, the sample code succeeds, but when IMDSv2 is required, the sample code fails with
@opalmer and @kylegoch, can you paste the output of
I've just tried the provider v3 authentication flow in a container with both IMDSv1 and IMDSv2, which succeeds, and requiring IMDSv2, which fails.
This was happening for me with machines that had both IMDSv1 and IMDSv2 enabled (taken from an AWS Config snapshot that I pulled down while trying to investigate this issue yesterday):
@breser was Terraform running in a container? Can you share the contents of your provider configuration block, please?

```hcl
provider "aws" {
  ...
}
```
Yes, running in a container:
Using Terraform 0.13.7 (yes, I know it's old). Provider versions from the init output:
Why? |
Welcome to managing infra with code.
The following solution works for me. Change the paths to the AWS config and credentials files from:

```hcl
provider "aws" {
  region                   = "us-east-2"
  shared_config_files      = ["~/.aws/config"]      # Or $HOME/.aws/config
  shared_credentials_files = ["~/.aws/credentials"] # Or $HOME/.aws/credentials
  profile                  = "default"
}
```

to:

```hcl
provider "aws" {
  region                   = "us-east-2"
  shared_config_files      = ["/Users/me/.aws/config"]
  shared_credentials_files = ["/Users/me/.aws/credentials"]
  profile                  = "default"
}
```
This reply seems unnecessarily dismissive.
Edit: oops, the issue is occurring on agents running on our infrastructure, which I do have control of. |
On Mon, 14 Feb 2022, Will Thames wrote:

> In our case, our agents are managed by terraform cloud - we're paying hashicorp good money to avoid having to manage terraform workers - and we don't have the level of access to be able to configure metadata settings.

Then it's a bug on their part and you should report that to your provider (hashicorp), since you are a paying client.
For people using the EC2 plugin in Jenkins who configure Jenkins as YAML code, this line (shown as the last one) helps:
Surprised to see all the mentions of "fixing up existing instances" with various CLI incantations; the bread and butter of Terraform is immutable infrastructure. IMO, the right sustainable fix is to modify the metadata_options on your launch template or launch configuration. The field to pay special attention to is http_put_response_hop_limit. In my case, we were using launch configurations; adding the following to the aws_launch_configuration resource fixed it:

```hcl
metadata_options {
  http_endpoint               = "enabled"
  http_tokens                 = "optional"
  http_put_response_hop_limit = 2
}
```
Thanks @dr-travis, this solution was OK for me.
Worked for me after running the `aws configure` command.
This functionality has been released in v4.1.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
I don't know if this will help. I've been using Terraform Cloud as my back end. I changed the workspace execution mode from "remote" to "local" and it worked. I didn't change any versions. I'm using whatever version terraform init installed, which was hashicorp/aws v4.6.0 as of this writing.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Terraform CLI and Terraform AWS Provider Version
Terraform v1.1.2
AWS Provider v4.0.0
Affected Resource(s)
The provider itself
Terraform Configuration Files
Debug Output
Panic Output
Expected Behavior
Terraform should plan and run using the EC2 metadata.
Actual Behavior
Steps to Reproduce
terraform plan
Important Factoids
Today when switching to v4.0, we discovered we could no longer run Terraform on EC2 instances that use the AWS Instance Metadata service. Running v4.0 locally works fine. But running the same terraform on an EC2 instance (such as for CICD) results in the error shown above.
Rolling back to 3.74.1 fixes the issue and all works as planned.
The instances in question are running both v1 and v2 of the Instance Metadata service.