Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.2.2 regression: S3 storage does not work with AWS IRSA authentication anymore #2888

Closed
sberz opened this issue Sep 4, 2023 · 13 comments · Fixed by #2889 or #3006
Closed

2.2.2 regression: S3 storage does not work with AWS IRSA authentication anymore #2888

sberz opened this issue Sep 4, 2023 · 13 comments · Fixed by #2889 or #3006

Comments

@sberz
Copy link

sberz commented Sep 4, 2023

Describe the bug
After upgrading Tempo from 2.2.1to 2.2.2 our Tempo instances could no longer access S3. We use AWS IRSA 1 to provide AWS credentials.

The regression seems to be introduced by #2871
My current guess is the AWS SDK now prefers the EC2 instance role over IRSA.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Tempo 2.2.1 on AWS EKS with S3 storage and IRSA authentication
  2. Tempo should start and be able to access S3 as expected
  3. Upgrade Tempo to 2.2.2
  4. Tempo should now crash during startup with S3 permission errors

Expected behavior
Tempo should continue to use the AWS IRSA provided credentials.

Environment:

  • Infrastructure: Kubernetes (AWS EKS)
  • Deployment tool: Helm (tempo-distributed - 1.6.3 - upgraded from 1.6.2)

Additional Context
Service logs (compactor):

level=info ts=2023-09-04T09:01:59.662782222Z caller=main.go:221 msg="initialising OpenTracing tracer"
level=info ts=2023-09-04T09:01:59.680278556Z caller=main.go:108 msg="Starting Tempo" version="(version=2.2.2, branch=HEAD, revision=5e18d78ba)"
level=error ts=2023-09-04T09:01:59.728222984Z caller=main.go:111 msg="error running Tempo" err="failed to init module services error initialising module: store: failed to create store unexpected error from ListObjects on <bucket_name>: Access Denied"

AWS environment variables in a running container:

$ env | grep -i aws
AWS_ROLE_ARN=arn:aws:iam::XXXXXXXXXXX:role/eks-development-v2-tempo
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_STS_REGIONAL_ENDPOINTS=regional
AWS_DEFAULT_REGION=eu-central-1
AWS_REGION=eu-central-1

Storage configuration (copied from tempo-distributed Helm values):

    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::XXXXXXXXXXX:role/eks-development-v2-tempo"

    storage:
      trace:
        backend: s3
        s3:
          bucket: <bucket_name>
          endpoint: s3.eu-central-1.amazonaws.com
          region: eu-central-1 # Added during troubleshooting

Footnotes

  1. https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

@sberz
Copy link
Author

sberz commented Sep 4, 2023

I found a workaround for this issue: after blocking access to IMDS Tempo uses IRSA credentials again.

@z0rc
Copy link

z0rc commented Sep 4, 2023

Facing same issue on EKS after upgrade to Tempo 2.2.2. In my case proposed workaround requires complex changes and I don't consider it as something simple to implement. For now I reverted to 2.2.1 and waiting fix in Tempo.

@mapno
Copy link
Member

mapno commented Sep 6, 2023

Hi folks. I've opened this PR #2889, which should fix the error you're seeing. I've manually tested it, but it'd be great if you're able to test it in your environments as well 🙏 .

To create a build from the PR, checkout the branch and run $ make tempo if you want the binary or $ make docker-tempo if you want a docker image. I've also uploaded builds to hub.docker.com: amd64, arm64.

@sberz
Copy link
Author

sberz commented Sep 8, 2023

Thanks @mapno for providing a bug fix!
I tested the Docker image on AWS EKS with IMDS available and access to it blocked. In both cases Tempo successfully authenticates via IRSA again.

@ekristen
Copy link
Contributor

This doesn't actually seem to fix anything. I am still getting failed to create store unexpected error from ListObjects while on version 2.2.3.

@ekristen
Copy link
Contributor

This is also more complicated than it needs to be ... the minio client chain works without needing separate options to turn it on or off. https://github.com/restic/restic/blob/master/internal/backend/s3/s3.go#L63C1-L95C3

@joe-elliott
Copy link
Member

This is also more complicated than it needs to be ... the minio client chain works without needing separate options to turn it on or off. https://github.com/restic/restic/blob/master/internal/backend/s3/s3.go#L63C1-L95C3

@ekristen

I am not an expert on s3 auth so I can't say for sure what the right fix is here. The config option added native_aws_auth_enabled was just mimicing mimir so we assumed it was a good path. Is it not workign for you?

If you can suggest/PR a series of credential checks that works for all parties we would be love to integrate it into the codebase. Please let us know 🙏

@joe-elliott
Copy link
Member

Ha, just saw your PR! thanks :). taking a look now

@ekristen
Copy link
Contributor

@joe-elliott all good. I just happened to hit the issue and comment and then take a look at the code and realize that I just ran into this on another project using minio SDK and decided to fix it. Let me know if you need anything.

@finda-yeongjo
Copy link

When will this issue be resolved?

We are deploying the operator pattern using helm.
Same problem occurs with 1.6.5 (latest) tempo-distributed.

Looking forward to a solve. Best

@joe-elliott
Copy link
Member

We are struggling to get a fix b/c it's difficult for us to test all of the different AWS auth modes. There is a proposed fix here:

#2958

The author included an image. Can you test to see if this works for you?

@obeleh
Copy link

obeleh commented Oct 26, 2023

For me adding an extraEnv to specify the token file fixed my s3 issues

  extraEnv:
  - name: AWS_ROLE_ARN
    value: ${aws_role_arn}
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token

@z0rc
Copy link

z0rc commented Oct 26, 2023

@obeleh your recommendation is misguided. Variables you propose to add in fact are automatically injected by EKS when using IRSA. Your solution is wrong and most likely your problem is elsewhere.

In any case #3006 is merged, which properly resolves this issue and will be available in next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants