Compactor: Workload identity stopped working after upgrade to v0.34.0 #7138

Closed
bartvanackooij opened this issue Feb 14, 2024 · 2 comments

@bartvanackooij

In case of issues related to a specific bucket implementation, please ping the corresponding maintainer from the list here: https://github.com/thanos-io/thanos/blob/main/docs/storage.md
@vglafirov

Thanos, Prometheus and Golang version used:

Thanos v0.34.0 
(Deployed through Bitnami Helm chart)

Object Storage Provider:
Azure Storage Account

What happened:
After upgrading from v0.33.0 to v0.34.0, the components that connect to Azure Storage using Workload Identity (compactor and store gateway) can no longer connect and fail with the error below. Downgrading back to v0.33.0 makes everything work as expected again.

The Service Account attached to both components does carry the client ID annotation:
Annotations: azure.workload.identity/client-id: xxxxxx

However, the client ID environment variable is only set in the pods when I deploy Thanos v0.33.0.

On v0.34.0 the environment variable is empty, which I assume is causing the issue.

Full logs of relevant components:
Log from the Store Gateway

❯ kl thanos-storegateway-0 -n monitoring

ts=2024-02-14T13:29:52.742610403Z caller=factory.go:53 level=info msg="loading bucket configuration"

ts=2024-02-14T13:29:53.011149871Z caller=main.go:135 level=error err="DefaultAzureCredential authentication failed
POST https://login.microsoftonline.com/80525e01-c1a1-4824-9b24-acd53a540aa8/oauth2/v2.0/token
RESPONSE 400 Bad Request
{
  \"error\": \"unauthorized_client\",
  \"error_description\": \"AADSTS700016: Application with identifier '####' was not found in the directory '####'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: 26a769f4-c39c-4652-bfcb-ddca8bf20c00 Correlation ID: 2bebca23-6f45-4439-887a-b7857bbac661 Timestamp: 2024-02-14 13:29:52Z\",
  \"error_codes\": [
    700016
  ],
  \"timestamp\": \"2024-02-14 13:29:52Z\",
  \"trace_id\": \"26a769f4-c39c-4652-bfcb-ddca8bf20c00\",
  \"correlation_id\": \"2bebca23-6f45-4439-887a-b7857bbac661\",
  \"error_uri\": \"https://login.microsoftonline.com/error?code=700016\"
}

create AZURE client
github.com/thanos-io/objstore/client.NewBucket
  /bitnami/blacksmith-sandox/thanos-0.34.0/pkg/mod/github.com/thanos-io/objstore@…/client/factory.go:90
main.runStore
  /bitnami/blacksmith-sandox/thanos-0.34.0/src/github.com/thanos-io/thanos/cmd/thanos/store.go:298
main.registerStore.func1
  /bitnami/blacksmith-sandox/thanos-0.34.0/src/github.com/thanos-io/thanos/cmd/thanos/store.go:237
main.main
  /bitnami/blacksmith-sandox/thanos-0.34.0/src/github.com/thanos-io/thanos/cmd/thanos/main.go:133
runtime.main
  /opt/bitnami/go/src/runtime/proc.go:267
runtime.goexit
  /opt/bitnami/go/src/runtime/asm_amd64.s:1650

preparing store command failed
main.main
  /bitnami/blacksmith-sandox/thanos-0.34.0/src/github.com/thanos-io/thanos/cmd/thanos/main.go:135
runtime.main
  /opt/bitnami/go/src/runtime/proc.go:267
runtime.goexit
  /opt/bitnami/go/src/runtime/asm_amd64.s:1650"
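The JSON body embedded in the log is the token endpoint's error response, and its `error_codes` field is the quickest way to tell which failure occurred. A minimal sketch for decoding such a body with the standard library follows; the struct mirrors only the fields visible in the log above, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenError mirrors the fields of the token endpoint error body
// shown in the store gateway log (hypothetical type name).
type tokenError struct {
	Error            string `json:"error"`
	ErrorDescription string `json:"error_description"`
	ErrorCodes       []int  `json:"error_codes"`
	Timestamp        string `json:"timestamp"`
	TraceID          string `json:"trace_id"`
	CorrelationID    string `json:"correlation_id"`
	ErrorURI         string `json:"error_uri"`
}

// parseTokenError decodes a token endpoint error response body.
func parseTokenError(body []byte) (tokenError, error) {
	var te tokenError
	err := json.Unmarshal(body, &te)
	return te, err
}

// hasCode reports whether code appears in the error_codes list.
func hasCode(te tokenError, code int) bool {
	for _, c := range te.ErrorCodes {
		if c == code {
			return true
		}
	}
	return false
}

func main() {
	// Abbreviated copy of the body from the log, with escaped quotes removed.
	body := []byte(`{
  "error": "unauthorized_client",
  "error_codes": [700016],
  "trace_id": "26a769f4-c39c-4652-bfcb-ddca8bf20c00",
  "correlation_id": "2bebca23-6f45-4439-887a-b7857bbac661"
}`)
	te, err := parseTokenError(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Per its error_description, AADSTS700016 means the application
	// identifier in the request was not found in the directory.
	fmt.Println(te.Error, hasCode(te, 700016), te.CorrelationID)
}
```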

How to reproduce it (as minimally and precisely as possible):
Let me check if I can find a way to easily replicate it.

Anything else we need to know:
Not sure if this matters:
I use an object store configuration file with the following values set:

type: AZURE
config:
  storage_account: "NAME"
  container: "metrics"

The following change in the latest release might be relevant:
#6891

@GiedriusS
Member

@rikhil-s maybe you could help out here?

@bartvanackooij
Author

I'm closing this issue: it had nothing to do with upgrading Thanos from v0.33.0 to v0.34.0, but with a change in Bitnami's Helm chart that I overlooked.

For the next person who misattributes this issue: in the upgrade of the Bitnami Thanos chart from v12.20.2 to v12.20.4, the default value of automountServiceAccountToken changed from true to false, which caused the client ID environment variable not to be set on the pods.
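Kubernetes reads automountServiceAccountToken from two places: a value set on the pod spec takes precedence over the one on the ServiceAccount, and if neither is set the token is mounted. The sketch below encodes that rule and applies it to trimmed JSON of the kind `kubectl get ... -o json` returns. The manifests are hypothetical (this thread does not say which object the chart change touched), the type and function names are illustrative, and it does not model what the Workload Identity webhook does with the result.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal views of the Kubernetes objects. A nil pointer means the
// field is absent from the manifest.
type pod struct {
	Spec struct {
		AutomountServiceAccountToken *bool `json:"automountServiceAccountToken"`
	} `json:"spec"`
}

type serviceAccount struct {
	AutomountServiceAccountToken *bool `json:"automountServiceAccountToken"`
}

// effectiveAutomount applies the precedence rule: the pod's value wins
// if set, otherwise the ServiceAccount's value, otherwise true.
func effectiveAutomount(podVal, saVal *bool) bool {
	if podVal != nil {
		return *podVal
	}
	if saVal != nil {
		return *saVal
	}
	return true
}

func main() {
	// Hypothetical manifests: the pod spec explicitly disables automounting
	// even though the ServiceAccount allows it.
	podJSON := []byte(`{"spec": {"automountServiceAccountToken": false}}`)
	saJSON := []byte(`{"automountServiceAccountToken": true}`)

	var p pod
	var sa serviceAccount
	if err := json.Unmarshal(podJSON, &p); err != nil {
		panic(err)
	}
	if err := json.Unmarshal(saJSON, &sa); err != nil {
		panic(err)
	}
	fmt.Println("token mounted:",
		effectiveAutomount(p.Spec.AutomountServiceAccountToken, sa.AutomountServiceAccountToken))
}
```

Using `*bool` rather than `bool` matters here: it is the only way to tell "explicitly false" apart from "not set", and only the latter falls through to the ServiceAccount or the default.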

Apologies
