
Make metrics-server pod crash on invalid configuration #5198

Closed
nielstenboom opened this issue Nov 21, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@nielstenboom

nielstenboom commented Nov 21, 2023

Report

We had a small issue in our cluster where KEDA was no longer autoscaling pods. After some research it turned out we had set a wrong value for the eks.amazonaws.com/role-arn annotation in EKS.

2023-11-21 14:40:21 | log="E1121 14:40:21.150843 1 provider.go:124] keda_metrics_adapter/provider \"msg\"=\"error getting metric for scaler\" \"error\"=\"WebIdentityErr: failed to retrieve credentials\\ncaused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/1234567890\\n\\tstatus code: 400, request id: xxxxx\" \"scaledObject.Name\"=\"redacted-aws-sqs-queue-scaledobject\" \"scaledObject.Namespace\"=\"redacted\" \"scaler\"={}\n"

This was hard to catch because the pod kept running instead of crashing.

Expected Behavior

I would have expected the pod to go into CrashLoopBackOff.

Actual Behavior

The pod keeps running and only emits error logs:

2023-11-21 14:40:21 | log="E1121 14:40:21.150843 1 provider.go:124] keda_metrics_adapter/provider \"msg\"=\"error getting metric for scaler\" \"error\"=\"WebIdentityErr: failed to retrieve credentials\\ncaused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/1234567890\\n\\tstatus code: 400, request id: xxxx\" \"scaledObject.Name\"=\"redacted-aws-sqs-queue-scaledobject\" \"scaledObject.Namespace\"=\"redacted\" \"scaler\"={}\n"

Steps to Reproduce the Problem

  1. Deploy KEDA in AWS EKS through the Helm chart
  2. Set the EKS service account annotation with a wrong account ID: eks.amazonaws.com/role-arn: arn:aws:iam::wrong-account-id:role/keda-role
  3. Set up a ScaledJob on SQS and make sure it triggers scaling (a minimal sketch follows this list)
  4. The errors should now appear, but the pod will not crash
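For illustration, a minimal sketch of this misconfiguration, assuming the default keda-operator service account installed by the Helm chart; the account ID, role name, queue URL, and target deployment are placeholders, and a ScaledObject is shown as the equivalent of the ScaledJob in step 3 (it matches the scaledObject named in the log above):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: keda-operator            # default name used by the KEDA Helm chart
      namespace: keda
      annotations:
        # Wrong account ID here makes the IRSA token exchange fail with
        # "InvalidIdentityToken", as seen in the log above
        eks.amazonaws.com/role-arn: arn:aws:iam::wrong-account-id:role/keda-role
    ---
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: aws-sqs-queue-scaledobject
    spec:
      scaleTargetRef:
        name: sqs-consumer           # hypothetical deployment to scale
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
            queueLength: "5"
            awsRegion: eu-west-1
            identityOwner: operator  # credentials come from the (misconfigured) operator role

With this in place, the operator's calls to SQS fail with the WebIdentityErr shown above while the pod itself stays healthy.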

Logs from KEDA operator

No response

KEDA Version

2.8.1

Kubernetes Version

1.23

Platform

Amazon Web Services

Scaler Details

AWS SQS

Anything else?

No response

@nielstenboom added the bug label Nov 21, 2023
@JorTurFer
Member

Hello @nielstenboom,
We shouldn't go into a crash loop, because AWS auth is just one of the possible scalers/auths working in the cluster. KEDA handles the error and shows a message in the log and also in the exported metrics, so I'd not say that it's a silent failure.
KEDA 2.8.1 is quite an old version and we have improved observability a lot during the last year, but even so, v2.8 already gives you some useful metrics: keda.sh/docs/2.8/operate/prometheus

Any such error should show up in keda_metrics_adapter_scaler_errors.
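For example, a minimal sketch of a Prometheus alerting rule on that metric, assuming Prometheus already scrapes the KEDA metrics adapter; the group name, alert name, threshold, and the scaler label in the summary are illustrative, so adjust them to your setup:

    groups:
      - name: keda-scaler-errors
        rules:
          - alert: KedaScalerErrors
            # Fires when any scaler reported errors during the last 5 minutes
            expr: increase(keda_metrics_adapter_scaler_errors[5m]) > 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "KEDA scaler {{ $labels.scaler }} is reporting errors"

An alert like this would have surfaced the misconfigured role-arn without waiting for scaling to silently stop.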

@github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Nov 26, 2023
@nielstenboom
Author

@JorTurFer okay, that's clear, thanks a lot for the clarification!
