
Make metrics-server pod crash on invalid configuration #5198

Closed
nielstenboom opened this issue Nov 21, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@nielstenboom

nielstenboom commented Nov 21, 2023

Report

We had a small issue in our cluster where KEDA was no longer autoscaling pods. After some research it turned out we had set a wrong value for the eks.amazonaws.com/role-arn annotation in EKS.

2023-11-21 14:40:21 | log="E1121 14:40:21.150843 1 provider.go:124] keda_metrics_adapter/provider \"msg\"=\"error getting metric for scaler\" \"error\"=\"WebIdentityErr: failed to retrieve credentials\\ncaused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/1234567890\\n\\tstatus code: 400, request id: xxxxx\" \"scaledObject.Name\"=\"redacted-aws-sqs-queue-scaledobject\" \"scaledObject.Namespace\"=\"redacted\" \"scaler\"={}\n"

This was hard to catch because the pod kept running instead of crashing.

Expected Behavior

I would have expected the pod to go into CrashLoopBackOff.

Actual Behavior

The pod keeps running and only emits error logs:

2023-11-21 14:40:21 | log="E1121 14:40:21.150843 1 provider.go:124] keda_metrics_adapter/provider \"msg\"=\"error getting metric for scaler\" \"error\"=\"WebIdentityErr: failed to retrieve credentials\\ncaused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.eu-west-1.amazonaws.com/id/1234567890\\n\\tstatus code: 400, request id: xxxx\" \"scaledObject.Name\"=\"redacted-aws-sqs-queue-scaledobject\" \"scaledObject.Namespace\"=\"redacted\" \"scaler\"={}\n"

Steps to Reproduce the Problem

  1. Deploy KEDA in AWS EKS through the Helm chart
  2. Set the EKS service account annotation with a wrong account ID: eks.amazonaws.com/role-arn: arn:aws:iam::wrong-account-id:role/keda-role
  3. Set up a ScaledJob on SQS and make sure it triggers scaling (a minimal sketch follows this list)
  4. The errors should now appear, but the pod will not crash
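For illustration, a minimal sketch of this misconfiguration, assuming the default keda-operator service account installed by the Helm chart; the account ID, role name, queue URL, and target deployment are placeholders, and a ScaledObject is shown as the equivalent of the ScaledJob in step 3 (it matches the scaledObject named in the log above):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: keda-operator            # default name used by the KEDA Helm chart
      namespace: keda
      annotations:
        # Wrong account ID here makes the IRSA token exchange fail with
        # "InvalidIdentityToken", as seen in the log above
        eks.amazonaws.com/role-arn: arn:aws:iam::wrong-account-id:role/keda-role
    ---
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: aws-sqs-queue-scaledobject
    spec:
      scaleTargetRef:
        name: sqs-consumer           # hypothetical deployment to scale
      triggers:
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
            queueLength: "5"
            awsRegion: eu-west-1
            identityOwner: operator  # credentials come from the (misconfigured) operator role

With this in place, the operator's calls to SQS fail with the WebIdentityErr shown above while the pod itself stays healthy.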

Logs from KEDA operator

No response

KEDA Version

2.8.1

Kubernetes Version

1.23

Platform

Amazon Web Services

Scaler Details

AWS SQS

Anything else?

No response

@nielstenboom added the bug label Nov 21, 2023
@JorTurFer
Member

Hello @nielstenboom,
We shouldn't go into a crash loop, because AWS auth is just one of the possible scalers/auths working in the cluster. KEDA handles the error and shows a message in the log and also in the exported metrics, so I'd not say that it's a silent failure.
KEDA 2.8.1 is quite an old version and we have improved observability a lot during the last year, but even so, v2.8 already gives you some useful metrics: keda.sh/docs/2.8/operate/prometheus

Any such error should show up in keda_metrics_adapter_scaler_errors.
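For example, a minimal sketch of a Prometheus alerting rule on that metric, assuming Prometheus already scrapes the KEDA metrics adapter; the group name, alert name, threshold, and the scaler label in the summary are illustrative, so adjust them to your setup:

    groups:
      - name: keda-scaler-errors
        rules:
          - alert: KedaScalerErrors
            # Fires when any scaler reported errors during the last 5 minutes
            expr: increase(keda_metrics_adapter_scaler_errors[5m]) > 0
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "KEDA scaler {{ $labels.scaler }} is reporting errors"

An alert like this would have surfaced the misconfigured role-arn without waiting for scaling to silently stop.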

@github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Nov 26, 2023
@nielstenboom
Author

@JorTurFer okay, that's clear, thanks a lot for the clarification!
