Fallback is triggered without fallback.failureThreshold being taken into account #6053

Closed
s-shirayama opened this issue Aug 9, 2024 · 7 comments
Labels
bug Something isn't working stale All issues that are marked as stale due to inactivity

Comments

@s-shirayama

Report

When the scaler fails to get metrics with the fallback option enabled, we expect KEDA to scale the deployment to fallback.replicas only after the number of consecutive failures defined by fallback.failureThreshold. Instead, KEDA scaled the deployment to fallback.replicas immediately after the scaler's first failure.

This appears to differ from the behavior described in the official documentation.

Expected Behavior

KEDA scales the target deployment to fallback.replicas after the number of consecutive failures defined by fallback.failureThreshold.

Actual Behavior

KEDA scales the deployment to fallback.replicas immediately after the scaler's first failure.

Steps to Reproduce the Problem

  1. Set up a ScaledObject with the fallback option enabled
    1. Set fallback.failureThreshold to a high number
  2. Make the scaler fail to get metrics (e.g. set a wrong URL for the MetricsAPI scaler)
  3. Check the HPA's desired replicas and the scaler's Number Of Failures.

This is the ScaledObject spec used to reproduce the issue:

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fallback-test
spec:
  minReplicaCount: 1
  maxReplicaCount: 10
  fallback:
    failureThreshold: 10
    replicas: 5
  scaleTargetRef:
    name: nginx
  triggers:
  - type: metrics-api
    metadata:
      targetValue: "1"
      url: "http://dummy/"
      valueLocation: "dummy"
EOF

When checking the HPA's desired replicas, it had already been scaled to fallback.replicas:

❯ k get hpa
NAME                     REFERENCE          TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-fallback-test   Deployment/nginx   <unknown>/1 (avg)   1         10        5          19s

The scaler's Number Of Failures (1) was still below fallback.failureThreshold (10), which was unexpected.

❯ k describe so
Name:         fallback-test
Namespace:    default
API Version:  keda.sh/v1alpha1
Kind:         ScaledObject
Spec:
  Fallback:
    Failure Threshold:  10
    Replicas:           5
  Max Replica Count:    10
  Min Replica Count:    1
  Scale Target Ref:
    Name:  nginx
  Triggers:
    Metadata:
      Target Value:    1
      URL:             http://dummy/
      Value Location:  dummy
    Type:              metrics-api
Status:
  Conditions:
    Status:   Unknown
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Message:  No fallbacks are active on this scaled object
    Reason:   NoFallbackFound
    Status:   False
    Type:     Fallback
    Status:   Unknown
    Type:     Paused
  External Metric Names:
    s0-metric-api-dummy
  Health:
    s0-metric-api-dummy:
      Number Of Failures:  1
      Status:              Failing
  Hpa Name:                keda-hpa-fallback-test
  Original Replica Count:  1
:
Events:
  Type     Reason              Age                From           Message
  ----     ------              ----               ----           -------
  Normal   KEDAScalersStarted  52s                keda-operator  Started scalers watch
  Normal   ScaledObjectReady   52s                keda-operator  ScaledObject is ready for scaling
  Warning  KEDAScalerFailed    22s (x2 over 52s)  keda-operator  error requesting metrics endpoint: Get "http://dummy/": dial tcp: lookup dummy on 192.168.194.138:53: no such host
  Normal   KEDAScalersStarted  7s (x3 over 52s)   keda-operator  Scaler metrics-api is built.

Logs from KEDA operator

The log says "Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas" with "New Replicas Count": 5.

2024-08-09T04:05:29Z	INFO	Initializing Scaling logic according to ScaledObject Specification	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"fallback-test","namespace":"default"}, "namespace": "default", "name": "fallback-test", "reconcileID": "dcff18d3-3077-4ef0-a235-b6166e8f8748"}
2024-08-09T04:05:29Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"fallback-test","namespace":"default"}, "namespace": "default", "name": "fallback-test", "reconcileID": "2e67c111-cb43-4c60-a372-37bc117c9a76"}
2024-08-09T04:05:29Z	INFO	Detected resource targeted for scaling	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"fallback-test","namespace":"default"}, "namespace": "default", "name": "fallback-test", "reconcileID": "2e67c111-cb43-4c60-a372-37bc117c9a76", "resource": "apps/v1.Deployment", "name": "nginx"}
2024-08-09T04:05:29Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "default", "scaledObject.Name": "fallback-test", "scaler": "metricsAPIScaler", "error": "error requesting metrics endpoint: Get \"http://dummy/\": dial tcp: lookup dummy on 192.168.194.138:53: no such host"}
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScalerState
	/workspace/pkg/scaling/scale_handler.go:780
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState.func1
	/workspace/pkg/scaling/scale_handler.go:633
2024-08-09T04:05:29Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject fallback.replicas	{"scaledobject.Name": "fallback-test", "scaledObject.Namespace": "default", "scaleTarget.Name": "nginx", "Original Replicas Count": 1, "New Replicas Count": 5}

KEDA Version

2.15.0

Kubernetes Version

1.29

Platform

Other

Scaler Details

Any, but I used metrics-api for testing.

Anything else?

According to scale_scaledobjects.go, this behavior (scaling to fallback.replicas when there are no active scalers and a scaler responds with an error) appears to be intentional. However, it differs from the behavior described in the official documentation.
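The condition described above can be sketched as a single decision function. This is a simplified Go model, not the actual code in scale_scaledobjects.go: scaledObjectState and executorDecision are hypothetical names, and the sketch only encodes the rule as stated (no active triggers, a scaler error, and a configured fallback). The point it illustrates is that no failure count is consulted on this path.

```go
package main

import "fmt"

// scaledObjectState is a hypothetical summary of one scaling loop.
type scaledObjectState struct {
	isActive         bool   // any trigger reported active
	isError          bool   // any scaler returned an error
	fallbackReplicas *int32 // nil when spec.fallback is not set
}

// executorDecision returns the replica count to apply and whether the
// fallback branch was taken. failureThreshold plays no part here.
func executorDecision(s scaledObjectState) (int32, bool) {
	if !s.isActive && s.isError && s.fallbackReplicas != nil && *s.fallbackReplicas != 0 {
		return *s.fallbackReplicas, true
	}
	return 0, false
}

func main() {
	replicas := int32(5)
	// First failed poll of the metrics-api scaler from the report.
	state := scaledObjectState{isActive: false, isError: true, fallbackReplicas: &replicas}
	if n, ok := executorDecision(state); ok {
		fmt.Println("scale to fallback replicas:", n) // prints "scale to fallback replicas: 5"
	}
}
```

With the ScaledObject from the report, this branch already applies replicas: 5 on the first error, which matches the observed HPA and operator log output.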

We set a high value for fallback.failureThreshold to avoid triggering frequent fallbacks on temporary, short-lived failures when retrieving external metrics. However, as described above, it does not work as expected.

@s-shirayama s-shirayama added the bug Something isn't working label Aug 9, 2024
@s-shirayama
Author

Hi, is there any update on this? I can provide more details if needed.

Thanks!

stale bot commented Oct 28, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Oct 28, 2024
@zroubalik zroubalik removed the stale All issues that are marked as stale due to inactivity label Nov 5, 2024
@zroubalik
Member

Thanks for reporting, we should definitely check this.

stale bot commented Jan 5, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Jan 5, 2025
stale bot commented Jan 14, 2025

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Jan 14, 2025
@github-project-automation github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Jan 14, 2025
@jlemaes

jlemaes commented Jan 14, 2025

Hi, we're still seeing the same behavior. Can this be reopened?

@rickbrouwer
Contributor

rickbrouwer commented Jan 15, 2025

As I see it, there are two ways to end up in a fallback:

  1. through a configuration error, as you describe, for example an incorrect URL. In other words, the scaler returns an error
  2. something goes wrong while retrieving a metric

With option 1, the fallback is triggered in scale_scaledobjects.go and takes effect immediately, since the configuration error is apparently known right away.
With option 2, the fallback is triggered in fallback.go, which contains the FailureThreshold logic and has to keep track of the number of failed attempts before falling back.
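The interaction between the two paths can be sketched as a timeline of consecutive failed polls. This is a hypothetical Go simulation, not KEDA code: firstFallbackCycle is an illustrative name, every poll is assumed to fail, and the metrics path is assumed to use a "count exceeds threshold" comparison. It reports the first poll at which each path would apply the fallback.

```go
package main

import "fmt"

// firstFallbackCycle returns the 1-based poll at which each hypothetical
// path first applies the fallback, or 0 if it never fires within maxCycles.
//   - executor path (option 1): fires on any error, ignoring the count.
//   - metrics path (option 2): fires once consecutive failures exceed
//     failureThreshold (assumed ">" comparison).
func firstFallbackCycle(failureThreshold int32, maxCycles int) (executor, metrics int) {
	var failures int32
	for cycle := 1; cycle <= maxCycles; cycle++ {
		failures++ // every poll fails in this simulation
		if executor == 0 {
			executor = cycle // option 1 does not look at the failure count
		}
		if metrics == 0 && failures > failureThreshold {
			metrics = cycle
		}
	}
	return executor, metrics
}

func main() {
	exec, met := firstFallbackCycle(10, 20)
	// prints "executor path: poll 1, metrics path: poll 11"
	fmt.Printf("executor path: poll %d, metrics path: poll %d\n", exec, met)
}
```

If both paths see the same scaler error, the executor path wins on the first poll, so fallback.failureThreshold never gets a chance to take effect, which is consistent with the behavior in the report.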

I will verify this and test it myself later, and will follow up here.
