This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
stalebot added the stale label on Nov 7, 2024
Report
I am trying to scale an HPA based on a GPU metric. Everything seems to be working, but when I query the metric using the command below, I get the output "Error from server (NotFound): the server could not find the requested resource."
Expected Behavior
The HPA should scale the pods when the condition is met.
I have checked the HPA logs, and they say a valid metric was found, but the pods are still not scaled up:
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from external metric s0-prometheus(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: gpu-dcgmproftester-deployment-scaledobject,},MatchExpressions:[]LabelSelectorRequirement{},})
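The metric name (s0-prometheus) and label selector in the condition above are the pieces a direct query of the external metrics API needs. The sketch below is not the command from this report; it only shows how such a path is usually put together for a KEDA-managed metric. The namespace default is taken from the operator logs, and the helper name external_metric_path is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the external metrics API path for a KEDA metric.
# Arguments: namespace, metric name, ScaledObject name.
external_metric_path() {
  ns="$1"; metric="$2"; so="$3"
  # The label selector is URL-encoded: "/" becomes %2F and "=" becomes %3D.
  printf '/apis/external.metrics.k8s.io/v1beta1/namespaces/%s/%s?labelSelector=scaledobject.keda.sh%%2Fname%%3D%s\n' \
    "$ns" "$metric" "$so"
}

# Print the path for the ScaledObject named in the HPA condition.
external_metric_path default s0-prometheus gpu-dcgmproftester-deployment-scaledobject

# Against a live cluster, the path would be passed to kubectl, for example:
# kubectl get --raw "$(external_metric_path default s0-prometheus gpu-dcgmproftester-deployment-scaledobject)"
```

A request that leaves out the namespace, the metric name, or the label selector may come back as NotFound even when the HPA itself can read the metric, so it is worth comparing the failing command against this shape.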
Actual Behavior
No scaling events occur in the KEDA HPA, even when my GPU utilization goes above 10%. The GPU metric is available in Prometheus.
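For reference, here is a rough sketch of the replica count an HPA would be expected to compute for an external metric with an average-value target: the metric value divided by the per-pod target, rounded up. The numbers are hypothetical; the report does not include the ScaledObject, so the threshold of 10, the maximum of 5 replicas, and the floor of 1 are assumptions. The sketch also ignores the HPA tolerance band.

```shell
#!/bin/sh
# Hypothetical helper: desired_replicas METRIC_VALUE TARGET_AVERAGE MAX_REPLICAS
# Integer ceiling of METRIC_VALUE / TARGET_AVERAGE, clamped to [1, MAX_REPLICAS].
desired_replicas() {
  value="$1"; target="$2"; max="$3"
  # Ceiling division using integer arithmetic only.
  want=$(( (value + target - 1) / target ))
  if [ "$want" -lt 1 ]; then want=1; fi          # assumed minimum of 1 replica
  if [ "$want" -gt "$max" ]; then want="$max"; fi
  echo "$want"
}

# Example with assumed values: metric 35, target 10, at most 5 replicas.
desired_replicas 35 10 5
```

If the value the HPA actually sees stays at or below the target, or the maximum replica count is already reached, no scale-up event would be expected, so comparing the query result from Prometheus with the target in the ScaledObject is a reasonable first check.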
KEDA HPA file
Steps to Reproduce the Problem
I followed the article below:
https://gcore.com/docs/cloud/kubernetes/clusters/autoscaling/configure-gpu-autoscaling-for-kubernetes
Logs from KEDA operator
2024/09/06 05:25:58 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
2024-09-06T05:25:58Z INFO setup Starting manager
2024-09-06T05:25:58Z INFO setup KEDA Version: 2.15.1
2024-09-06T05:25:58Z INFO setup Git Commit: 09a4951478746ba0d95521b786439e58aeda179b
2024-09-06T05:25:58Z INFO setup Go Version: go1.22.5
2024-09-06T05:25:58Z INFO setup Go OS/Arch: linux/amd64
2024-09-06T05:25:58Z INFO setup Running on Kubernetes 1.30+ {"version": "v1.30.3-eks-a18cd3a"}
2024-09-06T05:25:59Z INFO starting server {"kind": "health probe", "addr": "[::]:8081"}
I0906 05:25:59.037583 1 leaderelection.go:250] attempting to acquire leader lease keda/operator.keda.sh...
I0906 05:26:16.489836 1 leaderelection.go:260] successfully acquired lease keda/operator.keda.sh
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v1alpha1.ScaledObject"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "source": "kind source: *v2.HorizontalPodAutoscaler"}
2024-09-06T05:26:16Z INFO Starting Controller {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "source": "kind source: *v1alpha1.TriggerAuthentication"}
2024-09-06T05:26:16Z INFO Starting Controller {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "source": "kind source: *v1alpha1.ScaledJob"}
2024-09-06T05:26:16Z INFO Starting Controller {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource", "source": "kind source: *v1alpha1.CloudEventSource"}
2024-09-06T05:26:16Z INFO Starting Controller {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "source": "kind source: *v1alpha1.ClusterTriggerAuthentication"}
2024-09-06T05:26:16Z INFO Starting Controller {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication"}
2024-09-06T05:26:16Z INFO cert-rotation starting cert rotator controller
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *v1.Secret"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-09-06T05:26:16Z INFO Starting EventSource {"controller": "cert-rotator", "source": "kind source: *unstructured.Unstructured"}
2024-09-06T05:26:16Z INFO Starting Controller {"controller": "cert-rotator"}
2024-09-06T05:26:16Z INFO cert-rotation no cert refresh needed
2024-09-06T05:26:16Z INFO cert-rotation certs are ready in /certs
2024-09-06T05:26:16Z INFO Starting workers {"controller": "cert-rotator", "worker count": 1}
2024-09-06T05:26:16Z INFO cert-rotation no cert refresh needed
2024-09-06T05:26:16Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-09-06T05:26:16Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-09-06T05:26:16Z INFO cert-rotation no cert refresh needed
2024-09-06T05:26:16Z INFO cert-rotation Ensuring CA cert {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-09-06T05:26:16Z INFO cert-rotation Ensuring CA cert {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
2024-09-06T05:26:16Z INFO Starting workers {"controller": "scaledjob", "controllerGroup": "keda.sh", "controllerKind": "ScaledJob", "worker count": 1}
2024-09-06T05:26:16Z INFO Starting workers {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "worker count": 5}
2024-09-06T05:26:16Z INFO Starting workers {"controller": "triggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "TriggerAuthentication", "worker count": 1}
2024-09-06T05:26:16Z INFO Starting workers {"controller": "cloudeventsource", "controllerGroup": "eventing.keda.sh", "controllerKind": "CloudEventSource", "worker count": 1}
2024-09-06T05:26:16Z INFO Starting workers {"controller": "clustertriggerauthentication", "controllerGroup": "keda.sh", "controllerKind": "ClusterTriggerAuthentication", "worker count": 1}
2024-09-06T05:26:17Z INFO cert-rotation CA certs are injected to webhooks
2024-09-06T05:26:17Z INFO grpc_server Starting Metrics Service gRPC Server {"address": ":9666"}
2024-09-06T05:32:28Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "3d73bd81-f844-4637-b4dd-04909f5a3c6b"}
2024-09-06T05:32:28Z INFO Adding Finalizer for the ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "3d73bd81-f844-4637-b4dd-04909f5a3c6b"}
2024-09-06T05:32:28Z INFO KubeAPIWarningLogger metadata.finalizers: "finalizer.keda.sh": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers
2024-09-06T05:32:28Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "3d73bd81-f844-4637-b4dd-04909f5a3c6b", "resource": "apps/v1.Deployment", "name": "gpu-api"}
2024-09-06T05:32:28Z INFO Creating a new HPA {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "3d73bd81-f844-4637-b4dd-04909f5a3c6b", "HPA.Namespace": "default", "HPA.Name": "keda-hpa-gpu-dcgmproftester-deployment-scaledobject"}
2024-09-06T05:32:28Z INFO Initializing Scaling logic according to ScaledObject Specification {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "3d73bd81-f844-4637-b4dd-04909f5a3c6b"}
2024-09-06T05:32:28Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "4e6ab4cb-a72c-42a1-badf-d4ff2b908d52"}
2024-09-06T05:32:28Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "4e6ab4cb-a72c-42a1-badf-d4ff2b908d52", "resource": "apps/v1.Deployment", "name": "gpu-api"}
2024-09-06T05:32:58Z INFO Reconciling ScaledObject {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "9ee7f368-ac5d-43ae-a4b8-34bf7c82357c"}
2024-09-06T05:32:58Z INFO Detected resource targeted for scaling {"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"gpu-dcgmproftester-deployment-scaledobject","namespace":"default"}, "namespace": "default", "name": "gpu-dcgmproftester-deployment-scaledobject", "reconcileID": "9ee7f368-ac5d-43ae-a4b8-34bf7c82357c", "resource": "apps/v1.Deployment", "name": "gpu-api"}
KEDA Version
2.15.1
Kubernetes Version
1.30
Platform
Amazon Web Services
Scaler Details
prometheus
Anything else?
Output of apiservice
kubectl describe apiservice v1beta1.external.metrics.k8s.io
Name:         v1beta1.external.metrics.k8s.io
Namespace:
Labels:       app.kubernetes.io/component=operator
              app.kubernetes.io/instance=keda
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=v1beta1.external.metrics.k8s.io
              app.kubernetes.io/part-of=keda-operator
              app.kubernetes.io/version=2.15.1
              helm.sh/chart=keda-2.15.1
Annotations:  meta.helm.sh/release-name: keda
              meta.helm.sh/release-namespace: keda
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2024-09-06T05:25:54Z
  Resource Version:    16395
  UID:                 25ebf5a5-3448-4732-bd2a-b9ee7d33851f
Spec:
  Ca Bundle:               LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURFRENDQWZpZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFoTVJBd0RnWURWUVFLRXdkTFJVUkIKVDFKSE1RMHdDd1lEVlFRREV3UkxSVVJCTUI0WERUSTBNRGt3TmpBME1qVTFOMW9YRFRNME1Ea3dOREExTWpVMQpOMW93SVRFUU1BNEdBMVVFQ2hNSFMwVkVRVTlTUnpFTk1Bc0dBMVVFQXhNRVMwVkVRVENDQVNJd0RRWUpLb1pJCmh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBS1E2MjNPZ3BGMnU0MXVHTjZsb01UMEJxQkJDUE51Q3NjbXUKNXoySDZYRjhBY04zcWNzRlMyS1J1TTV0aFYxRHI2OGNPaUR2UVB2a2Y1UFRnL0xRenRzMTE0Y3RuaGNsamliLwpBV2J4Q2poNlVud0Vocld4ZzBpbDlDWWYxcHBXbVhCQTE4SzJJMUxaQTh4YWppb0hGUjREa3VQc3ZwUUNTQ3d3CnNqVDdWVnZFTkEzYVNzbkhCMExDNXpYaDRwN3dyMzlVUmFNbktLRWV1czQ0K3U3NUtyWDFtM3Y4UVVjamRVbGwKbTRzazdSZXFCYlc4K0FoWFhiTXZuSkRpcHZlTUJUbzVoSnZnL1R0cmQ3NGpHQlZaak5QVkoxQ0o4NXNpTEZUdgpMZmg4dWdnYzNBZkdsTjhkYXZSSHpEWFZpbmNoQjFMR1JUS1JvY2lOQ1ZqMDNiUmxJODhDQXdFQUFhTlRNRkV3CkRnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFRkE3WkFkWEYKRFdSNy9XK2haam5ZRXF4Rm9kUVVNQThHQTFVZEVRUUlNQWFDQkV0RlJFRXdEUVlKS29aSWh2Y05BUUVMQlFBRApnZ0VCQUhHTWJOTVNaTHpuM09ZSnI2Rm5HcUxCUkY1RXU1M3NtVkV3T0t0Y3cxc2J5TWJqYnhzM1QyWWpIWE9EClZTc2k4OXlKWGhtekZrWDJ2OTIwcmVzYTFHWkhhUk5Dc1JVS01LZDZ2bVBrU2JBQzJ5RDRmVFlLaUUrcjgrU0cKWHlzT3BFYTJLUkw5ZnBjdS9scm0vQkwyOEo5Mk9tSy9KdkNHK1pZRVdGTnRWM3RrRmw5Nk9kQjVjNG56OFV3agpaSXFPUzg5Ujh0RjA2elpjaU9Lc1lsdTB1ZjF1c3Z6aVpNc3A3Um53STRvUTJPRWxmTWFOR2hCdCtWcjk1N0E3CjI3TXUvc0JtU2lFQU9ucHpaQ2loSXo1MzdMdzJnV0xkMS9xbDByMmNqRnhhK01IN01aaGx4YW50L28xdmVsSnMKWGtORit3V1UybHkwREZ5bFRmWUhGYURvcDZ3PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  Group:                   external.metrics.k8s.io
  Group Priority Minimum:  100
  Service:
    Name:            keda-operator-metrics-apiserver
    Namespace:       keda
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-09-06T05:26:04Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
HPA Logs: