
can you provide prometheus alert rules? #297

Open
xgengsjc2021 opened this issue Aug 2, 2022 · 16 comments

Comments

@xgengsjc2021

xgengsjc2021 commented Aug 2, 2022

I found there are zero rules in the values.yaml. Can you provide some rules for Prometheus monitoring purposes?
Can you provide more rules? Is this one in values.yaml enough?

# - alert: KedaScalerErrors
#   annotations:
#     description: Keda scaledObject {{ $labels.scaledObject }} is experiencing errors with {{ $labels.scaler }} scaler
#     summary: Keda Scaler {{ $labels.scaler }} Errors
#   expr: sum by (scaledObject, scaler) (rate(keda_metrics_adapter_scaler_errors[2m])) > 0
#   for: 2m
#   labels:
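For reference, a companion rule for ScaledObject errors could follow the same pattern. This is only a sketch: the metric name keda_metrics_adapter_scaled_object_error_totals is the one listed in the KEDA 2.7 docs, metric names have differed between KEDA versions, and the severity label is a made-up placeholder, so verify the name against your own /metrics endpoint before using it.

```yaml
# - alert: KedaScaledObjectErrors
#   annotations:
#     description: Keda scaledObject {{ $labels.scaledObject }} is experiencing errors
#     summary: Keda ScaledObject {{ $labels.scaledObject }} Errors
#   expr: sum by (scaledObject) (rate(keda_metrics_adapter_scaled_object_error_totals[2m])) > 0
#   for: 2m
#   labels:
#     severity: warning  # placeholder; match your own routing labels
```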
@tomkerkhove
Member

We don't actively add them, given this is up to the end user; we don't want to enforce things on end users.

However, if you have suggestions, we do welcome PRs that add them, commented out.

@xgengsjc2021
Author

@tomkerkhove
Where can I see all of the metrics KEDA provides? I could not find any metrics on the KEDA website, so I don't know which metrics I can use to set up alerts.

@tomkerkhove
Member

The overview is available at https://keda.sh/docs/2.7/operate/prometheus/

@xgengsjc2021
Author

xgengsjc2021 commented Aug 3, 2022

@tomkerkhove Thanks.
When I check "http://127.0.0.1:9022/metrics", I only get one metric. (However, the document lists 4 metrics.) Please see my screenshot.
Screenshot2022_08_03_133444

Why does KEDA only list one metric here? Btw, I am using the latest version of KEDA (2.7.2).

From the screenshot, you can see the metric name is keda_metrics_adapter_scaler_errors_total (in the document, the metric name is keda_metrics_adapter_scaler_error_totals). My metric returns a result, but if I query keda_metrics_adapter_scaler_error_totals, I get nothing.

Besides this, I also tried to query the three other metrics below on Prometheus and did not get any results.

keda_metrics_adapter_scaled_object_error_totals
keda_metrics_adapter_scaler_errors
keda_metrics_adapter_scaler_metrics_value

My KEDA setting for Prometheus


prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
      interval:
      scrapeTimeout:
      namespace: monitoring
      additionalLabels:
        release: kube-prometheus-stack
      relabelings: []
  operator:
    enabled: true
    port: 8080
    path: /metrics
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
      interval:
      scrapeTimeout:
      namespace: monitoring
      additionalLabels:
        release: kube-prometheus-stack
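When scrape targets are missing, one thing worth checking is that the PodMonitor objects were actually created and carry the label the Prometheus Operator selects on. This is a sketch only (cluster-dependent; the monitoring namespace and the release label are taken from the values above):

```shell
# Confirm KEDA's PodMonitors exist in the namespace configured above
kubectl -n monitoring get podmonitors

# Inspect their labels, which must match the Prometheus Operator's
# podMonitorSelector (here: release=kube-prometheus-stack)
kubectl -n monitoring get podmonitors \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels}{"\n"}{end}'
```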

Is there anything I need to modify? Please help me check. Thanks

@xgengsjc2021
Author

Screenshot2022_08_03_140114

@xgengsjc2021
Author

@tomkerkhove Hi, do you have any idea about my comment above?

@tomkerkhove
Member

tomkerkhove commented Aug 18, 2022

This might be a silly question, but does your cluster have any ScaledObject resources? Because if it does not, then that might explain why they are missing.

(sorry for the slow response)

@xgengsjc2021
Author

@tomkerkhove Thanks, but the reply confused me. I do have ScaledObject resources in my env.

Screenshot2022_08_23_093215

@tomkerkhove
Member

That is odd; can this be related to kedacore/keda#3554, @JorTurFer?

@JorTurFer
Member

I don't think so; that issue registers the metric with 0 as the value, but the metric is registered. You are checking the metrics server (not the operator) on port 9022, right?

@xgengsjc2021
Author

xgengsjc2021 commented Aug 24, 2022

@JorTurFer @tomkerkhove
Thanks for the response. I did check the metrics server (not the operator). Btw, from the output below, in my keda ns I only see one service, keda-operator-metrics-apiserver

kubectl -n keda get svc
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
keda-operator-metrics-apiserver   ClusterIP   172.20.76.175   <none>        443/TCP,80/TCP,9022/TCP   147d

Then I run port-forward to check the metrics on metrics-apiserver

kubectl -n keda port-forward svc/keda-operator-metrics-apiserver 9022
Forwarding from 127.0.0.1:9022 -> 9022
Forwarding from [::1]:9022 -> 9022
Handling connection for 9022
Handling connection for 9022

@JorTurFer
Member

Hmm,
weird... It's possible that after a metrics server restart the counters don't exist, because they are created on first access.
Do you have more than 1 metrics server instance? If you restart the metrics server and wait a few minutes (with ScaledObjects present), do you still get no metrics?

@xgengsjc2021
Author

@JorTurFer I only have 1 metrics server pod, as you can see below.

kubectl -n keda get pods
NAME                                               READY   STATUS    RESTARTS   AGE
keda-operator-848b9f56f7-5szlp                     1/1     Running   0          27d
keda-operator-metrics-apiserver-5cb9fd7947-gv47p   1/1     Running   0          19m

Based on your suggestion, I restarted the metrics pod, then checked http://127.0.0.1:9022/metrics and got the same result:

# HELP keda_metrics_adapter_scaler_errors_total Total number of errors for all scalers
# TYPE keda_metrics_adapter_scaler_errors_total counter
keda_metrics_adapter_scaler_errors_total 0

@JorTurFer
Member

Really weird...
Could you query some metric manually to ensure that at least one trigger is executed?
I have just seen in your picture that you are using the CPU trigger; that trigger is processed by the Kubernetes metrics server (not by the KEDA metrics server), and that's why you can't see any other metrics. Do you have any trigger which is not CPU/Memory? Could you query it manually?
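For a non-CPU/Memory trigger, the external metric can be queried by hand through the Kubernetes API server. A sketch only: `<namespace>`, `<metric-name>`, and `<scaled-object-name>` are placeholders to fill in (the metric name appears in the generated HPA / ScaledObject status), and `jq` is optional pretty-printing. CPU/Memory triggers go through metrics.k8s.io instead, so for them this path returns nothing.

```shell
# List external metrics the KEDA metrics server currently serves
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

# Query one metric for a specific ScaledObject (placeholders to fill in)
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric-name>?labelSelector=scaledobject.keda.sh/name=<scaled-object-name>"
```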

@xgengsjc2021
Copy link
Author

@JorTurFer At this moment, we only monitor CPU and Mem. I don't have any other triggers for now. Actually, we are using the combination of CPU + Mem together as the trigger in our env for now.

Here is a question I hope you can answer:
Once the condition is matched in KEDA, will KEDA scale up the pods to the maximum number at once? I noticed one time my CPU usage was not too high (it was above the threshold), but it scaled up to the maximum number of pods immediately, which we don't like.

@JorTurFer
Member

@JorTurFer At this moment, we only monitor CPU and Mem. I don't have any other triggers for now. Actually, we are using the combination of CPU + Mem together as the trigger in our env for now.

That's why you can't see any other metrics: they haven't been generated yet because the KEDA metrics server hasn't received any queries; all the requests go to the Kubernetes metrics server. When you use the CPU/Memory scaler, KEDA basically creates a "regular" HPA that queries the "regular" metrics server (that's why the Kubernetes metrics server is needed).

Here is a question I hope you can answer:
Once the condition is matched in KEDA, will KEDA scale up the pods to the maximum number at once? I noticed one time my CPU usage was not too high (it was above the threshold), but it scaled up to the maximum number of pods immediately, which we don't like.

KEDA creates the HPA and exposes the metrics (except CPU and memory), and it is the HPA controller that manages the autoscaling, so basically we don't change anything there. Why do you think that the CPU usage was low? I mean, do you have all the usage monitored? Another important thing is that the threshold is not a boundary; it's the desired value. I mean, the HPA controller will try to stay as close as possible to that value, not scale out/in automatically when the value changes.
Remember also that the HPA controller is really aggressive scaling out and very conservative scaling in; a small peak can trigger a scale out, and several minutes are needed to scale in. Using KEDA you can customize this default behaviour using the advanced section in the ScaledObject.
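To tame the scale-out, the ScaledObject's advanced section accepts the standard HPA v2 behavior block, which KEDA passes through to the HPA it generates. This is a sketch under assumptions: my-app, my-app-so, and the numbers are made-up placeholders, and field names should be checked against the KEDA docs for your version.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-so              # hypothetical name
spec:
  scaleTargetRef:
    name: my-app               # hypothetical Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:                # passed through to the generated HPA
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
          - type: Pods
            value: 2           # add at most 2 pods...
            periodSeconds: 60  # ...per minute, instead of jumping to max
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"
```

With a policy like this, a brief spike above the target adds replicas gradually rather than jumping straight to maxReplicaCount.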
