
can you provide prometheus alert rules? #297

Open
xgengsjc2021 opened this issue Aug 2, 2022 · 16 comments

Comments

@xgengsjc2021

xgengsjc2021 commented Aug 2, 2022

I found there are zero rules in the values.yaml. Can you provide some rules for Prometheus monitoring purposes?
Can you provide more rules? Is this one in values.yaml enough?

# - alert: KedaScalerErrors
#   annotations:
#     description: Keda scaledObject {{ $labels.scaledObject }} is experiencing errors with {{ $labels.scaler }} scaler
#     summary: Keda Scaler {{ $labels.scaler }} Errors
#   expr: sum by (scaledObject, scaler) (rate(keda_metrics_adapter_scaler_errors[2m])) > 0
#   for: 2m
#   labels:
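For reference, a companion rule for ScaledObject errors could follow the same pattern. This is only a sketch: the metric name keda_metrics_adapter_scaled_object_error_totals is the one listed in the KEDA 2.7 docs, metric names have differed between KEDA versions, and the severity label is a made-up placeholder, so verify the name against your own /metrics endpoint before using it.

```yaml
# - alert: KedaScaledObjectErrors
#   annotations:
#     description: Keda scaledObject {{ $labels.scaledObject }} is experiencing errors
#     summary: Keda ScaledObject {{ $labels.scaledObject }} Errors
#   expr: sum by (scaledObject) (rate(keda_metrics_adapter_scaled_object_error_totals[2m])) > 0
#   for: 2m
#   labels:
#     severity: warning  # placeholder; match your own routing labels
```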
@tomkerkhove
Member

We don't actively add them, given this is up to the end user; we don't want to enforce things on end users.

However, if you have suggestions, we do welcome PRs that add them, commented out.

@xgengsjc2021
Author

@tomkerkhove
Where can I see all of the metrics KEDA provides? I could not find any metrics on the KEDA website, so I don't know which metrics I can use to set up alerts.

@tomkerkhove
Member

The overview is available at https://keda.sh/docs/2.7/operate/prometheus/

@xgengsjc2021
Author

xgengsjc2021 commented Aug 3, 2022

@tomkerkhove Thanks.
When I check "http://127.0.0.1:9022/metrics", I only get one metric. (However, the document lists 4 metrics.) Please see my screenshot.
Screenshot2022_08_03_133444

Why does KEDA only list one metric here? Btw, I am using the latest version of KEDA (2.7.2).

From the screenshot, you can see the metric name is keda_metrics_adapter_scaler_errors_total (in the document, the metric name is keda_metrics_adapter_scaler_error_totals). My metric returns a result, but if I query keda_metrics_adapter_scaler_error_totals, I get nothing.

Besides this, I also tried to query the three other metrics below on Prometheus and did not get any results.

keda_metrics_adapter_scaled_object_error_totals
keda_metrics_adapter_scaler_errors
keda_metrics_adapter_scaler_metrics_value

My KEDA setting for Prometheus


prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
      interval:
      scrapeTimeout:
      namespace: monitoring
      additionalLabels:
        release: kube-prometheus-stack
      relabelings: []
  operator:
    enabled: true
    port: 8080
    path: /metrics
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
      interval:
      scrapeTimeout:
      namespace: monitoring
      additionalLabels:
        release: kube-prometheus-stack
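When scrape targets are missing, one thing worth checking is that the PodMonitor objects were actually created and carry the label the Prometheus Operator selects on. This is a sketch only (cluster-dependent; the monitoring namespace and the release label are taken from the values above):

```shell
# Confirm KEDA's PodMonitors exist in the namespace configured above
kubectl -n monitoring get podmonitors

# Inspect their labels, which must match the Prometheus Operator's
# podMonitorSelector (here: release=kube-prometheus-stack)
kubectl -n monitoring get podmonitors \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels}{"\n"}{end}'
```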

Is there anything I need to modify? Please help me check. Thanks

@xgengsjc2021
Author

Screenshot2022_08_03_140114

@xgengsjc2021
Author

@tomkerkhove Hi, do you have any idea about my comment above?

@tomkerkhove
Member

tomkerkhove commented Aug 18, 2022

This might be a silly question, but does your cluster have any ScaledObject resources? Because if it does not, then that might explain why they are missing.

(sorry for the slow response)

@xgengsjc2021
Author

@tomkerkhove Thanks, but the reply confused me. I do have ScaledObject resources in my env.

Screenshot2022_08_23_093215

@tomkerkhove
Member

That is odd; can this be related to kedacore/keda#3554, @JorTurFer?

@JorTurFer
Member

I don't think so; that issue registers the metric with 0 as the value, but the metric is registered. You are checking the metrics server (not the operator) on port 9022, right?

@xgengsjc2021
Author

xgengsjc2021 commented Aug 24, 2022

@JorTurFer @tomkerkhove
Thanks for the response. I did check the metrics server (not the operator). Btw, from the output below, in my keda ns I only see one service, keda-operator-metrics-apiserver

kubectl -n keda get svc
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
keda-operator-metrics-apiserver   ClusterIP   172.20.76.175   <none>        443/TCP,80/TCP,9022/TCP   147d

Then I run port-forward to check the metrics on metrics-apiserver

kubectl -n keda port-forward svc/keda-operator-metrics-apiserver 9022
Forwarding from 127.0.0.1:9022 -> 9022
Forwarding from [::1]:9022 -> 9022
Handling connection for 9022
Handling connection for 9022

@JorTurFer
Member

Hmm,
weird... It's possible that after a metrics server restart the counters don't exist, because they are created on first access.
Do you have more than 1 metrics server instance? If you restart the metrics server and wait a few minutes (with ScaledObjects present), do you still get no metrics?

@xgengsjc2021
Author

@JorTurFer I only have 1 metrics server pod, as you can see below.

kubectl -n keda get pods
NAME                                               READY   STATUS    RESTARTS   AGE
keda-operator-848b9f56f7-5szlp                     1/1     Running   0          27d
keda-operator-metrics-apiserver-5cb9fd7947-gv47p   1/1     Running   0          19m

Based on your suggestion, I restarted the metrics pod, then checked http://127.0.0.1:9022/metrics and got the same result:

# HELP keda_metrics_adapter_scaler_errors_total Total number of errors for all scalers
# TYPE keda_metrics_adapter_scaler_errors_total counter
keda_metrics_adapter_scaler_errors_total 0

@JorTurFer
Member

Really weird...
Could you query some metric manually to ensure that at least one trigger is executed?
I have just seen in your picture that you are using the CPU trigger; that trigger is processed by the Kubernetes metrics server (not by the KEDA metrics server), and that's why you can't see any other metrics. Do you have any trigger which is not CPU/Memory? Could you query it manually?
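For a non-CPU/Memory trigger, the external metric can be queried by hand through the Kubernetes API server. A sketch only: `<namespace>`, `<metric-name>`, and `<scaled-object-name>` are placeholders to fill in (the metric name appears in the generated HPA / ScaledObject status), and `jq` is optional pretty-printing. CPU/Memory triggers go through metrics.k8s.io instead, so for them this path returns nothing.

```shell
# List external metrics the KEDA metrics server currently serves
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

# Query one metric for a specific ScaledObject (placeholders to fill in)
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric-name>?labelSelector=scaledobject.keda.sh/name=<scaled-object-name>"
```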

@xgengsjc2021
Copy link
Author

@JorTurFer At this moment, we only monitor CPU and Mem. I don't have any other triggers for now. Actually, we are using the combination of CPU + Mem together as the trigger in our env for now.

Here is a question I hope you can answer:
Once the condition is matched in KEDA, will KEDA scale up the pods to the maximum number at once? I noticed one time my CPU usage was not too high (it was above the threshold), but it scaled up to the maximum number of pods immediately, which we don't like.

@JorTurFer
Member

@JorTurFer At this moment, we only monitor CPU and Mem. I don't have any other triggers for now. Actually, we are using the combination of CPU + Mem together as the trigger in our env for now.

That's why you can't see any other metrics: they haven't been generated yet because the KEDA metrics server hasn't received any queries; all the requests go to the Kubernetes metrics server. When you use the CPU/Memory scaler, KEDA basically creates a "regular" HPA that queries the "regular" metrics server (that's why the Kubernetes metrics server is needed).

Here is a question I hope you can answer:
Once the condition is matched in KEDA, will KEDA scale up the pods to the maximum number at once? I noticed one time my CPU usage was not too high (it was above the threshold), but it scaled up to the maximum number of pods immediately, which we don't like.

KEDA creates the HPA and exposes the metrics (except CPU and memory), and it is the HPA controller that manages the autoscaling, so basically we don't change anything there. Why do you think that the CPU usage was low? I mean, do you have all the usage monitored? Another important thing is that the threshold is not a boundary; it's the desired value. I mean, the HPA controller will try to stay as close as possible to that value, not scale out/in automatically when the value changes.
Remember also that the HPA controller is really aggressive scaling out and very conservative scaling in; a small peak can trigger a scale out, and several minutes are needed to scale in. Using KEDA you can customize this default behaviour using the advanced section in the ScaledObject.
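To tame the scale-out, the ScaledObject's advanced section accepts the standard HPA v2 behavior block, which KEDA passes through to the HPA it generates. This is a sketch under assumptions: my-app, my-app-so, and the numbers are made-up placeholders, and field names should be checked against the KEDA docs for your version.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-so              # hypothetical name
spec:
  scaleTargetRef:
    name: my-app               # hypothetical Deployment
  minReplicaCount: 1
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:                # passed through to the generated HPA
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
          - type: Pods
            value: 2           # add at most 2 pods...
            periodSeconds: 60  # ...per minute, instead of jumping to max
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"
```

With a policy like this, a brief spike above the target adds replicas gradually rather than jumping straight to maxReplicaCount.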
