[Bug]: Prometheus rule KubeletTooManyPods incorrect statistics #997

Closed
4 tasks done
jeffaryhe opened this issue Dec 13, 2024 · 8 comments · Fixed by #1011

@jeffaryhe

What happened?

Please take a look at prometheus-operator/kube-prometheus#2558.

Please provide any helpful snippets.

No response

What parts of the codebase are affected?

Rules

I agree to the following terms:

  • I agree to follow this project's Code of Conduct.
  • I have filled out all the required information above to the best of my ability.
  • I have searched the issues of this repository and believe that this is not a duplicate.
  • I have confirmed this bug exists in the default branch of the repository, as of the latest commit at the time of submission.
@skl
Collaborator

skl commented Dec 16, 2024

Hi @jeffaryhe, thanks for the report. I had a look at the issue you raised:

I also had a look at the KubeletTooManyPods alert rule:

expr: |||
  count by(%(clusterLabel)s, node) (
    (kube_pod_status_phase{%(kubeStateMetricsSelector)s,phase="Running"} == 1) * on(instance,pod,namespace,%(clusterLabel)s) group_left(node) topk by(instance,pod,namespace,%(clusterLabel)s) (1, kube_pod_info{%(kubeStateMetricsSelector)s})
  )
  /
  max by(%(clusterLabel)s, node) (
    kube_node_status_capacity{%(kubeStateMetricsSelector)s,resource="pods"} != 1
  ) > 0.95
||| % $._config,
'for': '15m',

From what I can see, the alert fires if the count of running pods on a node is at >95% that of the pod limit for that node.
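As a rough illustration of that condition, here is a minimal sketch in Python rather than PromQL. The function name and the sample numbers are invented for the example, and the rule's `for: 15m` clause and its `!= 1` capacity filter are deliberately left out.

```python
def too_many_pods(running_pods: int, pod_capacity: int, threshold: float = 0.95) -> bool:
    """Return True when running pods exceed `threshold` of the node's pod capacity.

    Mirrors only the shape of the rule: count(...) / max(...) > 0.95.
    """
    if pod_capacity <= 0:
        raise ValueError("pod_capacity must be positive")
    return running_pods / pod_capacity > threshold


# Hypothetical node with a pod capacity of 60.
print(too_many_pods(57, 60))  # 57/60 is exactly the threshold, so not above it
print(too_many_pods(58, 60))  # 58/60 is above the threshold
```

Note that the comparison is strict, so a node sitting at exactly 95% does not satisfy the condition.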

Can you help me understand which statistics you see as incorrect? For example, do you think part of the alert rule could be improved?

@skl skl self-assigned this Dec 16, 2024
@skl skl added the question Further information is requested label Dec 17, 2024
@jeffaryhe
Author

If a node supports 60 pods, then 60 pods can be deployed on it; the number of containers inside each pod should not count toward that limit.

@skl
Collaborator

skl commented Dec 30, 2024

Containers are not considered as part of the alert, only pods (kube_pod_status_phase and kube_pod_info metrics are pod-level, and kube_node_status_capacity{resource="pods"} provides the pod capacity per node).

I still do not understand the issue. Perhaps you could reproduce your problem by creating a new unit test and showing me how it fails? Here is the existing unit test for this rule:

kubernetes-mixin/tests.yaml

Lines 406 to 435 in 03c13f9

- interval: 1m
  input_series:
  - series: 'kube_node_status_capacity{resource="pods",instance="172.17.0.5:8443",cluster="kubernetes",node="minikube",job="kube-state-metrics",namespace="kube-system"}'
    values: '3+0x15'
  - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-1",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-1",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-2",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-2",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_info{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",node="minikube",pod="pod-3",service="kube-state-metrics"}'
    values: '1+0x15'
  - series: 'kube_pod_status_phase{endpoint="https-main",instance="172.17.0.5:8443",job="kube-state-metrics",cluster="kubernetes",namespace="kube-system",phase="Running",pod="pod-3",service="kube-state-metrics"}'
    values: '1+0x15'
  alert_rule_test:
  - eval_time: 10m
    alertname: KubeletTooManyPods
  - eval_time: 15m
    alertname: KubeletTooManyPods
    exp_alerts:
    - exp_labels:
        cluster: kubernetes
        node: minikube
        severity: info
      exp_annotations:
        summary: "Kubelet is running at capacity."
        description: "Kubelet 'minikube' is running at 100% of its Pod capacity."
        runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubelettoomanypods
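To make the test's expectations easier to follow, here is a small Python sketch of the arithmetic behind it: three running pods against a capacity of 3, with the condition true from the first sample. The `alert_state` helper is hypothetical and is only a simplified model of the rule's `for: 15m` clause, not how Prometheus is implemented.

```python
def alert_state(condition_true_since_min, eval_time_min, for_min=15):
    """Simplified model of a rule's `for` clause (all times in minutes)."""
    held_for = eval_time_min - condition_true_since_min
    if held_for < 0:
        return "inactive"
    # The alert stays pending until the condition has held for the full duration.
    return "firing" if held_for >= for_min else "pending"


# Values taken from the unit test above: 3 running pods, pod capacity of 3.
running_pods, pod_capacity = 3, 3
ratio = running_pods / pod_capacity
print(f"{ratio:.0%}")        # matches the "100%" in the expected description
print(alert_state(0, 10))    # the test expects no firing alert at 10m
print(alert_state(0, 15))    # the test expects a firing alert at 15m
```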

@aleskiontherun

In my case there are 2 instances of kube-state-metrics, so every metric is duplicated with a different instance label, and the query returns all numbers doubled. For example, for an instance with 19 running pods and a capacity of 35, the alert fires with the value 1.0857142857142856. How would I de-duplicate it?
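The reported value is consistent with the pod count being doubled while the capacity is not. A quick check in Python, using only the numbers from the comment above (the variable names are made up for the example):

```python
running_pods = 19    # pods actually running, as reported above
pod_capacity = 35    # pod capacity, as reported above
ksm_instances = 2    # kube-state-metrics instances exporting the same series

# Every pod series exists once per KSM instance, so the count is multiplied.
counted_pods = running_pods * ksm_instances
ratio = counted_pods / pod_capacity

print(counted_pods)                  # 38 series counted instead of 19
print(ratio)                         # roughly 1.0857, the value seen in the alert
print(running_pods / pod_capacity)   # roughly 0.54 if each pod were counted once
```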

@skl
Collaborator

skl commented Jan 7, 2025

@aleskiontherun thanks for the detail. Because your KSM series are duplicated across two values of the instance label, the first part of the rule counts every pod twice:

(kube_pod_status_phase{%(kubeStateMetricsSelector)s,phase="Running"} == 1) * on(instance,pod,namespace,%(clusterLabel)s) group_left(node) topk by(instance,pod,namespace,%(clusterLabel)s) (1, kube_pod_info{%(kubeStateMetricsSelector)s})

The capacity side, on the other hand, is already de-duplicated by max by(%(clusterLabel)s, node), which is why you get >100%:

max by(%(clusterLabel)s, node) (
  kube_node_status_capacity{%(kubeStateMetricsSelector)s,resource="pods"} != 1
) > 0.95

So the pod count part of the rule needs to be rewritten, say something like:

count by (%(clusterLabel)s, node) (
  (kube_pod_status_phase{%(kubeStateMetricsSelector)s, phase="Running"} == 1)
  * on (%(clusterLabel)s, namespace, pod) group_left (node)
  group by (%(clusterLabel)s, namespace, pod, node) (
    kube_pod_info{%(kubeStateMetricsSelector)s}
  )
)
/
max by (%(clusterLabel)s, node) (
  kube_node_status_capacity{%(kubeStateMetricsSelector)s, resource="pods"} != 1
) > 0.95
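The goal of the rewrite, counting each pod once no matter which KSM instance reported it, can be sketched in Python as follows. This illustrates the idea only: it does not emulate PromQL vector matching, and it is not necessarily what the eventual PR implements. The label values and the `pods_per_node` helper are hypothetical. Whether a given PromQL expression achieves the same result depends on which labels survive the join, so it is worth confirming with a unit test that includes duplicated instance labels.

```python
# Hypothetical sample: 19 running pods on one node, reported by two KSM instances.
instances = ["ksm-a", "ksm-b"]  # made-up instance label values
series = [
    {"cluster": "c1", "node": "node-1", "namespace": "default",
     "pod": f"pod-{i}", "instance": inst}
    for inst in instances
    for i in range(19)
]
capacity = 35


def pods_per_node(series, dedupe):
    """Count running-pod series per (cluster, node), optionally ignoring `instance`."""
    seen = set()
    counts = {}
    for s in series:
        identity = (s["cluster"], s["namespace"], s["pod"])
        if not dedupe:
            # Without de-duplication, each instance's copy is a distinct series.
            identity += (s["instance"],)
        if identity in seen:
            continue
        seen.add(identity)
        key = (s["cluster"], s["node"])
        counts[key] = counts.get(key, 0) + 1
    return counts


raw = pods_per_node(series, dedupe=False)[("c1", "node-1")]
unique = pods_per_node(series, dedupe=True)[("c1", "node-1")]
print(raw / capacity)     # 38 / 35, above the 0.95 threshold
print(unique / capacity)  # 19 / 35, below the threshold
```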

I can get a PR together for this unless you'd like to?

@aleskiontherun

@skl I'd need to spend some time to fully understand what's going on in the query and how the mixin works (I'm simply installing it with kube-prometheus-stack), so I'd really appreciate it if you could do it. One uneducated guess: wouldn't it make sense to just divide the result by the number of instances?

In my own queries I'm using the kubelet_active_pods metric, which is not part of KSM but allows the query to be simplified quite a bit.

@skl
Collaborator

skl commented Jan 7, 2025

Sure no problem, I'll get a PR up shortly 👍

@skl
Collaborator

skl commented Jan 7, 2025

Done in #1011 @aleskiontherun 😄

@skl skl closed this as completed in #1011 Jan 13, 2025