Fix KubeClientCertificateExpiration alerts #941
1) Change the aggregation `by (le)` to `without(service,endpoint...)`, dropping only useless labels while keeping external labels (like environment etc.) intact. Otherwise they get dropped.
2) Change the order of metrics in the expression: the `apiserver_client_certificate_expiration_seconds_bucket` metric comes first, so the actual remaining certificate lifetime is shown as the result in Grafana->Explore queries, not the `apiserver_client_certificate_expiration_seconds_count` value (which is quite useless). This makes it easier to troubleshoot.
3) Finally, fix the matching `on(job)` to become `on(job, cluster, instance)`. Otherwise, a single instance with a certificate expiration problem would be enough to set all apiservers to 'firing' (false positive!).
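To illustrate the label-dropping point, here is a minimal Python model (not PromQL itself) of how `sum by (...)` and `sum without (...)` treat labels. The sample series, label values such as `environment="prod"`, and the helper names are hypothetical; the PR elides the full `without(...)` list, so only `service` and `endpoint` are assumed here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]
Sample = Tuple[Labels, float]


def labels(**kv: str) -> Labels:
    """Build an immutable, hashable label set."""
    return frozenset(kv.items())


def sum_by(samples: List[Sample], by: List[str]) -> Dict[Labels, float]:
    """Model of `sum by (...)`: group on the listed labels only, drop all others."""
    out: Dict[Labels, float] = defaultdict(float)
    for lbls, value in samples:
        out[frozenset((k, v) for k, v in lbls if k in by)] += value
    return dict(out)


def sum_without(samples: List[Sample], without: List[str]) -> Dict[Labels, float]:
    """Model of `sum without (...)`: drop the listed labels, keep everything else."""
    out: Dict[Labels, float] = defaultdict(float)
    for lbls, value in samples:
        out[frozenset((k, v) for k, v in lbls if k not in without)] += value
    return dict(out)


# Hypothetical per-second bucket rates from two apiserver instances; `environment`
# stands in for an external label attached by the monitoring setup.
bucket_rates: List[Sample] = [
    (labels(job="apiserver", instance="10.0.0.1:6443", service="kubernetes",
            endpoint="https", environment="prod", le="86400"), 1.0),
    (labels(job="apiserver", instance="10.0.0.2:6443", service="kubernetes",
            endpoint="https", environment="prod", le="86400"), 2.0),
]

# `by (le)`: a single series is left, and `environment` and `instance` are gone.
by_le = sum_by(bucket_rates, ["le"])

# `without(service, endpoint)` (assumed list): one series per instance,
# with `environment` still attached.
without_noise = sum_without(bucket_rates, ["service", "endpoint"])
```

PromQL's `sum without` also drops the metric name; this model simply leaves `__name__` out of the samples.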
This PR has been automatically marked as stale because it has not had any activity in the past 30 days.
commenting to keep this open a little longer, looks like a genuine issue
@7840vz would you have a moment to address the conflicts?
resolved conflicts
could you check this one as well pls? #942
It would be nice to also have the cluster indicated in the alert description (here, and here):
@lorenzofelletti Having the
Fix the matching `on(job)` to become `on(job, cluster, instance)`. Otherwise, a single instance with a certificate expiration problem would be enough to set all apiservers to 'firing' (false positive!).
Also, change the aggregation `by (le)` to `without(service,endpoint...)`, dropping only useless labels while keeping external labels (like environment etc.) intact. Otherwise they get dropped.
Change the order of metrics in the expression: the `apiserver_client_certificate_expiration_seconds_bucket` metric comes first, so the actual remaining certificate lifetime is shown as the result in Grafana->Explore queries, not the `apiserver_client_certificate_expiration_seconds_count` value (which is quite useless). This makes it easier to troubleshoot.
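The false-positive argument can be checked with a small Python model of PromQL's `and on(...)` set operator, which keeps the left-hand samples whose `on` labels match some right-hand sample and returns the left-hand values. The three-instance scenario, the 7-day threshold, the sample values, and the assumption that the old right-hand side was reduced to one series per `job` are all hypothetical, chosen only to mirror the description.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]  # instant vector: label set -> sample value

# Hypothetical warning threshold: 7 days, in seconds.
THRESHOLD = 7 * 24 * 3600


def labels(**kv: str) -> Labels:
    """Build an immutable, hashable label set."""
    return frozenset(kv.items())


def project(lbls: Labels, on: List[str]) -> Labels:
    """Keep only the labels named in `on` (the matching key)."""
    return frozenset((k, v) for k, v in lbls if k in on)


def below(vec: Vector, threshold: float) -> Vector:
    """Model of the filtering comparison `vec < threshold`."""
    return {k: v for k, v in vec.items() if v < threshold}


def and_on(left: Vector, right: Vector, on: List[str]) -> Vector:
    """Model of `left and on(...) right`.

    Keeps each left-hand sample whose `on` labels match at least one right-hand
    sample; the result carries the LEFT-hand labels and values.
    """
    right_keys = {project(lbls, on) for lbls in right}
    return {lbls: v for lbls, v in left.items() if project(lbls, on) in right_keys}


# Hypothetical cluster with three apiserver instances; every `_count` is > 0.
counts: Vector = {
    labels(job="apiserver", cluster="c1", instance=f"10.0.0.{i}:6443"): 100.0
    for i in (1, 2, 3)
}

# Old right-hand side (assumed): one quantile per `job`, dragged down to one hour
# because only instance 10.0.0.2 sees nearly expired client certificates.
old_quantile: Vector = {labels(job="apiserver"): 3600.0}

# New right-hand side (assumed): one quantile per instance.
new_quantile: Vector = {
    labels(job="apiserver", cluster="c1", instance="10.0.0.1:6443"): 90 * 86400.0,
    labels(job="apiserver", cluster="c1", instance="10.0.0.2:6443"): 3600.0,
    labels(job="apiserver", cluster="c1", instance="10.0.0.3:6443"): 90 * 86400.0,
}

# Old shape: counts first, matched on(job) -> every instance fires, value = count.
old_alerts = and_on(counts, below(old_quantile, THRESHOLD), ["job"])

# New shape: quantile first, matched on(job, cluster, instance)
# -> only the affected instance fires, value = seconds of lifetime remaining.
new_alerts = and_on(below(new_quantile, THRESHOLD), counts, ["job", "cluster", "instance"])
```

Keying each vector by a frozen label set turns the match into a set lookup on the projected `on` labels, which roughly mirrors how the operator pairs series; it ignores details such as `ignoring(...)` and the dropping of the metric name.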