Upgrade kube-prometheus-stack chart to v66.1.1 #2341

Closed

@anders-elastisys anders-elastisys commented Nov 13, 2024

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

What does this PR do / why do we need this PR?

The kube-prometheus-stack chart was falling behind, so this PR upgrades the Helm chart to v66.1.1.
This fixes some ARP metrics in the node-exporter and the log issue they caused (see the linked issue).

Alertmanager in the Management cluster is not upgraded. Instead, its image version is pinned to the previous v0.26.0, because v0.27.0 deprecates the v1 API endpoint, which Thanos still uses. Once we upgrade Thanos to v0.35 or higher, the v2 endpoint will be the default (see the related upstream issue).
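
As a rough illustration of the condition above, the following sketch decides whether the Alertmanager pin is still needed for a given Thanos version. The function name `needs_alertmanager_pin` is hypothetical and not part of this repository, the v0.35 threshold is taken from the paragraph above, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.
# Prints "yes" if the given Thanos version is below v0.35, otherwise "no".
needs_alertmanager_pin() {
  local thanos_version="${1#v}" # strip an optional leading "v"
  local threshold="0.35"        # assumption: taken from the PR description
  local lowest
  # Version-sort both values; the first line is the lower of the two.
  lowest="$(printf '%s\n%s\n' "${thanos_version}" "${threshold}" | sort -V | head -n 1)"
  if [ "${lowest}" = "${threshold}" ]; then
    echo "no"  # Thanos is at or above the threshold, the pin can be lifted
  else
    echo "yes" # Thanos is older, keep Alertmanager pinned to v0.26.0
  fi
}
```

For example, `needs_alertmanager_pin v0.34.1` prints `yes`, while `needs_alertmanager_pin v0.35.0` prints `no`.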

Information to reviewers

Since this PR changes a lot of files, I recommend simply testing the migration script in a dev environment, i.e. deploy the previous version of kube-prometheus-stack and then run:

CK8S_CLUSTER=both ./migration/v0.43/apply/11-kube-prometheus-stack.sh execute

Run the tests and check that the metrics look fine.
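
Besides the metrics, a quick pod health check after the migration may help. The sketch below is one possible way to do it: it filters `kubectl get pods --no-headers` output for pods in `CrashLoopBackOff`. The helper name `crashlooping_pods` and the `thanos` namespace in the usage comment are assumptions, not something defined in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.
# Reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints the
# names of pods whose STATUS column is CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Example usage (the "thanos" namespace is an assumption):
#   kubectl get pods --namespace thanos --no-headers | crashlooping_pods
```

An empty result means no pod in that namespace is currently crashlooping; it does not replace running the test suite.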

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change upgrades CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@anders-elastisys anders-elastisys marked this pull request as ready for review November 19, 2024 15:31
@anders-elastisys anders-elastisys requested review from a team as code owners November 19, 2024 15:31
@simonklb simonklb removed the request for review from a team November 20, 2024 08:46
@simonklb

Unassigned @elastisys/goto-scripts due to policy update #2347

@Zash Zash left a comment

I tried this out in a dev cluster. Grafana, metrics, etc. look good, but (as expected?) Thanos has trouble.

Should this wait for a Thanos update?

@anders-elastisys

> I tried this out in a dev cluster. Grafana, metrics, etc. look good, but (as expected?) Thanos has trouble.
>
> Should this wait for a Thanos update?

Hmm, maybe. I will test and check it again. I know I had issues with the newer Alertmanager version and Thanos, but keeping the Alertmanager version we used previously should have fixed that... Are you getting any particular alerts for Thanos?

@Zash

Zash commented Dec 4, 2024

> Are you getting any particular alerts for Thanos?

Many of the Thanos pods were crashlooping. I'll have to take another look; it could be a misconfiguration of my cluster.

@anders-elastisys anders-elastisys force-pushed the anders-elastisys/upgrade-kube-prometheus-stack branch from 1c0fca2 to 3a9e990 Compare December 17, 2024 15:26
@anders-elastisys anders-elastisys requested a review from a team December 17, 2024 15:26
@anders-elastisys

> Many of the Thanos pods were crashlooping. I'll have to take another look; it could be a misconfiguration of my cluster.

Hmm, when I tried upgrading to the new version I did not have any issues with Thanos.
I will try to do a clean install from this version as well and test later.

@anders-elastisys

I just tried a clean install and Thanos seems to be running fine. Did you manage to investigate why the pods were crashing for you, @Zash?

@anders-elastisys

I will close this PR in favor of #2381 to upgrade Prometheus to v3.

Development

Successfully merging this pull request may close these issues.

Upgrade Kube-prometheus-stack-60.0.0
3 participants