
Bump to kube stack 58.4.0 & prepare for deprecation of cattle.monitoring.io #42

Merged: 31 commits merged into main on Nov 6, 2024


@puffitos (Member) commented on Oct 16, 2024

Motivation

To update the Prometheus stack components, which were severely outdated, and to bring the monitoring chart up to speed with the deprecation of the monitoring.cattle.io API group, which won't be used in the future (see caas-team/prometheus-auth#19).

Changes

  • Changed RBAC to not use the deprecated apiGroup
  • Radically minimized the values.yaml file, which had hundreds of default values from the kube-prometheus helm chart
  • Possibility to bundle the chart with make (à la caas-cluster-monitoring)
  • Templated README documenting the values
  • Split the README into several files for easier reading

TODO

  • Post a diff between the manifests generated from the old YAML and from the current one (they should be similar, according to the step-by-step tests and commits I've done)

Features

  • Update to kube-prometheus-stack 58.4.0
  • Update to grafana chart 7.3.9
  • New Prometheus Rules:
    • container-cpu-usage-seconds-total
    • container-memory-cache
    • container-memory-rss
    • container-memory-swap
    • container-memory-working-set-bytes
    • container-resource
    • pod-owner
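The new rules are recording-rule groups. The sketch below shows the general shape of one such group as a `PrometheusRule` resource; the resource name, the `record` name and the simplified expression are illustrative assumptions, not the exact upstream definitions shipped in kube-prometheus-stack 58.4.0.

```yaml
# Hypothetical sketch of a recording-rule group in PrometheusRule form.
# metadata.name, the record name and the expr are simplified assumptions,
# not the upstream kube-prometheus-stack definition.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-container-cpu-usage-seconds-total   # hypothetical name
spec:
  groups:
    - name: k8s.rules.container_cpu_usage_seconds_total
      rules:
        # Per-container CPU usage rate, summed per cluster/namespace/pod/container
        - record: namespace_pod_container:container_cpu_usage_seconds_total:sum_irate   # hypothetical record name
          expr: |
            sum by (cluster, namespace, pod, container) (
              irate(container_cpu_usage_seconds_total{container!="", image!=""}[5m])
            )
```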

A new, final target_matchers entry in the inhibit_rules of the Alertmanager config, to inhibit more alerts:

```yaml
- target_matchers:
  - alertname = InfoInhibitor
```
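For context, an `inhibit_rules` list ending with this entry might look like the sketch below. It assumes the preceding rules follow the kube-prometheus-stack defaults; only the last entry is the one added here, and the exact earlier rules in this chart are an assumption.

```yaml
# Sketch only: earlier rules assumed to mirror kube-prometheus-stack defaults.
inhibit_rules:
  # Critical alerts suppress warning/info alerts with the same namespace and alertname
  - source_matchers:
      - severity = critical
    target_matchers:
      - severity =~ warning|info
    equal:
      - namespace
      - alertname
  # The InfoInhibitor alert suppresses info-level alerts in its namespace
  - source_matchers:
      - alertname = InfoInhibitor
    target_matchers:
      - severity = info
    equal:
      - namespace
  # New, final entry: targets the InfoInhibitor alert itself, so that it
  # is inhibited (and not notified) while other alerts are firing
  - target_matchers:
      - alertname = InfoInhibitor
```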

Deprecation Warning

**This project-monitoring version won't work with caas-cluster-monitoring versions older than 0.0.13.**

The ServiceAccount is no longer allowed to view prometheus.monitoring.cattle.io, only prometheus.monitoring.coreos.com. This is part of a broader deprecation to remove this non-existent API group from our monitoring stack.
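A minimal sketch of what the RBAC rule could look like after the change. The Role name and namespace are placeholders, and the resource name `prometheus` with the verb `view` is an assumption derived from `prometheus.monitoring.coreos.com`. Kubernetes RBAC accepts non-standard verbs such as `view`, which is what allows a component like prometheus-auth to check for it.

```yaml
# Hypothetical Role; name and namespace are placeholders.
# resource "prometheus" and verb "view" are assumed from
# prometheus.monitoring.coreos.com as referenced above.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: project-monitoring-view        # hypothetical
  namespace: my-project-namespace      # hypothetical
rules:
  # Previously: apiGroups: ["monitoring.cattle.io"] (deprecated, no longer granted)
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["prometheus"]
    verbs: ["view"]
```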

Smaller changes

  • kubeprometheusstack/k8s-sidecar updated to 1.26.1
  • Probes are now also selected based on the project name label
  • field.cattle.io/projectId added as a label to many resources
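Probe selection by label could look roughly like the sketch below. It assumes the kube-prometheus-stack values layout (`prometheus.prometheusSpec.probeSelector`); the project-name label key, its value, the project ID and the prober address are all hypothetical placeholders.

```yaml
# values.yaml fragment (kube-prometheus-stack layout assumed)
prometheus:
  prometheusSpec:
    # Only pick up Probe resources labelled with this project's name
    probeSelector:
      matchLabels:
        example.io/project-name: my-project    # hypothetical label key/value
---
# A Probe resource that the selector above would match
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-probe                          # hypothetical
  labels:
    example.io/project-name: my-project        # matches probeSelector
    field.cattle.io/projectId: p-example       # placeholder project ID
spec:
  prober:
    url: blackbox-exporter:9115                # hypothetical prober address
  module: http_2xx
  targets:
    staticConfig:
      static:
        - https://example.com
```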

Other changes, not important for release notes

  • #magic___^_^___line added to the Alertmanager config (an autoformatting artifact, nothing to worry about; it was the same in caas-cluster-monitoring). We should move the template to a file.
  • The CPUThrottlingHigh, KubePersistentVolumeFillingUp and KubePersistentVolumeInodesFillingUp rules now also aggregate on cluster (this may break the rules; nothing we can do here)

The following Prometheus rules were also changed:

  • PrometheusNotIngestingSamples
  • AlertmanagerFailedToSendAlerts
  • AlertmanagerClusterFailedToSendAlerts

Tests done

A previous version (essentially identical to this one) is already running in our dev clusters. Still open:

  • E2E deployment of this version, to assert compatibility with the new prometheus-auth

Updated the heavily outdated docs and simplified the structure of the documentation.

Added a Makefile to bundle the chart.

Updated the chart to use the same version as the latest caas-cluster-monitoring chart.

Removed comments, old values and wrong values. The new values won't deploy a fully working project-monitoring (the old ones didn't either), but it should now be clearer what each value does.

Installation via the UI may now even work, since the CRDs were removed from the chart completely.
Signed-off-by: Bruno Bressi <[email protected]>
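If the upstream chart is pulled in as a Helm dependency (an assumption; the PR does not say how it is vendored), a `Chart.yaml` for this bump could look like the sketch below. The chart name and version are placeholders; upstream kube-prometheus-stack already pulls in the grafana chart as its own conditional subchart.

```yaml
# Hypothetical Chart.yaml; name and version are placeholders.
apiVersion: v2
name: project-monitoring          # hypothetical
version: 0.0.0                    # placeholder, kept in line with caas-cluster-monitoring
dependencies:
  # Grafana comes in through kube-prometheus-stack's own subchart
  - name: kube-prometheus-stack
    version: 58.4.0
    repository: https://prometheus-community.github.io/helm-charts
```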
Since only caas-project-owners have the view verb on prometheus.monitoring.coreos.com, they can also pass it down to their ServiceAccounts, which need to query the central Prometheus database for federated metrics of their own namespaces.

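Passing the permission down to a ServiceAccount could be done with a RoleBinding like the sketch below. All names are hypothetical, and the referenced Role is assumed to grant `view` on `prometheus.monitoring.coreos.com`. Kubernetes RBAC only lets a user create such a binding if they already hold the permissions in the Role (or have the `bind` verb on it), which is why project owners can delegate it.

```yaml
# Hypothetical RoleBinding; every name here is a placeholder.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-reader-view            # hypothetical
  namespace: my-project-namespace      # hypothetical
subjects:
  # ServiceAccount that reads federated metrics for this namespace
  - kind: ServiceAccount
    name: metrics-reader               # hypothetical
    namespace: my-project-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: project-monitoring-view        # hypothetical Role granting "view"
```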
prometheus-auth will only use the `prometheus.monitoring.coreos.com` resource for its validation, so users of this Helm chart only need access to that.

@puffitos added the enhancement (New feature or request) label on Oct 16, 2024
@puffitos self-assigned this on Oct 16, 2024
@y-eight (Contributor) left a comment:


LGTM

The sidecar needs more juice for the default configuration

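Giving the sidecar more resources would typically go through the grafana subchart's `sidecar.resources` value. The sketch below assumes the kube-prometheus-stack to grafana key layout; the numbers are placeholders, not the values chosen in this PR.

```yaml
# Hypothetical values fragment; resource numbers are placeholders.
grafana:
  sidecar:
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        memory: 256Mi
```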
@puffitos merged commit 38ccdd2 into main on Nov 6, 2024 (1 check passed)