
Monitoring

benoit74 edited this page Sep 4, 2023 · 7 revisions

The technical monitoring of our infrastructure is based on:

  • UptimeRobot for external monitoring of our web properties
  • Grafana for monitoring of our servers
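UptimeRobot runs these checks for us as a hosted service, so there is nothing to deploy. Purely as an illustration of what an external HTTP check does, here is a minimal sketch using only the Python standard library. It is not how UptimeRobot is configured or implemented; the function name `check_http` and the "2xx/3xx within the timeout means up" rule are assumptions made for this example.

```python
import urllib.error
import urllib.request


def check_http(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (is_up, detail) for one HTTP(S) check of `url`.

    Assumption for this sketch: the site is "up" when it answers with a
    2xx/3xx status within `timeout` seconds (urlopen follows redirects).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError: report the site as down.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError (DNS failure, refused connection, TLS error) and timeouts
        # are all OSError subclasses: the site is unreachable.
        return False, f"unreachable: {exc}"
    return 200 <= status < 400, f"HTTP {status}"
```

For example, `check_http("https://www.kiwix.org/")` performs a single check. A real external monitor also runs from outside our network, repeats checks on a schedule and alerts only after consecutive failures, which is exactly what we delegate to UptimeRobot.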

Grafana

We use a free-tier Grafana Cloud instance, available at https://kiwixorg.grafana.net/. This instance is configured only for k8s logs and metrics.

k8s configuration

The configuration follows https://grafana.com/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/config-k8s-agent-flow/.

The configuration is deployed with Helm; see https://github.com/kiwix/k8s/tree/main/grafana.

Architecture:

  • Grafana Cloud provides:
    • a Grafana instance that displays dashboards
    • a Prometheus instance that stores metrics (collected and pushed by grafana-agent) and answers queries
    • a Loki instance that stores logs and answers queries
  • We host the following in our grafana namespace:
    • kube-state-metrics (deployment): listens to the Kubernetes API server and generates metrics about the state of the objects
    • opencost (deployment): measures infrastructure costs
    • prometheus-operator-crd (not used yet): Prometheus operator CRDs, used to configure metrics collection from k8s resources
    • prometheus-node-exporter (daemonset): runs on each k8s node and collects node-level metrics
    • grafana-agent (statefulset): agent that collects metrics (from kube-state-metrics, node-exporter, kubelet, cAdvisor, opencost) and sends them to Prometheus
    • grafana-agent-logs (daemonset): same binary as above, but collects logs (Pods + cluster events) and sends them to Loki
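Since the hosted Prometheus answers queries over the standard Prometheus HTTP API, the metrics collected by grafana-agent can also be read outside of dashboards. The sketch below builds an instant query (`GET /api/v1/query`) with HTTP basic auth, using only the Python standard library. It assumes the usual Grafana Cloud setup where the user is the Prometheus instance ID and the password is an access token; `PROM_URL`, `PROM_USER` and `PROM_TOKEN` are hypothetical placeholders, not our real values. The example PromQL uses `kube_pod_status_phase`, a kube-state-metrics metric, assuming the agent forwards it.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: the real endpoint, user and token come from the
# Grafana Cloud portal and are NOT documented on this page.
PROM_URL = "https://prometheus-example.grafana.net/api/prom"
PROM_USER = "123456"
PROM_TOKEN = "glc_example_token"

# Pods per phase in the grafana namespace (metric exposed by kube-state-metrics).
QUERY = 'sum by (phase) (kube_pod_status_phase{namespace="grafana"})'


def build_instant_query(base_url: str, user: str, token: str,
                        promql: str) -> urllib.request.Request:
    """Build a Prometheus HTTP API instant query (GET /api/v1/query)."""
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    credentials = base64.b64encode(f"{user}:{token}".encode()).decode()
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {credentials}"}
    )


def run_query(request: urllib.request.Request) -> list[dict]:
    """Send the query and return the `data.result` list of the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

With real credentials, `run_query(build_instant_query(PROM_URL, PROM_USER, PROM_TOKEN, QUERY))` returns one entry per pod phase, each with a `metric` label set and a `[timestamp, value]` pair. Keeping request building separate from sending makes the URL and auth header easy to check without network access.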

The Grafana Agent is installed in Flow mode.

⚠️ Since we currently run k8s 1.23, we use kube-state-metrics 2.4.2, which is compatible with it; later kube-state-metrics versions are not aligned with k8s 1.23.
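One way to avoid silently breaking this pin during a cluster upgrade is to check the server version before deploying. The sketch below is a hypothetical helper, not part of our Helm setup: it parses the JSON printed by `kubectl version -o json` and looks the version up in a small table. The only pairing recorded here is the one stated above (k8s 1.23 → kube-state-metrics 2.4.2); any other row would have to come from the kube-state-metrics compatibility matrix.

```python
import json
import re

# Only pairing documented on this page: k8s 1.23 -> kube-state-metrics 2.4.2.
KSM_FOR_K8S = {(1, 23): "2.4.2"}


def server_minor_version(kubectl_version_json: str) -> tuple[int, int]:
    """Extract (major, minor) from `kubectl version -o json` output."""
    git_version = json.loads(kubectl_version_json)["serverVersion"]["gitVersion"]
    # gitVersion looks like "v1.23.17" (managed clusters may append a suffix).
    match = re.match(r"v(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unexpected version string: {git_version}")
    return int(match.group(1)), int(match.group(2))


def pinned_ksm_version(kubectl_version_json: str) -> str:
    """Return the kube-state-metrics version to pin, or fail loudly."""
    version = server_minor_version(kubectl_version_json)
    if version not in KSM_FOR_K8S:
        raise RuntimeError(
            f"k8s {version[0]}.{version[1]}: no kube-state-metrics pin recorded, "
            "check the compatibility matrix before upgrading"
        )
    return KSM_FOR_K8S[version]
```

In practice the input would be the stdout of `subprocess.run(["kubectl", "version", "-o", "json"], capture_output=True, text=True, check=True)`. The version is parsed from `gitVersion` rather than `minor` because some managed clusters report the minor version with a suffix such as `23+`.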

Other servers (workers, ...)

ToDo