Consider Grafana or Netdata for system metrics #498

RobHooper · 2024-05-14T10:06:09Z

Grafana is a web interface for visualising data, perfect for our Prometheus system metrics.

Grafana will make server monitoring much easier to use and interpret, with:

A time picker updating all graphs at once
A server picker to filter all graphs to one server at a time
Simple URLs: the queries are stored server-side, so we will no longer need long, complex custom links.
A simpler interface will make it easier to use in an emergency.

To do:

Deploy Grafana
Configure user accounts
Integrate with Prometheus data source
Create custom dashboard

We already have custom queries written for the current monitoring setup; we can reuse these in the new dashboard.

The Prometheus server would be the best host for this, assuming it has spare system resources.

RobHooper · 2024-05-27T10:03:41Z

We now have Netdata installed on each server as part of Dogsbody's monitoring stack.
Netdata could replace Prometheus and the above proposed Grafana install - needs investigating.

jpmckinney · 2024-06-07T01:36:59Z

Good reminder. Yes, it sounds like it will be simpler to switch to Netdata than to set up (and maintain) Grafana.

jpmckinney · 2024-07-04T20:28:13Z

Noting that if RabbitMQ's management interface metrics are removed, then we would need Prometheus/Grafana. https://www.rabbitmq.com/docs/prometheus

Edit: If we switch to Netdata, we should add it to our Salt states (Dogsbody installed and configured it manually).

jpmckinney · 2024-10-02T22:40:14Z

I notice that Netdata uses GBs of data on e.g. ocp18 at /var/cache/netdata. Not sure if this can be avoided?

RobHooper · 2024-10-04T11:19:31Z

> I notice that Netdata uses GBs of data on e.g. ocp18 at /var/cache/netdata. Not sure if this can be avoided?

This should be configurable in Netdata, by setting a limit on the amount of local storage it can use.
I would have expected monitoring data to be sent to and stored in Netdata Cloud, so it seems strange that it is using so much.

RobHooper added the S: prometheus Relating to Prometheus services label May 14, 2024

jpmckinney added this to the Priority milestone Jul 4, 2024

jpmckinney changed the title ~~Consider Grafana for System Metrics~~ Consider Grafana or Netdata for system metrics Oct 2, 2024
