Consider Grafana or Netdata for system metrics #498

Open
4 tasks
RobHooper opened this issue May 14, 2024 · 5 comments
Labels
S: prometheus Relating to Prometheus services
Milestone

Comments

@RobHooper
Contributor

Grafana is a web interface for visualising data, perfect for our Prometheus system metrics.

Grafana will make it much easier to use and interpret server monitoring, including:

  • A time picker updating all graphs at once
  • A server picker to filter all graphs to one server at a time
  • Simple URLs: the queries are stored server-side, so we will no longer need long, complex custom links.
  • A simpler interface will make it easier to use in an emergency.

To do:

  • Deploy Grafana
  • Configure user accounts
  • Integrate with Prometheus data source
  • Create custom dashboard
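For the "Integrate with Prometheus data source" step, Grafana's HTTP API accepts a POST to `/api/datasources`. Below is a minimal Python sketch that only builds that request. The host names, the data source name and the service-account token are hypothetical placeholders, and Grafana can equally provision data sources from YAML files instead of via the API.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the real Grafana host and Prometheus URL.
GRAFANA_URL = "http://localhost:3000"
PROMETHEUS_URL = "http://localhost:9090"


def build_datasource_request(grafana_url: str, prometheus_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that registers Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",      # hypothetical display name
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",         # Grafana's backend queries Prometheus, not the browser
        "isDefault": True,
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # Grafana service-account token
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_datasource_request(GRAFANA_URL, PROMETHEUS_URL, token="REPLACE_ME")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to check the payload before pointing it at a live Grafana instance.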

We already have custom queries written for the current monitoring setup; we can reuse these in the new dashboard.

The Prometheus server would be the best host for this, assuming it has spare system resources.

@RobHooper RobHooper added the S: prometheus Relating to Prometheus services label May 14, 2024
@RobHooper
Contributor Author

We now have Netdata installed on each server as part of Dogsbody's monitoring stack.
Netdata could replace both Prometheus and the Grafana install proposed above; this needs investigating.

@jpmckinney
Member

Good reminder. Yes, it sounds like it will be simpler to switch to Netdata than to set up (and maintain) Grafana.

@jpmckinney
Member

jpmckinney commented Jul 4, 2024

Noting that if RabbitMQ's management interface metrics are removed, then we would need Prometheus/Grafana. https://www.rabbitmq.com/docs/prometheus
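RabbitMQ's `rabbitmq_prometheus` plugin exposes metrics in the Prometheus text format (by default on port 15692 at `/metrics`). As a rough illustration of what Prometheus would scrape, here is a simplified Python parser. The sample metric names and values are illustrative only, and for real use the official `prometheus_client` package ships a complete parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re
from typing import Dict, Iterator, Tuple

# Matches one sample line: metric name, optional {labels}, value, optional timestamp.
# Simplification: assumes label values never contain "}".
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?P<labels>\{[^}]*\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+\S+)?$"
)


def parse_samples(text: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (name, raw_labels, value) tuples from Prometheus text-format output."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            yield match["name"], match["labels"] or "", float(match["value"])


def totals_by_metric(text: str) -> Dict[str, float]:
    """Sum sample values per metric name, across all label combinations."""
    result: Dict[str, float] = {}
    for name, _labels, value in parse_samples(text):
        result[name] = result.get(name, 0.0) + value
    return result


# Illustrative scrape output (not real data from our servers).
EXAMPLE = """\
# HELP rabbitmq_connections Connections currently open
# TYPE rabbitmq_connections gauge
rabbitmq_connections 12
rabbitmq_queue_messages_ready{queue="a"} 3
rabbitmq_queue_messages_ready{queue="b"} 4
"""

if __name__ == "__main__":
    print(totals_by_metric(EXAMPLE))
```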

Edit: If we switch to Netdata, we should add it to our Salt states (Dogsbody installed and configured it manually).
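A minimal sketch of such a state, written for Salt's `#!py` renderer (Salt calls `run()` and uses the returned dict as the state data). It assumes Netdata is installed from the distribution's `netdata` package and runs as the `netdata` service. The project's existing states are more likely YAML/Jinja, where the same structure would be written as `pkg.installed` and `service.running` entries, and a real state would also need to manage the configuration Dogsbody applied by hand.

```python
#!py
# Sketch of a Salt state using the Python renderer.
# Assumption: "netdata" is available from the OS package repositories
# and the package installs a service also named "netdata".


def run():
    """Return highstate data: install Netdata and keep its service enabled and running."""
    return {
        "netdata": {
            "pkg.installed": [],
            "service.running": [
                {"enable": True},
                {"require": [{"pkg": "netdata"}]},  # start only after the package is installed
            ],
        }
    }
```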

@jpmckinney jpmckinney added this to the Priority milestone Jul 4, 2024
@jpmckinney jpmckinney changed the title Consider Grafana for System Metrics Consider Grafana or Netdata for system metrics Oct 2, 2024
@jpmckinney
Member

I notice that Netdata uses GBs of data on e.g. ocp18 at /var/cache/netdata. Not sure if this can be avoided?
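To check how much space the cache is actually using (and whether a retention limit takes effect), a small Python helper can total the file sizes under the directory. This is a sketch: it counts apparent file sizes, so the figure can differ somewhat from `du`, which reports allocated blocks by default.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Total apparent size, in bytes, of files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow directory symlinks
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except FileNotFoundError:
                pass  # file was rotated or deleted while walking
    return total


if __name__ == "__main__":
    size = directory_size_bytes("/var/cache/netdata")
    print(f"/var/cache/netdata: {size / 1024**3:.2f} GiB")
```

A missing directory simply reports 0 bytes, because `os.walk` yields nothing for a path that does not exist.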

@RobHooper
Contributor Author

> I notice that Netdata uses GBs of data on e.g. ocp18 at /var/cache/netdata. Not sure if this can be avoided?

This should be configurable in Netdata by setting a limit on the amount of local storage it can use.
I would have expected monitoring data to be sent to and stored in Netdata Cloud, so it seems strange that it is using so much.

Projects
None yet
Development

No branches or pull requests

2 participants