feat(prometheus): add prometheus support #134

Merged: 29 commits, Jan 15, 2025

edobry
Contributor

@edobry edobry commented Jan 10, 2025

Description

This PR adds support for monitoring with Prometheus, by:

  • adding a new top-level observability parameter section
  • deploying a Prometheus server
  • conditionally exposing metrics endpoints on all EL/CL clients, as well as OP services
  • registering metrics jobs for each with Prometheus

The prometheus module from the ethereum-package package was used as inspiration, but modified to add helper methods and improve the job-registration workflow.
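
The job-registration workflow mentioned above can be pictured roughly as follows. This is a minimal Python sketch, not the package's actual Starlark code: `MetricsJob`, `register_job`, and `build_scrape_configs` are hypothetical names, and the service names and ports are made up for illustration. Only the output keys (`job_name`, `metrics_path`, `static_configs`, `targets`, `labels`) follow Prometheus's documented `scrape_configs` schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsJob:
    """One scrape target; all field names here are illustrative assumptions."""
    name: str
    endpoint: str  # "host:port" of the service's metrics endpoint
    metrics_path: str = "/metrics"
    labels: dict = field(default_factory=dict)


def register_job(jobs, name, host, port, enabled=True, **labels):
    """Append a job only when metrics are enabled for the service (hypothetical helper)."""
    if enabled:
        jobs.append(MetricsJob(name, f"{host}:{port}", labels=dict(labels)))
    return jobs


def build_scrape_configs(jobs):
    """Render registered jobs into the `scrape_configs` list of a Prometheus config."""
    return [
        {
            "job_name": job.name,
            "metrics_path": job.metrics_path,
            "static_configs": [{"targets": [job.endpoint], "labels": job.labels}],
        }
        for job in jobs
    ]


jobs = []
# Service names and ports below are assumptions, not values taken from this PR.
register_job(jobs, "op-geth", "op-geth", 9001, client_type="execution")
register_job(jobs, "op-node", "op-node", 7300, enabled=False)  # skipped: metrics disabled
scrape_configs = build_scrape_configs(jobs)
```

Keeping registration separate from rendering means each service launcher only declares its own endpoint, while the Prometheus config is assembled in one place.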

This PR has been tested and successfully deploys a Prometheus server with functioning metrics scrape jobs:
image

@edobry edobry force-pushed the edobry/observability branch 2 times, most recently from d05dfdc to c8e424e Compare January 13, 2025 15:36
@edobry edobry force-pushed the edobry/observability branch from 7c07014 to a3880f8 Compare January 14, 2025 18:50
@zhwrd zhwrd merged commit e22047a into ethpandaops:main Jan 15, 2025
5 checks passed
@edobry edobry deleted the edobry/observability branch January 17, 2025 19:29
zhwrd pushed a commit that referenced this pull request Jan 27, 2025
### Description

This PR builds on the [previously-added Prometheus
support](#134) to
add Grafana support by:
- adding a `grafana_params` section to the top-level `observability`
parameter section
- deploying a Grafana server
- implementing dashboard provisioning

The `grafana` module from the `ethereum-package` package was used as
inspiration, but modified to simplify the developer experience by removing
support for inline dashboards and improving remote dashboard source support.

Additionally, this PR implements API-based provisioning using the official
[grizzly](https://grafana.github.io/grizzly/) tool instead of the existing
file-based provisioning approach, to simplify the process of keeping
Kurtosis Grafana in sync with hosted Grafana. To this end, two new
repositories
([`grafana-dashboards`](https://github.com/ethereum-optimism/grafana-dashboards),
[`grafana-dashboards-public`](https://github.com/ethereum-optimism/grafana-dashboards-public))
have been created, with the intention of tracking existing public and
private dashboards in hosted Grafana.

This PR has been tested and successfully deploys a Grafana server that
includes all public dashboards present on our hosted Grafana instance,
organized into the same folder structure:
<img width="1214" alt="image"
src="https://github.com/user-attachments/assets/9a007f0e-fb95-4399-9a96-8be3f33e4eba"
/>

Not all dashboards are yet at full parity, but a fair number of them do
show data:
<img width="1807" alt="image"
src="https://github.com/user-attachments/assets/d1223067-9d31-4b8c-a76a-a9d7b4b4c787"
/>

If you want to try this out locally, add the following snippet to your
params file:
```yaml
optimism_package:
  observability:
    grafana_params:
      dashboard_sources:
        - github.com/ethereum-optimism/grafana-dashboards-public/resources@aa35389fc5dec4043838757e2372368c3efb0a29
```
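
The `dashboard_sources` entry above appears to combine a repository path with a pinned commit, separated by `@`. As a minimal sketch under that assumption (the `path@ref` reading and the `split_dashboard_source` helper are both hypothetical, not taken from the package), such a string could be split like this:

```python
def split_dashboard_source(source):
    """Split "path@ref" into (path, ref); ref is None when no "@" is present.

    Assumes the last "@" separates the repository path from a git ref.
    """
    path, sep, ref = source.rpartition("@")
    if not sep:
        # No "@" found: treat the whole string as the path, with no pinned ref.
        return source, None
    return path, ref


path, ref = split_dashboard_source(
    "github.com/ethereum-optimism/grafana-dashboards-public/resources"
    "@aa35389fc5dec4043838757e2372368c3efb0a29"
)
```

Pinning a commit hash rather than a branch name keeps the provisioned dashboards reproducible between runs.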

Remaining work:
- continue converging metrics to enable additional dashboards
- implement promtail/loki support to enable log-based dashboard panels
- deploy any services required for certain dashboards (e.g.
replica-healthcheck) (?)
- support a subset of existing dashboards using tags/folders
- automatic updates of the
[`grafana-dashboards`](https://github.com/ethereum-optimism/grafana-dashboards)
repository
sigma added a commit that referenced this pull request Jan 29, 2025
#134 accidentally
broke the convention that an explicit "" for images meant using the
default ones.
This restores that behavior for now.
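
The convention described here, where an explicit `""` for an image means "use the default", can be sketched as below. This is an illustrative Python snippet rather than the package's Starlark code; `DEFAULT_IMAGES`, `resolve_image`, and the image strings are hypothetical.

```python
# Placeholder values for illustration only, not the package's real defaults.
DEFAULT_IMAGES = {
    "prometheus": "example.invalid/prometheus:default",
}


def resolve_image(kind, requested=None):
    """Treat both a missing value and an explicit "" as 'use the default image'."""
    return requested if requested else DEFAULT_IMAGES[kind]


default_image = resolve_image("prometheus", "")
custom_image = resolve_image("prometheus", "example.invalid/prometheus:custom")
```

Relying on truthiness handles `None` and `""` identically, which is the behavior the commit message says was accidentally lost.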
sigma added a commit that referenced this pull request Jan 30, 2025
#134 accidentally
broke the convention that an explicit "" for images meant using the
default ones.
This restores that behavior for now.