-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prometheus for statistics #90
Conversation
@thearossman What is your opinion on this? |
Hi! There's a lot here, and I'm not familiar with Prometheus, so I'm still parsing. We're having some issues with our live traffic setup right now, but I'll take a closer look and try to run this (hopefully) later this week to check whether it holds up at line rate. Initial thoughts:
Going to request that either @tbarbette or @thegwan weigh in here as well, since it's a big addition and I'm not familiar with this framework. |
I haven't taken a look at the code in this PR, but I'm very supportive of the notion of exposing metrics to Prometheus (which is pretty much the choice for this type of monitoring in 2024). This feels like a very sane way of capturing metrics over time from Retina. |
Could you add setup/interpretation instructions for this? Will test for performance on our traffic, but it would be helpful to have easier setup steps! |
f4e9c42
to
2f872d4
Compare
I added a simple guide in the documentation of
That's really interesting! Maybe we can expose something so that users also can register their metrics and we export all of them. I need to think about that.
I will make it configurable in the |
Confirming I ran the prometheus-websites example and it worked on ~110Gbps of our traffic! I haven't yet gotten up the prometheus server (I think that's a me problem). I think it would still be nice to have a compile- or runtime feature flag to enable/disable this export. Could be in a later commit. Thanks for the additions, and thanks for your patience with the review turnaround over the holidays |
I added a runtime configuration which disables the exporter server and the atomic counters (thread locals will still get incremented).
Thanks for maintaining this. I worked with many open source projects, and retina is easily among top 10% in term of maintenance. |
I haven't looked at the actual UI (looks like there's a fw/config issue on our I did confirm that running the websites-prometheus example with the Prometheus config set up could handle the ~110Gbps that was on our network at the time. I also ran this in offline mode and the data looks as expected. I feel pretty good about merging this, especially since it can be switched on/off! Will leave it up for another day in case anyone else (@thegwan @tbarbette ?) has any thoughts. |
Good for me ! Having compile-time flags for new features is important, else we'll end up with a bloatware like Zeek. And I hear about Prometheus around me, so it's nice, thanks ! |
@HKalbasi if I merged this, how would you feel about changing the runtime config to a compile-time feature (similar to how the mlx5 feature is implemented) sometime in the future? Thanks again, excited to have this! |
I added a compile time feature for this. I kept the runtime configuration as well since we needed the port and ip configuration anyway and disabling the server in the absence of them is a sane default. |
This PR adds some statistics using prometheus. Prometheus is a time series database which is widely used for collecting metrics.
This is a per core chart of total bits per second. It is on my local machine with a fake traffic so it is in order of Kbit/s.
I put some effort in making the statistics cost negligible. I store them in thread local variables, and flush them into the atomic ones occasionally (currently every 1000 idle iterations). I also made a parameter for each core even for stats that their per core versions doesn't matter much, in order to prevent contention. To run the prometheus server, I used the main core. Previously it did only monitorings, now it does monitoring and runs a http server for prometheus concurrently using a single threaded tokio runtime.