-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Prometheus Plugin #692
Conversation
Can someone please review this pull request ? |
@pradeepchhetri Might be a little while - we're all very part time maintainers! I'll try to look on the weekend. But please be patient. :) |
@jamtur01 James, thank you for the quick reply. :) |
This looks promising @pradeepchhetri ! Thanks for working on this ! One thing that strikes me is that the attributes from the riemann event get dropped. Only tags and the "host" field seem to get submitted to the push-gateway. IMHO it's important to also pass (most of) the attributes along to prometheus. We'll typically want to see "service", "state" and friends on the prometheus side too. What do you think ? |
@mfournier Thank you for looking into it. We are actually passing the "service" and "metric" fields as the body of the http post request (https://github.com/pradeepchhetri/riemann/blob/prometheus/src/riemann/prometheus.clj#L18-L22). I wasn't able to figure out how to pass "state" field into each datapoint. The reason i am not attaching the timestamp is because prometheus overwrites the timestamp with the time it scrapes pushgateway. For more details: https://github.com/prometheus/pushgateway#about-timestamps Let me know if you feel we can improve the plugin in some way. |
Oh, just noticed this problem when submitting an event with an empty metric field:
Here is what the event looked like when dumped to the logs:
|
(fn [event] | ||
(let [url (generate-url opts event) | ||
datapoint (generate-datapoint event)] | ||
(when (and (:metric event) (:service event)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this logic repeated? Don't you already check for :service
and :metric
in generate-datapoint
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I will fix it.
@pradeepchhetri Can you pass |
@mfournier I can reproduce the error. Looking into it. |
Looking at the implementation more in details, I must say I'm not really sold to the tagN, tagN+1, etc, idea. On the prometheus side, building a graph/alert rule based on this implies knowning in advance how many tags there are, and then searching each of them for the element we're looking for. Wouldn't it be better to join them together in a single label, seperated by a comma for example (better: configurable character) ? ie: About the attributes, again I think it's really important to keep them all. You want as much info as possible about your events in prometheus too. Cleaning up events from extraneous tags/attributes before submission to prometheus is easy in riemann. Having to work with incomplete data in prometheus, less so. So here's a sample event as seen from riemann: Finally, a couple of thoughts/wishlist for later (ie: I non-requirements for an initial implementation imho):
|
👍 for keeping all attributes |
Prometheus developer here. I'm not too familiar with Riemann, I've only read through the docs a few times.
This is our standard way of dealing with lists of items for service discovery. We also prefix and suffix a comma so that it's easier to write the regex so
This is explicitly not what the pushgateway is for, see https://prometheus.io/docs/practices/pushing/ I think what you want here is two fold. First something similar to the influxdb and graphite exporters that can take in data in the Riemann format and expose it in the Prometheus data format. Secondly an integration in Riemann itself to talk out to the Prometheus Alertmanager (https://prometheus.io/docs/alerting/clients/) as you would something like Pagerduty.
Push vs pull is not the issue here, there's ways to handle that. The real challenge is that this is an interface between an event logging system and a metric system. I'm not sure it's sane to try an create a generic zero-configuration link between such systems, something more query based is probably the most you can do (https://github.com/chop-dbhi/prometheus-sql is probably the closest example, and https://github.com/prometheus/nagios_plugins from the other direction). You want your time series to be continuous over time, which doesn't work with a dynamic label such as
No, the intention is relativity rarely run service-level batch jobs that might have say a few pushes a minute in aggregate. Accordingly we haven't done any tuning or benchmarking of the pushgateway. |
Taking a bit of a look at your data format, your |
@brian-brazil Thank you for joining the discussion and providing great suggestions in improving the plugin. One quick question: Is pushgateway the only way to push metrics to prometheus. Since riemann pushes streams of metrics while pushgateway is for batch kind of jobs, I personally feel that there is high possiblity of missing some datapoints while pushing through pushgateway. Can i push datapoints directly to prometheus in realtime. |
We don't and won't support this. However if you look at things like the influxdb, collectd and graphite exporters they all take in realtime pushes of metrics. You can lose information with exporter style approaches depending on exactly what you're doing, which is why we recommend using the Prometheus client libraries which are designed to be more resilient (you could even get them to output data in the Riemann format).
As far as I can tell Riemann pushes events, not metrics. This mismatch is what makes this trickier than usual, as it's not just the usual question of how to map your tags and attributes to ours. How has this problem been solved with graphite, opentsdb and kariosdb for riemann? They also deal in metrics rather than events. |
Graphite, opentsdb, kairosdb & other timeseries databases pre-define the attributes for each datapoint while riemann provides the flexibility of adding additional attributes apart from the pre-defined ones. Hence a riemann event is just a superset of each of these timeseries database metric. |
@jamtur01 @mfournier I have updated the plugin with the following fixes:
|
Bumping this. |
|
||
(def special-fields | ||
"A set of event fields in Riemann with special handling logic." | ||
#{:service :metric :tags :time :ttl}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there needs to be some way to exclude fields from labels. Let's say I create a new field with a value, like a metric, for which a label doesn't make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
special-fields
is the variable for excluding riemann fields from prometheus labels (probably should give better name). Also should expose it as configuration variable so that anyone can overwrite it. @jamtur01 Does that sounds right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I meant exposing it.
Updated plugin with the above feedbacks. |
Bumping this :) |
@pradeepchhetri You can keep bumping it but I am afraid it's still going to be "we'll get to it soon"! |
Thanks @pradeepchhetri ! |
This pull request adds support for Prometheus backend. Since Prometheus uses pull model for collecting metrics, it provides a Pushgateway server which acts as an intermediate proxy for pushing datapoints.
Sample Screenshot: