-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add native support for Prometheus #1452
Add native support for Prometheus #1452
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, left a few comments. A little worried about the gauge cache size
@@ -58,7 +57,7 @@ public void start() { | |||
List$.MODULE$.<StatsFactory>empty(), | |||
Option.<String>empty(), | |||
List$.MODULE$.<Regex>empty(), | |||
Map$.MODULE$.<String, CustomHttpHandler>empty(), | |||
new Map.Map1<>("/prometheus", new PrometheusHandler()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this line be conditioned on whether prometheus is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handler will return 404, if there is no PrometheusMetricRegistry added on Micrometer (see handler class itself).
But I guess we could add a check here too, to make it more obvious while reading the code.
|
||
/** | ||
* MicorMeter meters can integrate with many different metrics backend | ||
* (StatsD/Promethus/Graphite/JMX etc, see https://micrometer.io/docs) | ||
*/ | ||
public class MicroMeterMetricCollector implements MetricCollector { | ||
private static final Map<String, Double> GAUGE_CACHE = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big will this cache become?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends, on how many different gauges will be tracked. Currently it will be 2*topic-count. Maybe I should limit cache size to avoid memory leak, if someone adds way too many metrics with different tags?
The cache is used, because prometheus is returning NaNs otherwise. https://stackoverflow.com/questions/50821924/micrometer-prometheus-gauge-displays-nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to add a config param for the cache size (e.g. 100), if the cache goes over that size, throw a runtime exception with error message to tell user to increase the cache size parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just log warning message? We probably shouldn't crash writing process because of metric cache overflow.
288cc6f
to
2b3701c
Compare
In that case use the ERROR level logging and set the cache size threshold a
little bigger: 500
…On Sun, Jul 12, 2020 at 10:48 PM Paulius ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
src/main/java/com/pinterest/secor/monitoring/MicroMeterMetricCollector.java
<#1452 (comment)>:
>
/**
* MicorMeter meters can integrate with many different metrics backend
* (StatsD/Promethus/Graphite/JMX etc, see https://micrometer.io/docs)
*/
public class MicroMeterMetricCollector implements MetricCollector {
+ private static final Map<String, Double> GAUGE_CACHE = new HashMap<>();
Maybe just log warning message? We probably shouldn't crash writing
process because of metric cache overflow.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1452 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABYJP73RJ4DAIKMBYIJOZX3R3KN2ZANCNFSM4OX44WFA>
.
|
2b3701c
to
1113b4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a small comment
} | ||
|
||
@Override | ||
public void gauge(String label, double value, String topic) { | ||
String key = label + "_" + topic; | ||
if (mGaugeCache.size() <= mConfig.getMicroMeterCacheSize()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only do the cache size check and issue error message when the cache needs to grow, not each time gauge is called (there will be too frequent)
1113b4c
to
9c024af
Compare
if (!mGaugeCache.containsKey(key) && mGaugeCache.size() >= mConfig.getMicroMeterCacheSize()) { | ||
LOG.error("Gauge cache size reached maximum, this may result in inaccurate metrics, " | ||
+ "you can increase cache size by changing " | ||
+ "\"secor.monitoring.metrics.collector.micrometer.cache.size\" property."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put 'return' statement here? Otherwise Metrics.gauge() result might be un-deterministic
9c024af
to
0182d32
Compare
Looks good, thanks for the contribution. |
We are running Secor on K8s and monitoring our stack with Prometheus.
As Secor uses Micrometer, it is not that hard to add Prometheus support directly, and avoid using exporters for Secor metrics.
I've changed the way
gauge
method works, because Prometheus endpoint was returningNaN
values.More on that https://stackoverflow.com/questions/50821924/micrometer-prometheus-gauge-displays-nan
I've added
/prometheus
endpoint to Ostrich Server to avoid making things complicated with listening to some other port.Also moved initialization of metric controller up to
ConsumerMain
class. I guess we don't need to have different instances for each Consumer. Otherwise I'd have to ensure, that there is only one instance of PrometheusMetricRegistry to avoid duplications and other problems on Prometheus http endpoint. This shouldnt be a problem, as Metric registries are delegating calls to static methods anyway.ref: #430