-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Statsd output is not good #327
Comments
You could ask the original committer of the approach or provide a better way of doing this. However, before you do this I'd like to understand what "really bad" means in this context. So far nobody has complained about it and aside from the one bug it seems to have been working for people. I usually prefer a fact based approach on discussions and I'm curious which problems the current implementation creates and how you would solve them. |
statsd is meant primarily as a stat aggregator. a typical pattern for using statsd is to emit events on instrumented code paths and allow the statsd implementation handle the aggregation. For example emit a counter event when a code path is hit and then allow for statsd to flush that counter at a configurable interval so you get roll up aggregates in hits per time unit where time unit is a property of the statsd server. Or to emit a timer event when an instrumented code path is hit and allow for statsd to bucket that timer into a configurable histograms/calculate quantiles etc. The pattern in use in fabio is to handle aggregation internally and expose pre-aggregated metrics only as gauges. This limits the ability of consumers of the metrics to shape the data as desired. For instance a consumer is limited to only the quantiles which you've chosen to expose via the code. AFAICT it looks like this is a side effect of using github.com/rcrowley/go-metrics which handles aggregation with intent of periodic flushing as gauges to backends that depend on that model, i.e. graphite |
Using raw metrics when sending to statsd would certainly make life simpler |
Can you provide a specific example? Which metric, what is the current format, what is the desired format? Please keep in mind that I'm not a statsd expert and that you may have to explain more to me than to people already familiar with the matter. |
I'm not sure I can give a better explanation without just writing an example PR. but.... Within the fabio code you've got the notion of a The current implementation of statsd sending relies on |
I have found a few key points that are frustrating us:
I'll elaborate on 2 and 3: min/max/mean/std-dev/50-percentile:As a test I have pushed through some data to fabio and recorded the metrics. The initial values for min/max/mean are: 0/0/0 I push through one request, the values are now each: 44462314.0/44462314.0/44462314.0. What exactly is 44462314.0? My guess is it is latency, but is it in ms? microseconds? Probably it is just me, but I have no idea. The actual request time from my client was ~353ms Now each flush of metrics the value of min/max/mean remains the same, even though I am not putting any traffic through. So the mean is always 44462314.0, forever. So the value is the min/mean/mean value for all the requests since the beginning of the process? That data is not really usable. one-minute, five-minute issueI found after putting through requests over time, the one-minute value never goes to 0. Even though there is no traffic going out for the past minute :S If I wait sometime, and then push traffic through for a minute, the 1 minute value != the values of traffic I actually sent in that last minute. What would be a solutionI think the metrics required would be:
Importantly they would be relative to that metrics period. So if For our purposes only, we just require metrics per service, 2 values: count/latency, we do not require a full suite of metrics per URL/host/fabioserver combination as that is just too much (and we push metrics to cloudwatch which makes it more expensive). ie. service-xyz.count and service-xyz.latency_min/max/avg/mean Or something along those lines... |
* adds new metric sink `raw_statsd` uses same config as `statsd` * new sink sends singl raw events directly to statsd server without any preaggreation or rollup. * TODO - use more featureful/performant statsd client implementation such as https://github.com/alexcesaro/statsd or https://github.com/quipo/statsd which implement features such as optional buffering.
I've issued a quick PR which demonstrates how a simple raw statsd sink can be implemented which doesn't do any internal aggregation and fits more directly into the |
Sorry for the delay but I'm on vacation with the family. The key point here is that you want to send an event to statsd when it happens and want to do the aggregation yourself. So far I've relied on go-metrics to do The Right Thing (™) but I recall some discussions around integrating the circonus support that the histograms in go-metrics aren't that useful. Mostly because they're a port of the codahale model from Java. I didn't dig deeper there. Maybe @maier can help here. @drawks thx for the PR. I'll try to look at it soon. Kids waking up now :) |
This patch adds a statds reporter which does not aggregate values. Fixes #327
This patch adds a statds reporter which does not aggregate values. Fixes #327
This patch adds a statds reporter which does not aggregate values. Fixes #327
@drawks I think your PR would be good enough for an experimental inclusion. Don't you need to add newlines between multiple metrics? The downside of metrics that are not pre-aggregated is that the API will no longer provide these values. If you are using this provider you probably don't care about that. However, integrating one of the libraries you suggested was simple enough that I've tried that in #329. To avoid the alloc per metrics I need to refactor the registry interface a bit but I'd appreciate if you could test the integration. You should set:
I'll sleep a bit on the naming of |
Similar situation to the one we faced with the Circonus integration. The receiving system (Circonus, StatsD, etc.) need the individual samples for metrics not a single value from the registry. |
I think a better solution is to remove the |
This patch adds a statds reporter which does not aggregate values. Fixes #327
This patch adds a statds reporter which does not aggregate values. Fixes #327
This patch adds a statds reporter which does not aggregate values. Fixes #327
This patch refactors the metrics support within fabio to achieve several goals: * support labels for DogstatD, Prometheus and others * support raw events for statsd and others without aggregation * support multiple backends simultaneously to support migration * make integration of new metrics backends and/or libraries simple * being fully backwards compatible to not break existing setups One of the main challenges is that fabio generates names for the metrics targets through templates and that these templates are configurable. A metrics backend which supports labels needs to generate a different name for the same metric than one that does not support labels. Combining this with the need to support multiple different metrics backends at runtime means that none of the existing metrics libraries like go-kit/metrics and hashicorp/go-metrics is a good fit. However, they can be used as drivers which reduces the need for testing and makes integration of new backends simpler since fabio does not need to rely on a single metrics library. This patch is work in progress with the new metrics code in the metrics4 package and the old metrics code in the 'metrics_old' package where it is kept for reference until it has been migrated or removed. The basic architecture should be good enough to add more providers and functionality so that the community can help. Right now there are two providers "flat" and "label" to demonstrate the concepts. They provide counter and timers in statsd/dogstatd format and can be configured with -metrics.target=flat[,label]. Other providers should be added in the same way. The metrics_old code should be converted to the new provider scheme. The go-kit/metrics library currently supports most of the necessary drivers and is the preferred way of integrating new drivers. The list below is a probably incomplete list of things that need to be done: * The configuration is not used except for metrics.target * The drivers from the metrics_old package need to be migrated * Additional drivers from go-kit need to be added * The old rcrowley/go-metrics code relies on 'registries' for namespacing. After the routing table has changed the registry needs to be pruned of services which no longer exist so that go-metrics stops reporting them. * The RX/TX counters in the TCP and TCP+SNI proxy are probably sub-optimal or broken. * Some counters may not have been initialized properly, e.g. WSConn * WSConn should probably be a gauge which also needs to be set to 0 on startup. * The approach of injecting a noop metrics provider if none has been set has already bitten me once since some metrics are reported while others are not. I need to think about this a bit more. Fixes #126 Fixes #165 Fixes #211 Fixes #253 Fixes #326 Fixes #327 Fixes #371 Closes #329 Closes #331 Closes #334 Closes #335
Please see #211 (comment) |
This patch refactors the metrics support within fabio to achieve several goals: * support labels for DogstatD, Prometheus and others * support raw events for statsd and others without aggregation * support multiple backends simultaneously to support migration * make integration of new metrics backends and/or libraries simple * being fully backwards compatible to not break existing setups One of the main challenges is that fabio generates names for the metrics targets through templates and that these templates are configurable. A metrics backend which supports labels needs to generate a different name for the same metric than one that does not support labels. Combining this with the need to support multiple different metrics backends at runtime means that none of the existing metrics libraries like go-kit/metrics and hashicorp/go-metrics is a good fit. However, they can be used as drivers which reduces the need for testing and makes integration of new backends simpler since fabio does not need to rely on a single metrics library. This patch is work in progress with the new metrics code in the metrics4 package and the old metrics code in the 'metrics_old' package where it is kept for reference until it has been migrated or removed. The basic architecture should be good enough to add more providers and functionality so that the community can help. Right now there are two providers "flat" and "label" to demonstrate the concepts. They provide counter and timers in statsd/dogstatd format and can be configured with -metrics.target=flat[,label]. Other providers should be added in the same way. The metrics_old code should be converted to the new provider scheme. The go-kit/metrics library currently supports most of the necessary drivers and is the preferred way of integrating new drivers. The list below is a probably incomplete list of things that need to be done: * The configuration is not used except for metrics.target * The drivers from the metrics_old package need to be migrated * Additional drivers from go-kit need to be added * The old rcrowley/go-metrics code relies on 'registries' for namespacing. After the routing table has changed the registry needs to be pruned of services which no longer exist so that go-metrics stops reporting them. * The RX/TX counters in the TCP and TCP+SNI proxy are probably sub-optimal or broken. * Some counters may not have been initialized properly, e.g. WSConn * WSConn should probably be a gauge which also needs to be set to 0 on startup. * The approach of injecting a noop metrics provider if none has been set has already bitten me once since some metrics are reported while others are not. I need to think about this a bit more. Fixes #126 Fixes #165 Fixes #211 Fixes #253 Fixes #326 Fixes #327 Fixes #371 Closes #329 Closes #331 Closes #334 Closes #335 Remove 'Register' methods and create objects directly. Add Gauge and use it for websocket connections Vendoring in github.com/alexcesaro/statsd First stab at integrating raw statsd from #335 Rebase off master 10/23/2020 - NJ
This patch refactors the metrics support within fabio to achieve several goals: * support labels for DogstatD, Prometheus and others * support raw events for statsd and others without aggregation * support multiple backends simultaneously to support migration * make integration of new metrics backends and/or libraries simple * being fully backwards compatible to not break existing setups One of the main challenges is that fabio generates names for the metrics targets through templates and that these templates are configurable. A metrics backend which supports labels needs to generate a different name for the same metric than one that does not support labels. Combining this with the need to support multiple different metrics backends at runtime means that none of the existing metrics libraries like go-kit/metrics and hashicorp/go-metrics is a good fit. However, they can be used as drivers which reduces the need for testing and makes integration of new backends simpler since fabio does not need to rely on a single metrics library. This patch is work in progress with the new metrics code in the metrics4 package and the old metrics code in the 'metrics_old' package where it is kept for reference until it has been migrated or removed. The basic architecture should be good enough to add more providers and functionality so that the community can help. Right now there are two providers "flat" and "label" to demonstrate the concepts. They provide counter and timers in statsd/dogstatd format and can be configured with -metrics.target=flat[,label]. Other providers should be added in the same way. The metrics_old code should be converted to the new provider scheme. The go-kit/metrics library currently supports most of the necessary drivers and is the preferred way of integrating new drivers. The list below is a probably incomplete list of things that need to be done: * The configuration is not used except for metrics.target * The drivers from the metrics_old package need to be migrated * Additional drivers from go-kit need to be added * The old rcrowley/go-metrics code relies on 'registries' for namespacing. After the routing table has changed the registry needs to be pruned of services which no longer exist so that go-metrics stops reporting them. * The RX/TX counters in the TCP and TCP+SNI proxy are probably sub-optimal or broken. * Some counters may not have been initialized properly, e.g. WSConn * WSConn should probably be a gauge which also needs to be set to 0 on startup. * The approach of injecting a noop metrics provider if none has been set has already bitten me once since some metrics are reported while others are not. I need to think about this a bit more. Fixes #126 Fixes #165 Fixes #211 Fixes #253 Fixes #326 Fixes #327 Fixes #371 Closes #329 Closes #331 Closes #334 Closes #335 Remove 'Register' methods and create objects directly. Add Gauge and use it for websocket connections Vendoring in github.com/alexcesaro/statsd First stab at integrating raw statsd from #335 Rebase off master 10/23/2020 - NJ
This patch refactors the metrics support within fabio to achieve several goals: * support labels for DogstatD, Prometheus and others * support raw events for statsd and others without aggregation * support multiple backends simultaneously to support migration * make integration of new metrics backends and/or libraries simple * being fully backwards compatible to not break existing setups One of the main challenges is that fabio generates names for the metrics targets through templates and that these templates are configurable. A metrics backend which supports labels needs to generate a different name for the same metric than one that does not support labels. Combining this with the need to support multiple different metrics backends at runtime means that none of the existing metrics libraries like go-kit/metrics and hashicorp/go-metrics is a good fit. However, they can be used as drivers which reduces the need for testing and makes integration of new backends simpler since fabio does not need to rely on a single metrics library. This patch is work in progress with the new metrics code in the metrics4 package and the old metrics code in the 'metrics_old' package where it is kept for reference until it has been migrated or removed. The basic architecture should be good enough to add more providers and functionality so that the community can help. Right now there are two providers "flat" and "label" to demonstrate the concepts. They provide counter and timers in statsd/dogstatd format and can be configured with -metrics.target=flat[,label]. Other providers should be added in the same way. The metrics_old code should be converted to the new provider scheme. The go-kit/metrics library currently supports most of the necessary drivers and is the preferred way of integrating new drivers. The list below is a probably incomplete list of things that need to be done: * The configuration is not used except for metrics.target * The drivers from the metrics_old package need to be migrated * Additional drivers from go-kit need to be added * The old rcrowley/go-metrics code relies on 'registries' for namespacing. After the routing table has changed the registry needs to be pruned of services which no longer exist so that go-metrics stops reporting them. * The RX/TX counters in the TCP and TCP+SNI proxy are probably sub-optimal or broken. * Some counters may not have been initialized properly, e.g. WSConn * WSConn should probably be a gauge which also needs to be set to 0 on startup. * The approach of injecting a noop metrics provider if none has been set has already bitten me once since some metrics are reported while others are not. I need to think about this a bit more. Fixes #126 Fixes #165 Fixes #211 Fixes #253 Fixes #326 Fixes #327 Fixes #371 Closes #329 Closes #331 Closes #334 Closes #335 Remove 'Register' methods and create objects directly. Add Gauge and use it for websocket connections Vendoring in github.com/alexcesaro/statsd First stab at integrating raw statsd from #335 Rebase off master 10/23/2020 - NJ
This patch refactors the metrics support within fabio to achieve several goals: * support labels for DogstatD, Prometheus and others * support raw events for statsd and others without aggregation * support multiple backends simultaneously to support migration * make integration of new metrics backends and/or libraries simple * being fully backwards compatible to not break existing setups One of the main challenges is that fabio generates names for the metrics targets through templates and that these templates are configurable. A metrics backend which supports labels needs to generate a different name for the same metric than one that does not support labels. Combining this with the need to support multiple different metrics backends at runtime means that none of the existing metrics libraries like go-kit/metrics and hashicorp/go-metrics is a good fit. However, they can be used as drivers which reduces the need for testing and makes integration of new backends simpler since fabio does not need to rely on a single metrics library. This patch is work in progress with the new metrics code in the metrics4 package and the old metrics code in the 'metrics_old' package where it is kept for reference until it has been migrated or removed. The basic architecture should be good enough to add more providers and functionality so that the community can help. Right now there are two providers "flat" and "label" to demonstrate the concepts. They provide counter and timers in statsd/dogstatd format and can be configured with -metrics.target=flat[,label]. Other providers should be added in the same way. The metrics_old code should be converted to the new provider scheme. The go-kit/metrics library currently supports most of the necessary drivers and is the preferred way of integrating new drivers. The list below is a probably incomplete list of things that need to be done: * The configuration is not used except for metrics.target * The drivers from the metrics_old package need to be migrated * Additional drivers from go-kit need to be added * The old rcrowley/go-metrics code relies on 'registries' for namespacing. After the routing table has changed the registry needs to be pruned of services which no longer exist so that go-metrics stops reporting them. * The RX/TX counters in the TCP and TCP+SNI proxy are probably sub-optimal or broken. * Some counters may not have been initialized properly, e.g. WSConn * WSConn should probably be a gauge which also needs to be set to 0 on startup. * The approach of injecting a noop metrics provider if none has been set has already bitten me once since some metrics are reported while others are not. I need to think about this a bit more. Fixes #126 Fixes #165 Fixes #211 Fixes #253 Fixes #326 Fixes #327 Fixes #371 Closes #329 Closes #331 Closes #334 Closes #335 Remove 'Register' methods and create objects directly. Add Gauge and use it for websocket connections Vendoring in github.com/alexcesaro/statsd First stab at integrating raw statsd from #335 Rebase off master 10/23/2020 - NJ
Aside from the actual broken protocol packets in #207 the implementation of instrumentation with statsd in this project is just really bad. Exporting gauges of pre-aggregated metrics is a serious anti-pattern to typical/normal statsd instrumentation best practices. I'm curious if there is some specific reason why this model was adopted. Is there any appetite for re-instrumenting in a more standard manner?
The text was updated successfully, but these errors were encountered: