Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor metrics #476

Merged
merged 8 commits into from
Apr 5, 2022
Merged

Refactor metrics #476

merged 8 commits into from
Apr 5, 2022

Conversation

magiconair
Copy link
Contributor

This patch refactors the metrics support within fabio to
achieve several goals:

  • support labels for DogstatD, Prometheus and others
  • support raw events for statsd and others without aggregation
  • support multiple backends simultaneously to support migration
  • make integration of new metrics backends and/or libraries simple
  • being fully backwards compatible to not break existing setups

One of the main challenges is that fabio generates names for the metrics
targets through templates and that these templates are configurable. A
metrics backend which supports labels needs to generate a different name
for the same metric than one that does not support labels.

Combining this with the need to support multiple different metrics
backends at runtime means that none of the existing metrics libraries
like go-kit/metrics and hashicorp/go-metrics is a good fit. However,
they can be used as drivers which reduces the need for testing and makes
integration of new backends simpler since fabio does not need to rely on
a single metrics library.

This patch is work in progress with the new metrics code in the metrics4
package and the old metrics code in the 'metrics_old' package where it
is kept for reference until it has been migrated or removed. The basic
architecture should be good enough to add more providers and
functionality so that the community can help.

Right now there are two providers "flat" and "label" to demonstrate the
concepts. They provide counter and timers in statsd/dogstatd format and
can be configured with -metrics.target=flat[,label]. Other providers
should be added in the same way.

The metrics_old code should be converted to the new provider scheme.

The go-kit/metrics library currently supports most of the necessary
drivers and is the preferred way of integrating new drivers.

The list below is a probably incomplete list of things that need to be
done:

  • The configuration is not used except for metrics.target
  • The drivers from the metrics_old package need to be migrated
  • Additional drivers from go-kit need to be added
  • The old rcrowley/go-metrics code relies on 'registries' for
    namespacing. After the routing table has changed the registry needs
    to be pruned of services which no longer exist so that go-metrics
    stops reporting them.
  • The RX/TX counters in the TCP and TCP+SNI proxy are probably
    sub-optimal or broken.
  • Some counters may not have been initialized properly, e.g. WSConn
  • WSConn should probably be a gauge which also needs to be set to 0 on
    startup.
  • The approach of injecting a noop metrics provider if none has been
    set has already bitten me once since some metrics are reported while
    others are not. I need to think about this a bit more.

Fixes #126
Fixes #165
Fixes #211
Fixes #253
Fixes #326
Fixes #327
Fixes #371
Closes #329
Closes #331
Closes #334
Closes #335

@magiconair magiconair added this to the 1.6 milestone Mar 24, 2018
@magiconair
Copy link
Contributor Author

As for approach please see #211 (comment)

@magiconair
Copy link
Contributor Author

Lets first try to add one go-kit/metrics driver to verify the approach and then we can spread the work to support more.

@magiconair
Copy link
Contributor Author

To test this I'm currently doing this:

# terminal 1
consul agent -dev

# terminal 2
cd $GOPATH/src/github.com/fabiolb/fabio/demo/server
go build
./server -prefix /foo

# terminal 3
cd $GOPATH/src/github.com/fabiolb/fabio
go build && ./fabio -metrics.target=flat,label

# terminal 4
curl -i localhost:9999/foo

@magiconair
Copy link
Contributor Author

I'm wondering whether https://opencensus.io/go.html might be a better foundation. Does someone have some experience with this?

@chenjpu
Copy link

chenjpu commented Sep 25, 2018

+1

@pvandervelde
Copy link

@magiconair Any progress on this. Would still love to have some decent metrics from Fabio.

@magiconair
Copy link
Contributor Author

Yeah, I know. I was switching jobs and moving my family to a different country. Lots of changes, new language and all. Also, fabio needs a bigger community of contributors since I have become a bottleneck. I am starting to catch my breath again and get some energy on working on fabio again. However, this project is now way more than what I've ever needed to build. As much as I like programming I'll never be able to keep up with the demand by myself.

The metrics code is still bugging me and I'd like to solve this properly but I could really use some help here. If someone wants to work with me on this we could finally tick this one off.

@ketzacoatl
Copy link

Maintaining a project like this can be such an under-appreciated, and often times thankless investment, so thanks for letting us know what you've got going on and what help you need. Good luck on the move and all the transitions that has with it.

As far as getting metrics incorporated.. I won't be much help with go, but I would be happy to make myself available for testing metrics in an updated fabio. It'd be most easy for me to test with prometheus, but I can help with others too.

@blalor
Copy link

blalor commented Dec 13, 2018

@magiconair what help do you need to get this branch where you're comfortable with merging it?

@aaronhurt aaronhurt mentioned this pull request Jan 27, 2020
@aaronhurt
Copy link
Member

@blalor @ketzacoatl Frank has handed maintenance of the project over to myself and others here at ENA. I'd love to continue the work here on metrics if either of you are available for testing and/or code contribution.

@ketzacoatl
Copy link

I'd be more than happy to help with testing. I'd be limited in golang, but happy to participate in pushing this one thru!

@aaronhurt
Copy link
Member

@ketzacoatl Thank you. I've gotten a response from @maximka777 in #586 ... if he's interested in taking up the torch we will definitely need independent testing.

@nathanejohnson
Copy link
Member

I'm planning to pick this back up. I just squashed Frank's commits and rebased off master, it builds and tests pass. I'm going to see if I can at least get a prometheus and statsd integration going with go-kit

@nathanejohnson
Copy link
Member

nathanejohnson commented Nov 5, 2020

so I have pushed a pretty major update here. Currently all previously supported backends have been implemented, though there will definitely be changes required for users of statsd and likely graphite as well. the statsd implementation was rather broken - counters never reset which is a violation of how counters are supposed to work in statsd. Graphite changed because the new implementation is based off the go-kit graphite adapter for their metrics framework. The main changes are around histograms, since they use a different backend package. This is unfortunate, however this gets us to the positives: namely, tag support for backends that can support them, and a way forward for backends that don't support tags.

Prometheus has been added, in order to use it you need to configure a listener with proto=prometheus . this lets you configure it like any other listeners, with TLS support or not, etc.

I would love some testers, and assuming people can suffer through the graphite / statsd pains changes, I'd like to get this slated for a 1.6 release sometime in the not too distant future.

I would really appreciate some help in testing the stats backends. This is still very much a work in progress

@ketzacoatl
Copy link

Thank you for all your work on this @nathanejohnson!

I would love some testers, and assuming people can suffer through the graphite / statsd pains changes, I'd like to get this slated for a 1.6 release sometime in the not too distant future.

I would really appreciate some help in testing the stats backends. This is still very much a work in progress

Should we build the branch from source to test, or is there a build artifact somewhere we should pull or use?

I haven't been running the statsd backend, but I would be happy to test that out now (with datadog).

magiconair and others added 6 commits December 1, 2020 16:45
This patch refactors the metrics support within fabio to
achieve several goals:

 * support labels for DogstatD, Prometheus and others
 * support raw events for statsd and others without aggregation
 * support multiple backends simultaneously to support migration
 * make integration of new metrics backends and/or libraries simple
 * being fully backwards compatible to not break existing setups

One of the main challenges is that fabio generates names for the metrics
targets through templates and that these templates are configurable. A
metrics backend which supports labels needs to generate a different name
for the same metric than one that does not support labels.

Combining this with the need to support multiple different metrics
backends at runtime means that none of the existing metrics libraries
like go-kit/metrics and hashicorp/go-metrics is a good fit. However,
they can be used as drivers which reduces the need for testing and makes
integration of new backends simpler since fabio does not need to rely on
a single metrics library.

This patch is work in progress with the new metrics code in the metrics4
package and the old metrics code in the 'metrics_old' package where it
is kept for reference until it has been migrated or removed. The basic
architecture should be good enough to add more providers and
functionality so that the community can help.

Right now there are two providers "flat" and "label" to demonstrate the
concepts. They provide counter and timers in statsd/dogstatd format and
can be configured with -metrics.target=flat[,label]. Other providers
should be added in the same way.

The metrics_old code should be converted to the new provider scheme.

The go-kit/metrics library currently supports most of the necessary
drivers and is the preferred way of integrating new drivers.

The list below is a probably incomplete list of things that need to be
done:

 * The configuration is not used except for metrics.target
 * The drivers from the metrics_old package need to be migrated
 * Additional drivers from go-kit need to be added
 * The old rcrowley/go-metrics code relies on 'registries' for
   namespacing. After the routing table has changed the registry needs
   to be pruned of services which no longer exist so that go-metrics
   stops reporting them.
 * The RX/TX counters in the TCP and TCP+SNI proxy are probably
   sub-optimal or broken.
 * Some counters may not have been initialized properly, e.g. WSConn
 * WSConn should probably be a gauge which also needs to be set to 0 on
   startup.
 * The approach of injecting a noop metrics provider if none has been
   set has already bitten me once since some metrics are reported while
   others are not. I need to think about this a bit more.

Fixes #126
Fixes #165
Fixes #211
Fixes #253
Fixes #326
Fixes #327
Fixes #371
Closes #329
Closes #331
Closes #334
Closes #335

Remove 'Register' methods and create objects directly.

Add Gauge and use it for websocket connections

Vendoring in github.com/alexcesaro/statsd

First stab at integrating raw statsd from #335

Rebase off master 10/23/2020 - NJ
Currently all previously supported backends have been implemented, though
there will definitely be changes required for users of statsd and likely
graphite as well.  the statsd implementation was rather broken - counters
never reset which is a violation of how counters are supposed to work in statsd.
Graphite changed because the new implementation is based off the go-kit
graphite adapter for their metrics framework.  The main changes are around
histograms, since they use a different backend package.  This is unfortunate,
however this gets us to the positives: namely, tag support for backends that
can support them, and a way forward for backends that don't support tags.

Prometheus support has been added with full tag support.

oldmetrics has been deleted

metrics4 package renamed to metrics

adding metrics names_test.go

Reworked stats so that we don't register duplicates.

This pisses prometheus right off

adding statsd provider unit test

added circonus

tentatively have graphite going now, deleted old metrics
@nathanejohnson
Copy link
Member

@ketzacoatl so I just pushed a new change to this branch that should support tagging with dogstatsd. I don't have an artifact just now that you can get to easily, but if you just checkout the metrics4 branch and issue a make build that should get you a working binary. If you'd like something prebuilt that's fine too, let me know GOOS / GOARCH and I can put something up somewhere for you to grab. It would be nice to do a beta release, but I'm not sure yet the best way to do it without cluttering up the releases page on github.

@ketzacoatl
Copy link

ketzacoatl commented Dec 5, 2020

so I just pushed a new change to this branch that should support tagging with dogstatsd.

Thanks for that @nathanejohnson!

I don't have an artifact just now that you can get to easily, but if you just checkout the metrics4 branch and issue a make build that should get you a working binary. If you'd like something prebuilt that's fine too, let me know GOOS / GOARCH and I can put something up somewhere for you to grab.

I don't have a golang env setup (though I have set it up in the past), so if you could build one for linux/amd64, that would be grand. I can figure it out if that's not easy to do.

It would be nice to do a beta release, but I'm not sure yet the best way to do it without cluttering up the releases page on github.

Maybe a RC can be released?

@nathanejohnson
Copy link
Member

https://drive.google.com/drive/folders/1lMsA1Q3PFi_xXOvV-ZUNR9GPI4EVFXv8

This is going to be a "private" beta for now, though anyone with the link can download the files. I did darwin and linux in amd64, I can add others if there is interest.

@ketzacoatl
Copy link

ketzacoatl commented Jan 28, 2021

Sorry, I've been busy. Testing this on my todo list though.

@nathanejohnson nathanejohnson merged commit 0633132 into master Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants