Improve performance of stats recording #1265
Comments
Will be looking out for potential optimization points. Switching the string conversion in opencensus-go/stats/view/collector.go (line 69 at 49838f2) to build the
string directly, using concepts from strings.Builder, removes a call to slicebytetostring and drops allocs by 6.6%.
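To illustrate the point above, here is a minimal sketch (names are illustrative stand-ins, not the actual opencensus internals) of building the map-key string directly with `strings.Builder` instead of accumulating a `[]byte` and casting it with `string(buf)`, which is the cast that calls `slicebytetostring` and allocates a copy:

```go
package main

import (
	"fmt"
	"strings"
)

// encodeTagsString builds the map-key string directly with strings.Builder.
// Builder.String() hands over its internal buffer without a copy, so the
// slicebytetostring conversion (and its allocation) is avoided entirely.
func encodeTagsString(tags map[string]string, keys []string) string {
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(tags[k])
		b.WriteByte(0) // NUL separator between tag values
	}
	return b.String()
}

func main() {
	tags := map[string]string{"method": "GET", "code": "200"}
	fmt.Printf("%q\n", encodeTagsString(tags, []string{"method", "code"}))
	// → "GET\x00200\x00"
}
```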
On top of this (see lines 131 to 135 in 49838f2), there is no efficient way to pass in a set of tags. We cannot pass a simple map of tags like the prom library does; instead we pass mutators. This means the library must allocate its own map without knowing how much data it will store, so we cannot size-hint it:

```go
type Map struct {
	m map[Key]tagContent
}
```

(The prometheus equivalent, `prometheus.Labels`, is a plain `map[string]string`.) Some of these inefficiencies seem to stem from the mutator-based API. For example, making a small tweak to the API to allow directly passing in a tag.Map cuts the alloc count in half and allocates 1/5 as much memory:
code:

```go
func RecordWithTagsFast(m *tag.Map, ms ...Measurement) error {
	recorder := internal.DefaultRecorder
	if recorder == nil {
		return nil
	}
	record := false
	for _, meas := range ms {
		if meas.desc.subscribed() {
			record = true
			break
		}
	}
	if !record {
		return nil
	}
	recorder(m, ms, nil)
	return nil
}
```
Results with RecordWithTagsFast:
So there is actually a way to reduce allocations if we can precompute tags by using some different functions:

```go
b.Run("oc-ctx", func(b *testing.B) {
	mLineLengths := stats.Float64("test", "my-benchmark", stats.UnitDimensionless)
	key := tag.MustNewKey("key")
	v := &view.View{
		Measure:     mLineLengths,
		TagKeys:     []tag.Key{key},
		Aggregation: view.Sum(),
	}
	if err := view.Register(v); err != nil {
		b.Fatal(err)
	}
	ctx, err := tag.New(context.Background(), tag.Upsert(key, "val"))
	if err != nil {
		b.Fatal(err)
	}
	m := mLineLengths.M(1)
	for n := 0; n < b.N; n++ {
		stats.Record(ctx, m)
	}
})
```

Compared to the other method, this saves 50% of allocs:
We can improve this more. One tweak drops us from 6 -> 5 allocs (216B -> 136B). We can save one more by precomputing, so down to 4. Replacing one further allocation leaves even less. So with all of those tweaks, the remaining overhead is small!
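The alloc counts traded back and forth above come from Go's allocation accounting in benchmarks. A self-contained sketch of the same measurement technique using only `testing.AllocsPerRun` (the map and `sink` variable here are illustrative, not opencensus code) shows why precomputing tags removes per-record allocations:

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the per-call map to escape to the heap, mimicking a record
// path that builds a fresh tag map on every call.
var sink map[string]string

func allocsPerRecord(precomputed map[string]string) (fresh, reused float64) {
	fresh = testing.AllocsPerRun(100, func() {
		sink = map[string]string{"key": "val"} // built on every record
	})
	reused = testing.AllocsPerRun(100, func() {
		sink = precomputed // computed once, reused on every record
	})
	return
}

func main() {
	fresh, reused := allocsPerRecord(map[string]string{"key": "val"})
	fmt.Println(fresh >= 1, reused == 0)
	// → true true
}
```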
Given most of this has landed and OC is deprecated, I don't think there is anything left to do. OTel is much better here: https://blog.howardjohn.info/posts/zero-alloc-otel/
Is your feature request related to a problem? Please describe.
Yes. Usage of opencensus-go to record metrics has a substantial overhead. In real-world applications, we have seen OC accounting for well over 10% of our memory allocations (and we generate GBs of protobufs per minute, so it should be negligible). This has led us to do things we really shouldn't have to think about, like adding a caching layer on top of the library.
Describe the solution you'd like
Improve the performance of the library, in particular its memory allocations.
Describe alternatives you've considered
Adding a caching layer on top of the library, or using a different library.
Additional context
I wrote some benchmarks to compare to the prometheus client. There are two variants: one with a precomputed label/tag and one computed in the loop.
Results:
So the prometheus counterpart actually has zero allocs once the label is created. It is also 10x faster. Not even considering GC overhead, which is substantial, that means that (on the machine above) I can record 1M metrics/s with OC and 10M with prom; of course, in the real world metrics recording should be a tiny portion of the CPU used by the process.
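The zero-alloc result makes sense given how the prometheus hot path works: once the labeled child counter is resolved, `Add` reduces to an atomic operation on a preallocated float64 bit pattern. A minimal stand-in (this is not prometheus code, just the same CAS-loop technique) demonstrates that such a counter records with zero allocations:

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
	"testing"
)

// counter stores a float64 as its bit pattern in a uint64 so it can be
// updated lock-free with compare-and-swap; no allocation per Add.
type counter struct{ bits uint64 }

func (c *counter) Add(v float64) {
	for {
		old := atomic.LoadUint64(&c.bits)
		upd := math.Float64bits(math.Float64frombits(old) + v)
		if atomic.CompareAndSwapUint64(&c.bits, old, upd) {
			return
		}
	}
}

func (c *counter) Value() float64 {
	return math.Float64frombits(atomic.LoadUint64(&c.bits))
}

func main() {
	c := &counter{}
	allocs := testing.AllocsPerRun(1000, func() { c.Add(1) })
	fmt.Println(c.Value() >= 1000, allocs == 0)
	// → true true
}
```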