-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter/prometheusremotewrite] Hash labels using xxhash for performance #31385
Conversation
7803f75
to
b689a3a
Compare
a8cf050
to
6e42467
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aknuds1 The new API is definitely an improvement over previous API (simple FromMetrics). Tested with cortex and it works. I have no complains there. But it needs more tests in my IMHO
Thanos is implementing OTLP using the same thanos-io/thanos#7227
I believe is compatible too.
func FromMetrics(md pmetric.Metrics, settings Settings) (map[string]*prompb.TimeSeries, error) { | ||
c := NewPrometheusConverter() | ||
errs := c.FromMetrics(md, settings) | ||
tss := c.TimeSeries() | ||
out := make(map[string]*prompb.TimeSeries, len(tss)) | ||
for i := range tss { | ||
out[strconv.Itoa(i)] = &tss[i] | ||
} | ||
|
||
return out, errs | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not covered by benchmarks or unit tests and I believe it should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I converted BenchmarkFromMetrics
to use FromMetrics
, and made a new one for PrometheusConverter
: BenchmarkPrometheusConverter_FromMetrics
. Is this enough? I notice that the existing test suite already doesn't call FromMetrics
(just the benchmarks).
for i, v := range labels { | ||
if len(b)+len(v.Name)+len(v.Value)+2 >= cap(b) { | ||
// If labels entry is 1KB+ do not allocate whole entry. | ||
h := xxhash.New() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is not covered by tests, and it should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL.
} | ||
|
||
// New conflict | ||
ts = &prompb.TimeSeries{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is not tested by unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL.
Thanks for the review @friedrichg! |
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Yes, please. I think there are a few interested parties presently working on things that would be (likely positively) impacted by this API change, including @friedrichg and @jmichalek132. Let's make sure we're all on the same page regarding what we're expecting from this API and whether the proposed change meets everyone's needs before we move forward with it. |
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
@Aneurysm9 I pushed a revision where the new API is dropped, and improvements are hidden behind |
Co-authored-by: Anthony Mirabella <[email protected]>
Co-authored-by: Anthony Mirabella <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
CI failure looks like a flake. |
Description:
Improve performance in pkg/translator/prometheusremotewrite by using the same xxhash based time series signature algorithm as Prometheus itself (
labels.StableHash()
, which is guaranteed to not change over time). I became aware of time series signature calculation being a bit of a bottleneck when profiling Grafana Mimir's OTLP endpoint.This change involves moving from a string hash, to a
uint64
one. My thesis is that if Prometheus uses this algorithm to identify time series/label sets, it should be suitable for this translation logic too.Hash collisions
I've attempted to handle hash collisions in the same way as Prometheus:
PrometheusConverter
has aunique
field, which is its mainmap
from hash toTimeSeries
, as well as aconflicts
field, being its secondarymap
from hash to a slice ofTimeSeries
in case of hash collisions. If a label set should hash to an existing entry inunique
, but not be equal to the existing entry's label set, the label set is attempted matched to theconflicts
map
instead. If its equal is not found among the conflicts for the hash in question either, it's added to the conflicts slice (for the hash).Link to tracking Issue:
Testing:
I've run
make test
/make lint
and run theBenchmarkFromMetrics
benchmark. Benchmark stats included below, they show an average speedup of 3.68% and an average memory reduction of 17.13%.NB: The benchmark stats reveal performance regressions in a few cases, because of using the
prometheusConverter
API via theFromMetrics
function.Benchmarks
Benchmark stats
Documentation: