-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming engine: Native histogram aggregations #8360
Streaming engine: Native histogram aggregations #8360
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, thanks for working on this.
Could you please run the existing sum
benchmarks before and after this change and include the results here? I'm particularly interested in the impact of the introduction of floatPointCount
.
(edit: I do not trust these benchmarks, and they likely need re-doing) Benchmark of this branch vs main:
Benchmark from this branch comparing the two engines:
|
for _, p := range s.Histograms { | ||
idx := (p.T - start) / interval | ||
if thisSeriesGroup.histogramSums[idx] == nil { | ||
// We copy here because we modify the histogram through Add later on. | ||
// It is necessary to preserve the original Histogram in case of any range-queries using lookback. | ||
thisSeriesGroup.histogramSums[idx] = p.H.Copy() | ||
thisSeriesGroup.histogramPointCount++ | ||
} else { | ||
thisSeriesGroup.histogramSums[idx] = thisSeriesGroup.histogramSums[idx].Add(p.H) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we do something like this? Then we only pay the price of copying the histogram when we need to, which should only be the case when lookback is involved.
var lastUncopiedHistogram *histogram.FloatHistogram
for _, p := range s.Histograms {
idx := (p.T - start) / interval
if thisSeriesGroup.histogramSums[idx] != nil {
thisSeriesGroup.histogramSums[idx] = thisSeriesGroup.histogramSums[idx].Add(p.H)
} else if lastUncopiedHistogram == p.H {
// We've already used this histogram for a previous point due to lookback.
// Make a copy of it so we don't modify the other point.
thisSeriesGroup.histogramSums[idx] = p.H.Copy()
thisSeriesGroup.histogramPointCount++
} else {
// This is the first time we have seen this histogram.
// It is safe to store it and modify it later without copying, as we'll make copies above if the same histogram is used for subsequent points.
thisSeriesGroup.histogramSums[idx] = p.H
thisSeriesGroup.histogramPointCount++
lastUncopiedHistogram = p.H
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that will work.
Because after the first time we have a point at this idx
we add it to the histogram, if we see the histogram again (due to a lookback), it will have already been modified. So we can't just copy it at that point as it's already wrong. (Even if it was right, it would overwrite our summation, but we could work around that but that's not what's happening here).
To be sure, I quickly tested this and confirmed it does not pass the tests.
The challenge is that we need to be sure ahead of time that we won't see the same histogram again. Otherwise we can't confidentially modify it (or in other words we can't confidentially add to it). I'm not sure there is a sensible way to do this without implementing our own lookback. Even then, the challenge of keeping track of it would likely outweigh just making a copy at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing something sorry - what will modify p.H
while we're iterating through a single series' samples and therefore modify a histogram before we've had a chance to copy it?
I'm imagining we have two input series with histogram samples like this, including some instances of lookback:
t=0 | t=1 | t=2 | t=3 | |
---|---|---|---|---|
Series 1 | h1 |
h1 |
(nothing) | h2 |
Series 2 | h3 |
h3 |
h4 |
h4 |
After we've processed series 1, with the change I suggested, then histogramSums
will be:
t=0 | t=1 | t=2 | t=3 | |
---|---|---|---|---|
Output | h1 |
Copy of h1 |
(nothing) | h2 |
And then after series 2:
t=0 | t=1 | t=2 | t=3 | |
---|---|---|---|---|
Output | h1.Add(h3) |
(Copy of h1).Add(h3) |
h4 |
h2.Add(h4) |
At no point do we mutate a histogram from a series before we've finished processing that series. For example, we only mutate h1
when we start processing t=0 for series 2, but we've already copied it for t=1 by that point.
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase |
(we may want to consider changing this line now that we've cut a release mid-way through development).
for _, p := range s.Histograms { | ||
idx := (p.T - start) / interval | ||
if thisSeriesGroup.histogramSums[idx] == nil { | ||
// We copy here because we modify the histogram through Add later on. | ||
// It is necessary to preserve the original Histogram in case of any range-queries using lookback. | ||
thisSeriesGroup.histogramSums[idx] = p.H.Copy() | ||
thisSeriesGroup.histogramPointCount++ | ||
} else { | ||
thisSeriesGroup.histogramSums[idx] = thisSeriesGroup.histogramSums[idx].Add(p.H) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing something sorry - what will modify p.H
while we're iterating through a single series' samples and therefore modify a histogram before we've had a chance to copy it?
I'm imagining we have two input series with histogram samples like this, including some instances of lookback:
t=0 | t=1 | t=2 | t=3 | |
---|---|---|---|---|
Series 1 | h1 |
h1 |
(nothing) | h2 |
Series 2 | h3 |
h3 |
h4 |
h4 |
After we've processed series 1, with the change I suggested, then histogramSums
will be:
t=0 | t=1 | t=2 | t=3 | |
---|---|---|---|---|
Output | h1 |
Copy of h1 |
(nothing) | h2 |
And then after series 2:
t=0 | t=1 | t=2 | t=3 | |
---|---|---|---|---|
Output | h1.Add(h3) |
(Copy of h1).Add(h3) |
h4 |
h2.Add(h4) |
At no point do we mutate a histogram from a series before we've finished processing that series. For example, we only mutate h1
when we start processing t=0 for series 2, but we've already copied it for t=1 by that point.
Co-authored-by: Charles Korn <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.