storegateway: Merge series concurrently #7456
Conversation
Changes make sense; 13% is quite a lot for such a small change on such a high-level benchmark.
Force-pushed from 522d7f0 to 821383b
pkg/storegateway/series_refs_test.go (outdated)
    batch := make([]iterator[seriesChunkRefsSet], len(iterators))
    for i := 0; i < numIterators; i++ {
        if withIO {
            // Delay only 2% of iterations to mimic IO ops, happening during real set iterations.
            batch[i] = newDelayedMaybeIterator(2, time.Millisecond, iterators[i])
        } else {
            batch[i] = iterators[i]
        }
    }
This is the chunk of the diff that was actually changed here. I.e., for the withIO dimension we wrap the iterators with a delayedIterator, to add some amount of IO to the iterations.
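To make the wrapper concrete, here is a minimal self-contained sketch of how such a delayed iterator could look. The names `delayedIterator`, `newDelayedMaybeIterator`, and `delayMaybe` appear in the diff, but the `iterator` interface, the struct fields, and the `sliceIterator` used for demonstration are assumptions made for this sketch, not Mimir's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// iterator is a minimal stand-in for the iterator[S] interface in the diff.
type iterator[S any] interface {
	Next() bool
	At() S
}

// delayedIterator wraps an iterator and sleeps on roughly `chance` percent of
// Next calls, to simulate IO. Field names are guesses based on the diff.
type delayedIterator[S any] struct {
	chance  int           // percentage of Next calls to delay; 0 means every call
	delay   time.Duration // how long each delayed call sleeps
	wrapped iterator[S]
}

func newDelayedMaybeIterator[S any](chance int, delay time.Duration, wrapped iterator[S]) *delayedIterator[S] {
	return &delayedIterator[S]{chance: chance, delay: delay, wrapped: wrapped}
}

// delayMaybe mirrors the logic shown in the diff: chance == 0 means an
// unconditional (fixed) delay, otherwise the delay fires randomly.
func (s *delayedIterator[S]) delayMaybe() {
	if s.chance == 0 || rand.Intn(100) <= s.chance {
		time.Sleep(s.delay)
	}
}

func (s *delayedIterator[S]) Next() bool {
	s.delayMaybe()
	return s.wrapped.Next()
}

func (s *delayedIterator[S]) At() S { return s.wrapped.At() }

// sliceIterator is a trivial in-memory iterator used only to demonstrate the wrapper.
type sliceIterator[S any] struct {
	items []S
	i     int
}

func (it *sliceIterator[S]) Next() bool { it.i++; return it.i <= len(it.items) }
func (it *sliceIterator[S]) At() S      { return it.items[it.i-1] }

func main() {
	it := newDelayedMaybeIterator[int](2, time.Microsecond, &sliceIterator[int]{items: []int{1, 2, 3}})
	for it.Next() {
		fmt.Println(it.At()) // prints 1, 2, 3, occasionally delayed
	}
}
```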
    func (s *delayedIterator[S]) delayMaybe() {
        if s.chance == 0 || rand.Intn(100) <= s.chance {
            time.Sleep(s.delay)
        }
    }
wouldn't this randomness make the benchmarks less reproducible/consistent?
In theory, no. The chance of the delay is the same for all b.N iterations (e.g. "5% of iterations in the benchmark sleep"). I'm not sure, but statistically this should be fine, right?
The total run of these benchmarks looks to be at most 10ms. I think missing one or a few of these time.Sleep calls can make a significant difference (I consider +/- 10% to be significant).
Would it be better to make the delay constant on every Next() invocation, but reduce its duration? If we are aiming for a 1ms sleep on 5% of invocations, that is statistically the same total time slept as a 50µs sleep on every invocation. The expected time slept is the same in both cases, but when we sleep a consistent amount on every invocation, the best, average, and worst cases also coincide, so the benchmark results become reproducible.
As discussed in DM, my initial idea was to avoid having IO (represented by a sleep) on every call to the underlying iterator's Next(). This shouldn't have been the goal in the first place, and it only made the changes confusing.
I updated this chunk to use a fixed delay. The results of these benchmarks are still the same (I also updated the PR description).
Force-pushed from 6824c24 to 5e946f2
The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase.
Signed-off-by: Vladimir Varankin <[email protected]>
Force-pushed from 5e946f2 to 7a5b628
What this PR does
These changes are based on the observation that mergedSeriesChunkRefsSet only fetches series from its underlying set iterators when one of its cursors has no items left to read. Only when both cursors are empty does it make sense to pre-populate items for them concurrently; otherwise, the additional cost of the goroutine switch outweighs the cost of two calls to ensureItemAvailableToRead.
Below are the benchmarks that show the effect of these changes:
BenchmarkBucketStore_Series
Note that BenchmarkBucketStore_Series involves lots of work on top of what is done per BucketStore.Series call. That is, the benchmarks literally call gRPC over the local network, so they show a mix of results from the client's and the server's POV.
Also, I updated the BenchmarkMergedSeriesChunkRefsSetIterators benchmarks, adding one extra dimension: without and with IO in the set iterators. This allows comparing the results of the changes in both pure-CPU-bound and IO-bound benchmarks. The CPU-bound workload was very snappy before, but this might be misleading, since real workloads always include some amount of IO.
BenchmarkMergedSeriesChunkRefsSetIterators
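The conditional concurrency described in the PR text can be sketched as follows. This is an illustrative toy, not Mimir's implementation: `fill` stands in for ensureItemAvailableToRead, the `[]int` buffers stand in for the merged set's two cursors, and all names here are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// fill simulates ensureItemAvailableToRead: it refills a cursor's buffer
// from its source only when the buffer is empty.
func fill(buf *[]int, next func() int) {
	if len(*buf) == 0 {
		*buf = append(*buf, next())
	}
}

// ensureBothAvailable refills the two cursors, going concurrent only when
// both are empty. When at most one cursor needs a refill, the goroutine
// switch would cost more than the single synchronous call it saves, which
// is the observation this PR is based on.
func ensureBothAvailable(a, b *[]int, nextA, nextB func() int) {
	if len(*a) == 0 && len(*b) == 0 {
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			fill(a, nextA) // refill cursor A concurrently...
		}()
		fill(b, nextB) // ...while refilling cursor B on this goroutine
		wg.Wait()
		return
	}
	// At least one cursor still has items: refill synchronously.
	fill(a, nextA)
	fill(b, nextB)
}

func main() {
	a, b := []int{}, []int{}
	ensureBothAvailable(&a, &b, func() int { return 1 }, func() int { return 2 })
	fmt.Println(a[0], b[0]) // prints: 1 2
}
```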
Which issue(s) this PR fixes or relates to
Fixes #4596
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
- about-versioning.md updated with experimental features.