Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lazy retrieval of series from object store. #5837

Merged
merged 14 commits into from
Nov 28, 2022

Conversation

fpetkovski
Copy link
Contributor

@fpetkovski fpetkovski commented Oct 28, 2022

The bucket store fetches series in a single blocking operation from object storage. This is likely not an ideal strategy when it comes to latency and resource usage. In addition, it causes the store to buffer everything in memory before starting to send results to queriers.

This commit modifies the series retrieval to use the proxy response heap and take advantage of the k-way merge used in the proxy store.

cc @GiedriusS @bwplotka

Latest benchmarks

name                                                          old time/op    new time/op    delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-8             57.4ms ±14%    38.1ms ± 1%  -33.58%  (p=0.029 n=4+4)
BucketSeries/1000000SeriesWith1Samples/10of1000000-8            53.7ms ± 1%    38.2ms ± 1%  -28.85%  (p=0.029 n=4+4)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-8        521ms ± 3%     431ms ± 1%  -17.17%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/1of10000000-8           3.80ms ± 1%    4.17ms ± 1%   +9.85%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/100of10000000-8         3.79ms ± 0%    4.19ms ± 3%  +10.66%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-8    49.4ms ± 1%    44.3ms ± 1%  -10.45%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/1of10000000-8            135µs ± 1%     137µs ± 0%   +1.71%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/100of10000000-8          134µs ± 1%     137µs ± 1%   +2.05%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-8    12.1ms ± 0%    12.8ms ± 1%   +5.39%  (p=0.029 n=4+4)

name                                                          old alloc/op   new alloc/op   delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-8             60.9MB ± 0%    54.3MB ± 0%  -10.82%  (p=0.029 n=4+4)
BucketSeries/1000000SeriesWith1Samples/10of1000000-8            60.9MB ± 0%    54.3MB ± 0%  -10.82%  (p=0.029 n=4+4)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-8       1.29GB ± 0%    0.80GB ± 0%  -38.11%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/1of10000000-8           4.58MB ± 0%    6.06MB ± 0%  +32.40%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/100of10000000-8         4.58MB ± 0%    6.06MB ± 0%  +32.39%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-8     120MB ± 3%      86MB ± 1%  -28.19%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/1of10000000-8            200kB ± 0%     202kB ± 0%   +0.81%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/100of10000000-8          200kB ± 0%     202kB ± 0%   +0.87%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-8    46.8MB ± 0%    46.8MB ± 0%     ~     (p=0.057 n=4+4)

name                                                          old allocs/op  new allocs/op  delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-8              9.61k ± 0%     4.65k ± 0%  -51.61%  (p=0.029 n=4+4)
BucketSeries/1000000SeriesWith1Samples/10of1000000-8             9.71k ± 0%     4.76k ± 0%  -51.03%  (p=0.029 n=4+4)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-8        10.0M ± 0%     11.0M ± 0%   +9.79%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/1of10000000-8            1.08k ± 0%     0.71k ± 0%  -34.19%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/100of10000000-8          1.11k ± 0%     0.75k ± 0%  -32.88%  (p=0.029 n=4+4)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-8     1.00M ± 0%     1.10M ± 0%   +9.80%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/1of10000000-8              282 ± 0%       293 ± 0%   +3.90%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/100of10000000-8            282 ± 0%       293 ± 0%   +3.90%  (p=0.029 n=4+4)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-8      168k ± 0%      168k ± 0%   +0.03%  (p=0.029 n=4+4)

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

pkg/store/bucket.go Outdated Show resolved Hide resolved
pkg/store/bucket.go Outdated Show resolved Hide resolved
@fpetkovski fpetkovski force-pushed the lazy-bucket-store branch 2 times, most recently from 32320fa to 2257389 Compare October 28, 2022 11:26
@fpetkovski
Copy link
Contributor Author

fpetkovski commented Oct 29, 2022

Some benchmarks with batched retrieval. Seems like for retrieving 1 series things are slower, but after a certain threshold (~100 series) things start to get better.

name                                                          old time/op    new time/op    delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-8             54.5ms ± 2%    54.5ms ± 2%     ~     (p=1.000 n=5+5)
BucketSeries/1000000SeriesWith1Samples/10of1000000-8            54.3ms ± 2%    54.6ms ± 2%     ~     (p=0.548 n=5+5)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-8        510ms ± 1%     431ms ± 2%  -15.45%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/1of10000000-8           3.91ms ± 0%    3.86ms ± 1%   -1.28%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/100of10000000-8         3.90ms ± 1%    3.85ms ± 1%   -1.28%  (p=0.032 n=5+5)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-8    50.3ms ± 2%    41.6ms ± 1%  -17.29%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/1of10000000-8            134µs ± 1%     154µs ± 2%  +14.26%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/100of10000000-8          135µs ± 1%     151µs ± 1%  +11.27%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-8    12.2ms ± 1%    12.2ms ± 1%     ~     (p=0.421 n=5+5)

name                                                          old alloc/op   new alloc/op   delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-8             60.9MB ± 0%    61.0MB ± 0%   +0.23%  (p=0.008 n=5+5)
BucketSeries/1000000SeriesWith1Samples/10of1000000-8            60.9MB ± 0%    61.0MB ± 0%   +0.24%  (p=0.008 n=5+5)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-8       1.29GB ± 0%    0.89GB ± 0%  -30.90%  (p=0.016 n=4+5)
BucketSeries/100000SeriesWith100Samples/1of10000000-8           4.58MB ± 0%    4.67MB ± 0%   +1.95%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/100of10000000-8         4.58MB ± 0%    4.67MB ± 0%   +1.94%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-8     120MB ± 2%      84MB ± 1%  -30.20%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/1of10000000-8            200kB ± 0%     277kB ± 0%  +38.16%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/100of10000000-8          200kB ± 0%     276kB ± 0%  +38.01%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-8    46.8MB ± 0%    47.1MB ± 0%   +0.62%  (p=0.008 n=5+5)

name                                                          old allocs/op  new allocs/op  delta
BucketSeries/1000000SeriesWith1Samples/1of1000000-8              9.63k ± 0%    11.36k ± 0%  +17.99%  (p=0.016 n=4+5)
BucketSeries/1000000SeriesWith1Samples/10of1000000-8             9.71k ± 0%    11.49k ± 0%  +18.35%  (p=0.008 n=5+5)
BucketSeries/1000000SeriesWith1Samples/1000000of1000000-8        10.0M ± 0%     12.1M ± 0%  +20.20%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/1of10000000-8            1.08k ± 0%     1.25k ± 0%  +15.90%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/100of10000000-8          1.11k ± 0%     1.29k ± 0%  +16.08%  (p=0.008 n=5+5)
BucketSeries/100000SeriesWith100Samples/10000000of10000000-8     1.00M ± 0%     1.21M ± 0%  +20.15%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/1of10000000-8              282 ± 0%       286 ± 0%   +1.42%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/100of10000000-8            282 ± 0%       286 ± 0%   +1.42%  (p=0.008 n=5+5)
BucketSeries/1SeriesWith10000000Samples/10000000of10000000-8      168k ± 0%      168k ± 0%   +0.02%  (p=0.008 n=5+5)

@fpetkovski fpetkovski force-pushed the lazy-bucket-store branch 4 times, most recently from 12f2314 to 11c03fa Compare October 31, 2022 07:11
pkg/store/bucket.go Outdated Show resolved Hide resolved
@fpetkovski fpetkovski force-pushed the lazy-bucket-store branch 2 times, most recently from 9f4f13a to 184f695 Compare October 31, 2022 10:11
@fpetkovski
Copy link
Contributor Author

fpetkovski commented Oct 31, 2022

This should now be ready for review. I've added the latest benchmark results to the description of the PR.

I don't think the benchmark is realistic though, I feel it overestimates the performance of the current implementation, and underestimates the performance of lazy retrieval. The first reason for this is because the benchmark uses an in-memory dataset so chunk retrieval is instantaneous. The second one is because it buffers all received series, so their memory cannot be released when they are sent over a gRPC channel. I would expect performance in a realistic scenario to be better than what is predicted.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, some initial thoughts. Thanks!

.gitignore Outdated Show resolved Hide resolved
@@ -129,6 +130,9 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
Default("32").IntVar(&sc.blockMetaFetchConcurrency)

cmd.Flag("debug.series-batch-size", "The batch size when fetching series from object storage.").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mention what happens if batch size is too small or too large?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I am not sure this is usable - if you have 2000 blocks you have 10000 * 2000 batch size kind of, and when store has one block it has 10000... Should we have global to request batch size? (otherwise name and help should change)

Copy link
Contributor Author

@fpetkovski fpetkovski Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The batch size is per block, and each block is retrieved concurrently through a separate goroutine. So the number of blocks should not really matter. We can also try to infer the batch size based on the number of total expanded postings.

cmd/thanos/store.go Outdated Show resolved Hide resolved
@@ -276,6 +282,11 @@ func newBucketStoreMetrics(reg prometheus.Registerer) *bucketStoreMetrics {
Help: "Total number of empty postings when fetching block series.",
})

m.emptyStreamResponses = promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "thanos_bucket_store_empty_stream_responses_total",
Help: "Total number of empty responses received.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

received from were?

pkg/store/bucket.go Outdated Show resolved Hide resolved
pkg/store/bucket.go Outdated Show resolved Hide resolved
b.entries = b.entries[:0]
b.batch = b.batch[:0]

b.indexr.reset()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What reset is doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It clears the loaded series map inside the reader. I don't know if it is necessary, that part of the codebase is still cryptic to me.

pkg/store/bucket.go Outdated Show resolved Hide resolved
pkg/store/bucket.go Outdated Show resolved Hide resolved
yeya24
yeya24 previously approved these changes Nov 3, 2022
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Some small nits only

cmd/thanos/store.go Show resolved Hide resolved
pkg/store/bucket.go Outdated Show resolved Hide resolved
pkg/store/bucket.go Show resolved Hide resolved
@fpetkovski fpetkovski force-pushed the lazy-bucket-store branch 6 times, most recently from ed7d240 to f4550c7 Compare November 6, 2022 12:45
@GiedriusS
Copy link
Member

Have you tried this with "real" data? Is there any noticeable improvement? 🤔

@fpetkovski
Copy link
Contributor Author

fpetkovski commented Nov 9, 2022

@GiedriusS In a staging environment a 30d range query took ~20-30% less, so from 14s to 10s. I would like to double check memory usage this week in a prod-like environment to make sure there are no leaks or regressions.

Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
pkg/store/bucket_test.go Outdated Show resolved Hide resolved
Signed-off-by: Filip Petkovski <[email protected]>
yeya24
yeya24 previously approved these changes Nov 13, 2022
chunkFetchDuration prometheus.Histogram

// Internal state.
i int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use uint64 here to not accidentally overflow the counter?

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit. I tested this and it gave me a few percent improvements in query durations however this might be because remote object storage here has latency on the order of milliseconds and it is very fast i.e. for us the bottleneck is elsewhere.

Signed-off-by: Filip Petkovski <[email protected]>
@fpetkovski
Copy link
Contributor Author

I noticed that expanding postings and reading the index is also a non-trivial amount of work, so maybe that is also a significant factor.

@yeya24
Copy link
Contributor

yeya24 commented Nov 26, 2022

Let's fix the conflict and I think this pr is good to go?

Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: fpetkovski <[email protected]>
@fpetkovski
Copy link
Contributor Author

Conflicts should be fixed now.

@yeya24
Copy link
Contributor

yeya24 commented Nov 28, 2022

Thanks!

@yeya24 yeya24 enabled auto-merge (squash) November 28, 2022 07:00
@yeya24 yeya24 merged commit 39fa005 into thanos-io:main Nov 28, 2022
defer span.Finish()

shardMatcher := req.ShardInfo.Matcher(&s.buffers)
defer shardMatcher.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we close the matcher properly in this pr? I didn't see where we close it after the refactoring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response set should close it here: https://github.com/thanos-io/thanos/blob/main/pkg/store/proxy_heap.go#L590

Are you seeing any issues with this change?

ngraham20 pushed a commit to ngraham20/thanos that referenced this pull request May 18, 2023
* Implement lazy retrieval of series from object store.

The bucket store fetches series in a single blocking operation from
object storage. This is likely not an ideal strategy when it comes to
latency and resource usage. In addition, it causes the store to buffer
everything in memory before starting to send results to queriers.

This commit modifies the series retrieval to use the proxy response heap
and take advantage of the k-way merge used in the proxy store.

Signed-off-by: Filip Petkovski <[email protected]>

* Add batching

Signed-off-by: Filip Petkovski <[email protected]>

* Preload series in batches

Signed-off-by: Filip Petkovski <[email protected]>

* Emit proper stats

Signed-off-by: Filip Petkovski <[email protected]>

* Extract block series client

Signed-off-by: Filip Petkovski <[email protected]>

* Fix CI

Signed-off-by: Filip Petkovski <[email protected]>

* Address review comments

Signed-off-by: Filip Petkovski <[email protected]>

* Use emptyPostingsCount in lazyRespSet

Signed-off-by: Filip Petkovski <[email protected]>

* Reuse chunk metas

Signed-off-by: Filip Petkovski <[email protected]>

* Avoid overallocating for small responses

Signed-off-by: Filip Petkovski <[email protected]>

* Add metric for chunk fetch time

Signed-off-by: Filip Petkovski <[email protected]>

* Regroup imports

Signed-off-by: Filip Petkovski <[email protected]>

* Change counter to uint64

Signed-off-by: Filip Petkovski <[email protected]>

Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: fpetkovski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants