store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup. #2710

Merged
merged 4 commits into from
Jun 3, 2020

@bwplotka bwplotka commented Jun 3, 2020

This properly implements #2546.

This is a rebased version of #2603, which was mistakenly merged into a chained PR that had already been merged. It also includes a small fix to avoid .String().

This also has to be cherry-picked to 0.13.

bwplotka added 3 commits June 3, 2020 17:41
…ng for StoreAPI.

Also: merge identical series together at the proxy level instead of in select. This allows better dedup efficiency.

Partially fixes: #2303

Cases such as overlapping data from store and sidecar, and 1:1 duplicates, are now optimized as early as possible.
This case was highly visible in the GitLab repro data and exists in most Thanos setups.

Signed-off-by: Bartlomiej Plotka <[email protected]>
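
To illustrate the idea in the commit message above, here is a minimal sketch of merging the chunks of one series as returned by two stores (for example a sidecar and a store gateway) while emitting 1:1 duplicates only once. The `chunk` type, `compareChunks`, and `mergeSorted` are hypothetical names for illustration, not the Thanos implementation, and the sketch assumes each input is already sorted and has no internal duplicates.

```go
package main

import (
	"bytes"
	"fmt"
)

// chunk is a hypothetical, simplified stand-in for a StoreAPI chunk.
type chunk struct {
	MinTime, MaxTime int64
	Data             []byte
}

// compareChunks orders chunks by MinTime, then MaxTime, then raw bytes.
// It returns 0 only when both chunks are identical.
func compareChunks(a, b chunk) int {
	if a.MinTime != b.MinTime {
		if a.MinTime < b.MinTime {
			return -1
		}
		return 1
	}
	if a.MaxTime != b.MaxTime {
		if a.MaxTime < b.MaxTime {
			return -1
		}
		return 1
	}
	return bytes.Compare(a.Data, b.Data)
}

// mergeSorted merges two chunk slices that are each sorted by compareChunks.
// A chunk present in both inputs is appended to the result only once.
func mergeSorted(a, b []chunk) []chunk {
	out := make([]chunk, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch c := compareChunks(a[i], b[j]); {
		case c < 0:
			out = append(out, a[i])
			i++
		case c > 0:
			out = append(out, b[j])
			j++
		default: // exact duplicate: keep one copy, advance both inputs
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	sidecar := []chunk{
		{MinTime: 0, MaxTime: 10, Data: []byte("a")},
		{MinTime: 10, MaxTime: 20, Data: []byte("b")},
	}
	store := []chunk{
		{MinTime: 10, MaxTime: 20, Data: []byte("b")}, // overlaps with sidecar
		{MinTime: 20, MaxTime: 30, Data: []byte("c")},
	}
	fmt.Println(len(mergeSorted(sidecar, store))) // prints 3
}
```

Because both inputs are sorted, the merge is a single linear pass, which is presumably why the PR title recommends chunk sorting for StoreAPI implementations.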
@bwplotka bwplotka requested review from pracucci, brancz and yeya24 June 3, 2020 16:50
bwplotka commented Jun 3, 2020

See benchmarks here: #2603 (comment)

We also desperately need Query benchmarks (there are some in an old, forgotten PR...)

@SuperQ SuperQ left a comment


😹

@@ -81,7 +79,7 @@ func removeExactDuplicates(chks []storepb.AggrChunk) []storepb.AggrChunk {
 	ret = append(ret, chks[0])

 	for _, c := range chks[1:] {
-		if ret[len(ret)-1].String() == c.String() {
+		if ret[len(ret)-1].Compare(c) == 0 {

@brancz solving regression here.
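
As a rough illustration of the change in the diff above, the sketch below compares chunks field by field instead of formatting both into strings. `aggrChunk` and its `Compare` method are hypothetical, trimmed-down stand-ins. The real `storepb.AggrChunk` has more fields, and the ordering used here is an assumption, not necessarily what Thanos does.

```go
package main

import (
	"bytes"
	"fmt"
)

// aggrChunk is a hypothetical, trimmed-down stand-in for storepb.AggrChunk.
type aggrChunk struct {
	MinTime, MaxTime int64
	Raw              []byte
}

// Compare returns 0 only if both chunks have the same time range and bytes.
// The ordering (MinTime, then MaxTime, then bytes) is assumed for illustration.
func (a aggrChunk) Compare(b aggrChunk) int {
	if a.MinTime != b.MinTime {
		if a.MinTime < b.MinTime {
			return -1
		}
		return 1
	}
	if a.MaxTime != b.MaxTime {
		if a.MaxTime < b.MaxTime {
			return -1
		}
		return 1
	}
	return bytes.Compare(a.Raw, b.Raw)
}

// removeExactDuplicates drops adjacent identical chunks.
// It assumes the input is sorted so that duplicates sit next to each other.
func removeExactDuplicates(chks []aggrChunk) []aggrChunk {
	if len(chks) <= 1 {
		return chks
	}
	ret := make([]aggrChunk, 0, len(chks))
	ret = append(ret, chks[0])
	for _, c := range chks[1:] {
		// Field-wise comparison: no string formatting on the hot path.
		if ret[len(ret)-1].Compare(c) == 0 {
			continue
		}
		ret = append(ret, c)
	}
	return ret
}

func main() {
	chks := []aggrChunk{
		{MinTime: 0, MaxTime: 10, Raw: []byte("a")},
		{MinTime: 0, MaxTime: 10, Raw: []byte("a")}, // exact duplicate
		{MinTime: 10, MaxTime: 20, Raw: []byte("b")},
	}
	fmt.Println(len(removeExactDuplicates(chks))) // prints 2
}
```

Comparing integers and byte slices directly avoids building a formatted string for every pair of chunks, which is the cost the review comment warns about.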

Never use proto .String() in fast path!

Signed-off-by: Bartlomiej Plotka <[email protected]>
@bwplotka bwplotka force-pushed the dedup-same-chunks branch from 37af4c2 to 00c71a4 Compare June 3, 2020 18:11
@bwplotka bwplotka merged commit 2000451 into master Jun 3, 2020
@bwplotka bwplotka deleted the dedup-same-chunks branch June 3, 2020 18:28
bwplotka added a commit that referenced this pull request Jun 3, 2020
…rting for StoreAPI + Optimized iter chunk dedup. (#2710)

* Deduplicate chunk dups on proxy StoreAPI level. Recommend chunk sorting for StoreAPI.

Also: merge identical series together at the proxy level instead of in select. This allows better dedup efficiency.

Partially fixes: #2303

Cases such as overlapping data from store and sidecar, and 1:1 duplicates, are now optimized as early as possible.
This case was highly visible in the GitLab repro data and exists in most Thanos setups.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized algorithm to combine series only on start.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized chunk comparison for overlaps.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized deduplication for deduplicated chunk on query level as well.

Never use proto .String() in fast path!

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	CHANGELOG.md
#	pkg/store/storepb/custom.go
#	pkg/store/storepb/custom_test.go
brancz pushed a commit that referenced this pull request Jun 4, 2020
…rting for StoreAPI + Optimized iter chunk dedup. (#2710) (#2711)

paulfantom added a commit to paulfantom/thanos that referenced this pull request Jul 8, 2020
openshift/master

* upstream/release-0.13:
  Cut release v0.13.0
  shipper: Be strict about upload order unless it's specified so & cut v0.13.0-rc.2 (thanos-io#2765)
  Cut 0.13.0 release. (thanos-io#2762)
  Cut release 0.13.0-rc.1 (thanos-io#2720)
  Store: `irate` and `resets` use now counter downsampling aggregations. (thanos-io#2719)
  deps: Updated minio-go dependency to v6.0.56 to add two region endpoints (thanos-io#2705) (thanos-io#2718)
  store/proxy: Deduplicate chunks on StoreAPI level. Recommend chunk sorting for StoreAPI + Optimized iter chunk dedup. (thanos-io#2710) (thanos-io#2711)
  Allow using multiple memcached clients at the same time. (thanos-io#2648) (thanos-io#2698)
  Updated Prometheus as little as possible to include Isolation fix. (thanos-io#2697)
  Release fix attempt2.
  Fixed test job. (thanos-io#2650)
  Fixed promu build to build in compatible directory that crossbuild understands.
  Cut v0.13.0-rc.0 (thanos-io#2628)