
backupccl: split within chunks in sstBatcher instead of in split and scatter processor #81774

Closed
msbutler opened this issue May 24, 2022 · 2 comments
Labels
A-disaster-recovery, C-performance (Perf of queries or internals. Solution not expected to change functional behavior.), sync-me, sync-me-5, T-disaster-recovery



msbutler commented May 24, 2022

Currently, Restore divvies up work across the cluster in the following manner:

  • The coordinator node divides the restore spans to ingest into “chunks”. It doesn’t split/scatter anything.
  • Each split and scatter processor then:
    • Splits each chunk, and scatters it to find a chunkDestination
    • Iterates over the restore span entries and splits each entry (we do not scatter these)
    • Routes each entry to its chunkDestination for ingestion
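To make the two sets of splits concrete, here is a toy Go model of the flow above. It is a sketch, not the restore code: `span`, `chunk`, `plan`, and the round-robin stand-in for scatter are all hypothetical. With `splitEntries` set it mimics the current processor (a split at every entry); without it, only chunk boundaries are split, which is what this issue proposes.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a restore span entry's key range [Key, EndKey).
type span struct{ Key, EndKey string }

// chunk is a hypothetical group of consecutive restore span entries from the coordinator.
type chunk struct{ Entries []span }

// plan models the split and scatter processor. For every chunk it splits at the chunk's
// first key and "scatters" it to pick a destination node. When splitEntries is true
// (current behavior) it also splits, without scattering, at every entry's start key.
// Either way, each entry is routed to its chunk's destination.
func plan(chunks []chunk, nodes []int, splitEntries bool) (splits []string, routes map[string]int) {
	routes = make(map[string]int)
	seen := make(map[string]bool)
	split := func(key string) {
		if !seen[key] { // splitting at an existing boundary adds nothing
			seen[key] = true
			splits = append(splits, key)
		}
	}
	for i, c := range chunks {
		if len(c.Entries) == 0 {
			continue
		}
		split(c.Entries[0].Key)
		// Hypothetical: scatter modeled as round-robin over nodes.
		dest := nodes[i%len(nodes)]
		for _, e := range c.Entries {
			if splitEntries {
				split(e.Key) // the second set of splits, never scattered
			}
			routes[e.Key] = dest
		}
	}
	return splits, routes
}

func main() {
	chunks := []chunk{
		{Entries: []span{{"a", "b"}, {"b", "c"}, {"c", "d"}}},
		{Entries: []span{{"d", "e"}, {"e", "f"}}},
	}
	nodes := []int{1, 2}
	cur, routes := plan(chunks, nodes, true)
	fmt.Println(len(cur), routes["e"]) // 5 2
	proposed, _ := plan(chunks, nodes, false)
	fmt.Println(len(proposed)) // 2
}
```

In this toy input the current flow creates one range per entry (5), while splitting only at chunk boundaries creates 2; the routing of entries to destinations is unchanged.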

The second set of splits seems to cause many range merges after the Restore completes, which suggests that we oversplit the restoring spans. To prevent these merges, we should remove the second set of splits from the split and scatter processor and let the SSTBatcher split each chunk as it writes SSTs, as it currently does for IMPORT.

Jira issue: CRDB-16444

@msbutler msbutler added the C-performance and T-disaster-recovery labels May 24, 2022

blathers-crl bot commented May 24, 2022

cc @cockroachdb/bulk-io

@msbutler
Collaborator Author

Closing until #83139 and #83144 are merged and evaluated.
