backupccl: teach `makeSimpleImportSpans` to create restore spans at a target input size #83139

msbutler · 2022-06-21T18:02:08Z

Currently, makeSimpleImportSpans creates a slice of in-order key spans ([]execinfrapb.RestoreSpanEntry), each of which will get a dedicated range for ingestion. Once filled by RESTORE, these ranges tend to be too small, causing the cluster to issue a series of expensive range merges after the RESTORE. Instead, makeSimpleImportSpans should create a span with a hardcoded target size, like say 512 mb, a default range's worth of data.

To minimize post RESTORE merges (and even splits), RESTORE would ideally ingest a single range's worth of data into each range it creates. Without refactoring restore's distsql processors, this implies makeSimpleImportSpans should create each RestoreSpanEntry such that the amount of data ingested in its corresponding range is bounded by 512 mb, the default range size. Unfortunately, makeSimpleImportSpans can only make a reasonable estimate on a span's input data size based on the size of backup files, which may contain multiple MVCC values for a given key, and not a span's ingested data size, which contains only one MVCC value for a given key. Hence, the target size of each requiredSpan is an estimate of its input data size (aka the amount work), which is strictly less than or equal to the span's ingested size. Note that RESTORE would much rather slightly under-fill a range (merge is only triggered if range is <512/2) than slightly overfill a range (likely to trigger a split); hence, creating an upper bound on a restoreSpanEntry based on input size seems reasonable.

To implement, consider how makeSimpleImportSpans currently partitions the restoring key span. makeSimpleImportSpans does one iteration through the inputed slice of spans that we want to the restore (requiredSpans) and greedily merges these spans into restoreSpanEntrys such that the input spans in a restoreSpanEntry are covered by the same set of backup files.

Instead of obeying the "common backup files" invariant, makeSimpleImportSpan should merge enough input spans together until the estimated input size of the restoreSpanEntry reaches the target input size. To track the input size on a given span, makeSimpleImportSpans should keep a running sum of the logical file size of all backup files associated with that span. Note that a backup file's size may get counted several times across requiredSpans, but this probably wouldn't bias the size estimates too significantly because backup files tend to be quite small. Further, over estimating the amount of data in a span entry would only lead to an under-filled range, which isn't that big of a deal, as mentioned above.

Once this is implemented, consider increasing the target size to be C*512mb, where C could be 2 - 4, and allow the SSTBatcher to split when a range fills (#83144).

Jira issue: CRDB-16886

Epic CRDB-10127

The text was updated successfully, but these errors were encountered:

blathers-crl · 2022-06-21T18:02:11Z

cc @cockroachdb/bulk-io

msbutler · 2022-09-20T14:01:56Z

Fixed by #86496, though by default, the target span size is not used.

Further, David's implementation is slightly different from the implementation outlined in this issue, see comment here.

msbutler added C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-disaster-recovery labels Jun 21, 2022

msbutler self-assigned this Jun 21, 2022

blathers-crl bot added the A-disaster-recovery label Jun 21, 2022

This was referenced Jun 21, 2022

backupccl: allow SSTBatcher to split ranges during RESTORE #83144

Closed

backupccl: split within chunks in sstBatcher instead of in split and scatter processor #81774

Closed

msbutler closed this as completed Sep 20, 2022

github-project-automation bot added this to Disaster Recovery Backlog Aug 28, 2024

github-project-automation bot moved this to Done in Disaster Recovery Backlog Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

backupccl: teach `makeSimpleImportSpans` to create restore spans at a target input size #83139

backupccl: teach `makeSimpleImportSpans` to create restore spans at a target input size #83139

msbutler commented Jun 21, 2022 •

edited by exalate-issue-sync bot

Loading

blathers-crl bot commented Jun 21, 2022

msbutler commented Sep 20, 2022

backupccl: teach makeSimpleImportSpans to create restore spans at a target input size #83139

backupccl: teach makeSimpleImportSpans to create restore spans at a target input size #83139

Comments

msbutler commented Jun 21, 2022 • edited by exalate-issue-sync bot Loading

blathers-crl bot commented Jun 21, 2022

msbutler commented Sep 20, 2022

backupccl: teach `makeSimpleImportSpans` to create restore spans at a target input size #83139

backupccl: teach `makeSimpleImportSpans` to create restore spans at a target input size #83139

msbutler commented Jun 21, 2022 •

edited by exalate-issue-sync bot

Loading