Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.1: backupccl: reduce over-splitting during RESTORE #91141

Merged
merged 3 commits into from
Nov 21, 2022

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Nov 2, 2022

Backport 2/2 commits from #86496.

/cc @cockroachdb/release


When constructing the spans to restore, we can produce fewer, wider spans
that each contain more files to avoid splitting as much.

Previously we split at the bounds of every file in the base backup, then
added incremental layer files to any span they overlapped. This changes
that slightly to keep appending files that would create a new span at the
right-hand-side of the span covering to instead extend current rightmost
entry in the covering if the total size of what has been added to it is
still below our target size per span (384mb).

Release justification: bug fix.

Release note (bug fix): reduce the amount that RESTORE over-splits ranges. The
behavior is enabled by default.

@adityamaru adityamaru requested review from dt and msbutler November 2, 2022 17:17
@blathers-crl
Copy link

blathers-crl bot commented Nov 2, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru
Copy link
Contributor Author

adityamaru commented Nov 2, 2022

Letting CI shake out any issues, because this was not a clean backport. ahh also i'll need to cherry pick the commit that makes it a setting.

@shermanCRL shermanCRL requested a review from baoalvin1 November 7, 2022 18:54
@adityamaru adityamaru marked this pull request as ready for review November 9, 2022 17:35
@adityamaru adityamaru requested review from a team as code owners November 9, 2022 17:35
@adityamaru
Copy link
Contributor Author

I'm now going to run some roachtest/restores with this diff to ensure it works as expected.

Copy link
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

Either in this PR or elsewhere, we should add a roachtest that restores from TPCCInc from a random incremental layer ,with target span size set to 384. This will provide extra coverage that this setting performs well on a variety of restore span configurations.

@msbutler
Copy link
Collaborator

^i'm also considering adding a backportable roachtest to master that starts a workload, creates backup chain, takes a fingerprint at some random system time, and restores at the fingerprint time and verifies things look the same.

@adityamaru
Copy link
Contributor Author

adityamaru commented Nov 14, 2022

TestComposeGSS failures are known/skipped, rebasing.

Previously sstBatcher had a boolean to control whether it would split
and then scatter a range it was filling when it could determine that it
was about to overfill it i.e. if had previously reported its remaining
capacity to be less than what the batcher was about to add. RESTORE set
this boolean to disable this splitting and scattering as it pre-split and
pre-scattered all of its ranges and thus did not expect the batcher to
need to do any as-it-fills splitting and scattering.

However blanket disabling this makes little sense: if the batcher knows
that it is about to overfill a range, regardless of whether or not that
range was supposed to be pre-split to avoid this, it always makes sense
to split it before adding to it, rather than overfill it.

Thus this change makes as-we-fill splits unconditional, and moves the
boolean to be more narrowly scoped to just disabling the scatters, as
RESTORE still pre-scatters its pre-chunked restore spans. This paves the
way for RESTORE to make and pre-split larger chunks, without worrying if
doing so will cause overfilling of ranges.

Release note: none.

Release justification: bug fix.
@adityamaru
Copy link
Contributor Author

I broke something with the rebase, taking a look.

dt added 2 commits November 18, 2022 13:00
When constructing the spans to restore, we can produce fewer, wider spans
that each contain more files to avoid splitting as much.

Previously we split at the bounds of every file in the base backup, then
added incremental layer files to any span they overlapped. This changes
that slightly to keep appending files that would create a new span at the
right-hand-side of the span covering to instead extend current rightmost
entry in the covering if the total size of what has been added to it is
still below our target size per span (384mb).

Release justification: bug fix.

Release note (bug fix): reduce the amount that RESTORE over-splits.
@adityamaru adityamaru merged commit 71ffe43 into cockroachdb:release-22.1 Nov 21, 2022
@adityamaru adityamaru deleted the backport22.1-86496 branch November 21, 2022 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants