Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors get_snapshot_storages() #3760

Merged
merged 3 commits into from
Nov 25, 2024

Conversation

brooksprumo
Copy link

@brooksprumo brooksprumo commented Nov 23, 2024

Problem

AccountsDb::get_snapshot_storages() is due for a refactor. We can speed it up for the common case, and also simplify.

Since #3737, we now call get_snapshot_storages() every time we clean. The observation here is we'll only need about 100 storages (on average), yet the current impl of get_snapshot_storages() Arc::clone's all the storages, and then filters out the unneeded ones. We can change this to only Arc::clone the useful ones instead.

Additionally, the filter step is done in parallel. When we only have 100 storages, the parallel execution does not help. In fact, with a chunk size of 5000, we end up getting zero benefit, but do have to pay the cost of running in the thread pool.

Summary of Changes

  • When getting storages, only Arc::clone the ones we need
  • When filtering, do not use a thread pool

Results

I ran this on against mnb and saw good results.

Since get_snapshot_storages() is called in two places, I wanted to look at the perf results in both.

  1. clean

Here, we only need ~100 storages each time. So not Arc::cloning and not using the thread pool really helps. The PR ends up running consistently, and meaningfully, faster:

branch get storages filter total
master 30-50 ms 100-400 us 30-50 ms
pr v1 11-13 ms 10-30 us 11-13 ms
pr v2 n/a n/a 7-9 ms
pr v3 n/a n/a 6-8 ms
  1. taking full snapshots

Full snapshots do need all the storages, so we will end up Arc::cloning almost all the storages. And it's possible the thread pool does help here. For the most part, runtimes are pretty similar. The PR does have a worse worst-case filter time.

branch get storages filter total
master 30-50 ms 30-40 ms 60-90 ms
pr v1 30-50 ms 30-50 ms 60-100 ms
pr v2 n/a n/a 40-60 ms
pr v3 n/a n/a 40-60 ms

Overall, I think the common case of clean makes this change clearly a win. There is maybe a slight slow down for full snapshots, but that code is both infrequent, and in the background, so I don't think it matters much. Additionally, by not using a thread pool, we may reduce resource usage for the system as a whole.

@brooksprumo brooksprumo self-assigned this Nov 23, 2024
@brooksprumo brooksprumo marked this pull request as ready for review November 23, 2024 17:08
@jeffwashington
Copy link

this is fine. But, since it is now called very clean, another alternative is to get all roots less than the cutoff and then get storages based on roots. I imagine that's faster by a lot. Or, we just leave this like it is since by the time master is rolled out we are hopefully ready to skip rewrites finally and enable ancient packing. Not sure what to do. If you're not backporting this, I'd just leave it alone I imagine.

@brooksprumo
Copy link
Author

brooksprumo commented Nov 24, 2024

since it is now called very clean, another alternative is to get all roots less than the cutoff and then get storages based on roots. I imagine that's faster by a lot.

Yes, this does make an improvement for both clean and snapshots. Thanks for the suggestion!

Or, we just leave this like it is since by the time master is rolled out we are hopefully ready to skip rewrites finally and enable ancient packing. Not sure what to do. If you're not backporting this, I'd just leave it alone I imagine.

My initial idea was to backport to v2.1 but not v2.0. Since we're planning to activate skipping rewrites in v2.1, I want to reduce the negative impact of getting the snapshot storages in every iteration of clean. Not backporting to v2.1 will still help with snapshots, but not with clean.

@brooksprumo brooksprumo requested a review from HaoranYi November 24, 2024 03:35
@HaoranYi
Copy link

HaoranYi commented Nov 24, 2024

looks like ci is failing ...

Not sure if it is because of the pr or ci flakey

I will approve when ci passed.

@brooksprumo
Copy link
Author

looks like ci is failing ...

Not sure if it is because of the pr or ci flakey

I will approve when ci passed.

Here's some discussion from Discord on the CI issue: https://discord.com/channels/428295358100013066/560503042458517505/1310571536221995068

We'll need PR #3774 to land, and then I can rebase this PR to get the CI fix.

@brooksprumo brooksprumo force-pushed the get-snapshot-storages branch from 063e7a5 to 9a5b097 Compare November 25, 2024 18:39
@brooksprumo
Copy link
Author

PR #3774 was merged. I've rebased and force-pushed to pull in this fix. No code was changed.

Copy link

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@brooksprumo
Copy link
Author

Merging! @HaoranYi, I'm interpreting #3760 (comment) as an approval. Lmk if that's wrong!

@brooksprumo brooksprumo merged commit 8c7ae80 into anza-xyz:master Nov 25, 2024
41 checks passed
@brooksprumo brooksprumo deleted the get-snapshot-storages branch November 25, 2024 21:09
@brooksprumo
Copy link
Author

@jeffwashington @HaoranYi I'd like to backport to v2.1, since 2.1 and 2.0 are the main ones that'll be impacted by cleaning old storages during clean. Any objections?

@brooksprumo brooksprumo added the v2.1 Backport to v2.1 branch label Nov 25, 2024
Copy link

mergify bot commented Nov 25, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Nov 25, 2024
@HaoranYi
Copy link

@jeffwashington @HaoranYi I'd like to backport to v2.1, since 2.1 and 2.0 are the main ones that'll be impacted by cleaning old storages during clean. Any objections?

Fine with me. Do you happen to have the updated benchmark time after the new changes?

@brooksprumo
Copy link
Author

brooksprumo commented Nov 25, 2024

Do you happen to have the updated benchmark time after the new changes?

The v3 numbers correspond to the code that was merged. Are those the numbers you're looking for?

@HaoranYi
Copy link

yeah. lgtm.

brooksprumo added a commit that referenced this pull request Nov 26, 2024
brooksprumo added a commit that referenced this pull request Nov 27, 2024
brooksprumo added a commit that referenced this pull request Dec 2, 2024
brooksprumo added a commit that referenced this pull request Dec 6, 2024
Refactors get_snapshot_storages() (#3760)

(cherry picked from commit 8c7ae80)

Co-authored-by: Brooks <[email protected]>
KirillLykov pushed a commit that referenced this pull request Dec 9, 2024
Refactors get_snapshot_storages() (#3760)

(cherry picked from commit 8c7ae80)

Co-authored-by: Brooks <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2.1 Backport to v2.1 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants