[Shelved] Add flushed_roots to dirty_store #4304

Draft · wants to merge 2 commits into base: master
101 changes: 71 additions & 30 deletions accounts-db/src/accounts_db.rs
@@ -6171,29 +6171,37 @@ impl AccountsDb {
// Note even if force_flush is false, we will still flush all roots <= the
// given `requested_flush_root`, even if some of the later roots cannot be used for
// cleaning due to an ongoing scan
let (total_new_cleaned_roots, num_cleaned_roots_flushed, mut flush_stats) = self
.flush_rooted_accounts_cache(
requested_flush_root,
Some((&mut account_bytes_saved, &mut num_accounts_saved)),
);
let (
total_new_cleaned_roots,
num_cleaned_roots_flushed,
num_roots_added_to_dirty_stores,
mut flush_stats,
) = self.flush_rooted_accounts_cache(
requested_flush_root,
Some((&mut account_bytes_saved, &mut num_accounts_saved)),
);
flush_roots_elapsed.stop();

// Note we don't purge unrooted slots here because there may be ongoing scans/references
// for those slots; let the Bank::drop() implementation do cleanup instead on dead
// banks

// If 'should_aggressively_flush_cache', then flush the excess ones to storage
let (total_new_excess_roots, num_excess_roots_flushed, flush_stats_aggressively) =
if self.should_aggressively_flush_cache() {
// Start by flushing the roots
//
// Cannot do any cleaning on roots past `requested_flush_root` because future
// snapshots may need updates from those later slots, hence we pass `None`
// for `should_clean`.
self.flush_rooted_accounts_cache(None, None)
} else {
(0, 0, FlushStats::default())
};
let (
total_new_excess_roots,
num_excess_roots_flushed,
num_excess_roots_added_to_dirty_store,
flush_stats_aggressively,
) = if self.should_aggressively_flush_cache() {
// Start by flushing the roots
//
// Cannot do any cleaning on roots past `requested_flush_root` because future
// snapshots may need updates from those later slots, hence we pass `None`
// for `should_clean`.
self.flush_rooted_accounts_cache(None, None)
} else {
(0, 0, 0, FlushStats::default())
};
flush_stats.accumulate(&flush_stats_aggressively);

let mut excess_slot_count = 0;
@@ -6233,6 +6241,16 @@ impl AccountsDb {
("num_cleaned_roots_flushed", num_cleaned_roots_flushed, i64),
("total_new_excess_roots", total_new_excess_roots, i64),
("num_excess_roots_flushed", num_excess_roots_flushed, i64),
(
"num_roots_added_to_dirty_stores",
num_roots_added_to_dirty_stores,
i64
),
(
"num_excess_roots_added_to_dirty_stores",
num_excess_roots_added_to_dirty_store,
i64
),
("excess_slot_count", excess_slot_count, i64),
(
"unflushable_unrooted_slot_count",
@@ -6269,7 +6287,7 @@
&self,
requested_flush_root: Option<Slot>,
should_clean: Option<(&mut usize, &mut usize)>,
) -> (usize, usize, FlushStats) {
) -> (usize, usize, usize, FlushStats) {
let max_clean_root = should_clean.as_ref().and_then(|_| {
// If there is a long running scan going on, this could prevent any cleaning
// based on updates from slots > `max_clean_root`.
@@ -6295,34 +6313,56 @@
});

// Always flush up to `requested_flush_root`, which is necessary for things like snapshotting.
let cached_roots: BTreeSet<Slot> = self.accounts_cache.clear_roots(requested_flush_root);
let flushed_roots: BTreeSet<Slot> = self.accounts_cache.clear_roots(requested_flush_root);
Author:

rename: cached_roots -> flushed_roots for clarity.

// Iterate from highest to lowest so that we don't need to flush earlier
// outdated updates in earlier roots
let mut num_roots_flushed = 0;
let mut num_roots_added_to_dirty_stores = 0;
let mut flush_stats = FlushStats::default();
for &root in cached_roots.iter().rev() {
for &root in flushed_roots.iter().rev() {
if let Some(stats) =
self.flush_slot_cache_with_clean(root, should_flush_f.as_mut(), max_clean_root)
{
num_roots_flushed += 1;
flush_stats.accumulate(&stats);
}

// Regardless of whether this slot was *just* flushed from the cache by the above
// `flush_slot_cache()`, we should update the `max_flush_root`.
// This is because some rooted slots may be flushed to storage *before* they are marked as root.
// This can occur for instance when
// the cache is overwhelmed, we flushed some yet to be rooted frozen slots
// These slots may then *later* be marked as root, so we still need to handle updating the
// `max_flush_root` in the accounts cache.
if let Some(storage) = self.storage.get_slot_storage_entry(root) {
Author:
self.uncleaned_pubkeys.insert(slot, dirty_keys);

These slots have already been added to uncleaned_pubkeys when calculating the accounts_delta_hash.

When clean runs, it will include all of these dirty pubkeys in the candidates.

self.uncleaned_pubkeys.remove(&uncleaned_slot)

Therefore, I think we don't need to add them to dirty_stores at flush. It seems redundant, and it is a waste of time to scan those stores again.
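The dedup described above can be sketched standalone. This is a minimal sketch, not the real AccountsDb API: `CleanTracking` and `StorageId` are hypothetical simplifications (the real fields are concurrent maps), and `storage_for` stands in for `get_slot_storage_entry`.

```rust
use std::collections::{BTreeSet, HashMap};

type Slot = u64;
type Pubkey = [u8; 32];
// Hypothetical stand-in for the real storage entry type.
type StorageId = u64;

/// Minimal stand-ins for the AccountsDb fields (which are concurrent
/// maps in the real code).
struct CleanTracking {
    uncleaned_pubkeys: HashMap<Slot, Vec<Pubkey>>,
    dirty_stores: HashMap<Slot, StorageId>,
}

impl CleanTracking {
    /// After flushing `flushed_roots`, only record a slot's storage as
    /// dirty when its keys were not already queued via
    /// `uncleaned_pubkeys` (e.g. at accounts_delta_hash time). Returns
    /// how many slots were added, mirroring the PR's new
    /// `num_roots_added_to_dirty_stores` stat.
    fn mark_dirty_after_flush(
        &mut self,
        flushed_roots: &BTreeSet<Slot>,
        storage_for: impl Fn(Slot) -> Option<StorageId>,
    ) -> usize {
        let mut added = 0;
        for &root in flushed_roots.iter().rev() {
            if let Some(storage) = storage_for(root) {
                if !self.uncleaned_pubkeys.contains_key(&root) {
                    added += 1;
                    self.dirty_stores.entry(root).or_insert(storage);
                }
            }
        }
        added
    }
}

fn main() {
    let mut t = CleanTracking {
        // Slot 5 was already tracked at delta-hash time; slot 6 was not.
        uncleaned_pubkeys: HashMap::from([(5, vec![[0u8; 32]])]),
        dirty_stores: HashMap::new(),
    };
    let roots: BTreeSet<Slot> = [5, 6].into_iter().collect();
    let added = t.mark_dirty_after_flush(&roots, |s| Some(s));
    // Only slot 6 needed a dirty_stores entry.
    assert_eq!(added, 1);
    assert!(t.dirty_stores.contains_key(&6) && !t.dirty_stores.contains_key(&5));
}
```

If the experiment holds and the stat stays at zero, this guard (and the stat) can be dropped entirely, which is what the author ultimately concludes below.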

Author:
I updated the code to only add to dirty_stores when the slot is not found in uncleaned_pubkeys, and added a stat to count how many times we add them.
I think it should be zero. I have started a node to verify this. We will know soon.

Reviewer:

Gotcha, mystery solved, thank you.

With the lt hash, we'll eventually remove the accounts delta hash calculation. At that point we'll need to fix this.

Would it make sense to not add to uncleaned_pubkeys when calculating the accounts delta hash, and instead only add the storage to dirty_stores during flush?

Author:
Yes, I think we can do that.

Author:
[image]

Author:
The experimental run on mainnet shows that the stats are indeed zero. This means that we don't need to add them to dirty_stores at flush time.

I will shelve this PR and start two other PRs: one for the obsolete-comment change and one for the fetch_max optimization change.

// In this loop, "flush_slot_cache_with_clean" will handle
// cleaning the accounts only for the "flushed_roots". But
// these "flushed_roots" may trigger additional cleaning for
// older roots. Therefore, we may need to add them to
// "dirty_stores" for clean to pick up such additional
// cleaning. However, the "dirty_keys" in those stores should
// have already been added to 'uncleaned_pubkeys' when we
// calculate the accounts hash. Therefore, we don't need to
// add them to dirty_stores here again if they exist in
// 'uncleaned_pubkeys'.
if !self.uncleaned_pubkeys.contains_key(&root) {
num_roots_added_to_dirty_stores += 1;
self.dirty_stores.entry(root).or_insert(storage);
}
}
}

// Regardless of whether this slot was *just* flushed from the cache by
Author:

An optimization to move the "max_flush_root" update out of the loop.
We only need to fetch_max with the max value in the "flushed_roots".

Reviewer:

Intuitively this looks right, but I wonder if there was a reason to not originally do it this way... Maybe worth looking at the git history to see?

Author (@HaoranYi, Jan 6, 2025):

This code was introduced 4 years ago and has never changed since then. @carllin

solana-labs@8f7c7fb#diff-1090394420d51617f3233275c2b65ed706b35b53b115fe65f82c682af8134a6fR3017

@carllin Is there any reason that we can't move set_max_flush_root out of the loop?
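The equivalence this optimization relies on can be checked in isolation: `fetch_max` only ever keeps the largest value it has seen, so calling it once with the maximum of a sorted set leaves the atomic in the same state as calling it for every element. A minimal sketch with `AtomicU64` standing in for the accounts cache's max-flush-root tracking (not the actual AccountsCache code):

```rust
use std::collections::BTreeSet;
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    let roots: BTreeSet<u64> = [3, 7, 5].into_iter().collect();

    // Per-element updates, as the original loop did.
    let per_element = AtomicU64::new(0);
    for &r in roots.iter().rev() {
        per_element.fetch_max(r, Ordering::Relaxed);
    }

    // Single update with the set's maximum, as the PR proposes.
    // BTreeSet iterates in sorted order, so `last()` is the maximum.
    let once = AtomicU64::new(0);
    if let Some(&max) = roots.last() {
        once.fetch_max(max, Ordering::Relaxed);
    }

    assert_eq!(
        per_element.load(Ordering::Relaxed),
        once.load(Ordering::Relaxed)
    );
}
```

This holds for the single-threaded flush loop shown in the diff; concurrent writers to the same atomic would be unaffected either way, since `fetch_max` is monotonic.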

// the above loop. we should update the `max_flush_root`. This is
// because some rooted slots may be flushed to storage *before* they are
// marked as root. This can occur for instance when the cache is
// overwhelmed, we flushed some yet to be rooted frozen slots. These
// slots may then *later* be marked as root, so we still need to handle
// updating the `max_flush_root` in the accounts cache.
if let Some(&root) = flushed_roots.last() {
self.accounts_cache.set_max_flush_root(root);
}

// Only add to the uncleaned roots set *after* we've flushed the previous roots,
// so that clean will actually be able to clean the slots.
let num_new_roots = cached_roots.len();
(num_new_roots, num_roots_flushed, flush_stats)
let num_new_roots = flushed_roots.len();
(
num_new_roots,
num_roots_flushed,
num_roots_added_to_dirty_stores,
flush_stats,
)
}

fn do_flush_slot_cache(
@@ -8384,6 +8424,7 @@
.storage
.get_slot_storage_entry_shrinking_in_progress_ok(slot)
{
info!("haoran add_dirty_store when rooting: {}", slot);
self.dirty_stores.insert(slot, store);
}
store_time.stop();