Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Consensus] process sync committed sub dags #20956

Merged
merged 3 commits into from
Feb 21, 2025

Conversation

akichidis
Copy link
Contributor

Description

The PR is changing the way we are processing the synchronized committed sub dags when GC is enabled. This was a necessity given that it's possible during GC for committed sub dags to drop dependencies that are necessary to accept vote and decision round blocks, effectively leading us to having to fetch the missing blocks via the synchronizer. By empirical examination there is a correlation between the missing blocks and the gc depth - the lower the gc depth the higher the number of missing blocks that need to be fetched.

With this new approach we attempt to commit the synchronized committed sub dag without waiting for the commit rule to run (hence wait to have accepted blocks to form the voting & decision rounds). Attention needs to be given that to avoid edge cases we need to make sure that the synchronized sub dags needs to be processed first on their own , and then run on a next iteration the commit rule for any other leaders. That guarantees that any accepted blocks due to commit sync will not run into "missing ancestor" issues.

Test plan

CI/PT


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • gRPC:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:

Sorry, something went wrong.

Copy link

vercel bot commented Jan 23, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 19, 2025 6:26pm
2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Feb 19, 2025 6:26pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Feb 19, 2025 6:26pm

Comment on lines +744 to +779
if decided_leaders.is_empty() {
// TODO: limit commits by commits_until_update, which may be needed when leader schedule length is reduced.
decided_leaders = self.committer.try_decide(self.last_decided_leader);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to note here; we could avoid even linearizing the DAG and just produce the necessary outputs, persist etc. However right now:

  • uitlising the same path saves us from having to specially handle the storing of committed sub dags, move GC rounds etc
  • provides some extra layer of confirmation as we can - and we do in the end - make sure that the same sub dag is produced with the current state

Copy link
Contributor

@mwtian mwtian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The high level strategy lgtm. This should be simpler than skipping & rewriting commit_observer.handle_commit().

@@ -724,6 +726,7 @@ mod tests {
}
}

#[async_trait]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

@@ -84,17 +84,15 @@ impl BlockManager {
/// Tries to accept the provided blocks assuming that all their causal history exists. The method
/// returns all the blocks that have been successfully processed in round ascending order, that includes also previously
/// suspended blocks that have now been able to get accepted. Method also returns a set with the missing ancestor blocks.
/// When the `commit_sync_gc_round_override` is > 0 then the method will skip any missing ancestors that are <= `commit_sync_gc_round_override` round. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

pub(crate) fn try_accept_blocks(
&mut self,
mut blocks: Vec<VerifiedBlock>,
) -> (Vec<VerifiedBlock>, BTreeSet<BlockRef>) {
let _s = monitored_scope("BlockManager::try_accept_blocks");

blocks.sort_by_key(|b| b.round());
debug!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this moved somewhere?

match self
.inner
.core_thread_dispatcher
.add_commits(commits, blocks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here is minimal which is nice.

.first()
.expect("Synced commits should not be empty");
if synced_commit.index() <= last_commit_index {
info!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info seems too much logging here

Copy link
Contributor

@arun-koshy arun-koshy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction of the PR LGTM!

// The function returns the list of decided leaders and updates in place the remaining synced commits. If empty vector is returned, it means that
// there are no synced commits to be committed as `synced_commits` is either empty or all of the synced commits are already committed.
#[tracing::instrument(skip_all)]
pub(crate) fn try_decide_synced(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool! It feels like this is in general how we should handle the trusted commits regardless of GC. Did you notice a performance boost for catchup doing this instead of trying to run the commit rule for the synced commits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't benchmarked this but I'll do. So far didn't see any difference on the opposite direction at least.

) -> ConsensusResult<BTreeSet<BlockRef>> {
let _scope = monitored_scope("Core::add_commit");

// We want to enable the commit process logic when GC is enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just do this even if GC is not enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well to be fair we could enable this without GC. Not sure if I wouldn't still feature flag it though. Let me think a bit about it.

// is reduced.
let decided_leaders = self.committer.try_decide(self.last_decided_leader);
// Always try to process the synced commits first
let mut decided_leaders = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing to consider here is that you moved some of the logic of leader schedule into universal committer where try_decide does not work like that and we kept the commit logic as a black box that just spits out decided leaders. What do you think about continuing to keep that separation of logic?

Copy link
Contributor Author

@akichidis akichidis Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my initial iteration was without moving part of that logic in universal committer and keeping it in Core. We can certainly do that. It's just that the truncation of the leaders is a bit more complex and decided to do everything in the try_decide_synced method. I am open though to suggestions here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion is to remove try_decide_certified from universal committer and keep it as a separate method in core i.e. get_certified_commit_leaders. We are actually bypassing the committer with your new logic so it makes sense to not use the component at all.

.try_decide_synced(&mut synced_commits, commits_until_update);

// If the synced `decided_leaders` is empty then try to run the decision rule.
if decided_leaders.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you update self.last_decided_leader from the synced decided leaders and then run the commit rule to fill whatever remaining space is left for the schedule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to run the commit rule until the synced leaders have been committed and GC rounds have been moved. That's important to make sure that when we try to run the commit rule we have the most updated GC and there is no possibility to try retrieve causal history (ex when we figure out the certificate support) that doesn't exist.

@akichidis akichidis temporarily deployed to sui-typescript-aws-kms-test-env February 12, 2025 13:39 — with GitHub Actions Inactive
@akichidis akichidis force-pushed the akichidis/commit-syncer-process-blocks branch from eb9c357 to e0d776a Compare February 12, 2025 13:40
@akichidis akichidis temporarily deployed to sui-typescript-aws-kms-test-env February 12, 2025 13:40 — with GitHub Actions Inactive
@mwtian mwtian requested review from mwtian and arun-koshy February 12, 2025 18:33
certified_commit.index(),
last_commit_index
);
certified_commits.remove(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems inefficient in a loop. We may as well filter certified_commits, then check the 1st element for gap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about the inefficiency. Let me refactor this.

certified_commits.remove(0);
} else {
// Make sure that the first commit we find is the next one in line and there is no gap.
if certified_commit.index() != last_commit_index + 1 {
Copy link
Contributor

@mwtian mwtian Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is pretty critical to the correctness of Core::add_certified_commits(). Personally I would be comfortable having it called first thing in Core::add_certified_commits() or included in the function itself. Otherwise we may loose or reorder this check after refactoring. I think try_decide_certified() can assume the CertifiedCommits have been filtered and validated, and can just assert_eq(...). Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point on the Core::add_certified_commits. It's just that the try_commit can be practically called from anywhere within the Core and the caller should always be mindful to do the validation. But given the limit scope we can continue with the assumption here and I can move the checks. Left me refactor this a bit.

return Vec::new();
}

let to_commit = if certified_commits.len() >= limit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For logic above this line, I think they are better to be around core.rs. For logic below, they are indeed closer to commit rule. But tbh I'm not sure if we should actually move leader schedule commit loop to core as well instead. There are more refactor and we don't have to make things perfect in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll keep the logic for now in Core.rs. I agree that overall we could work out some refactoring with the leader schedule loop etc. Let's do that on upcoming PR.

Verified

This commit was signed with the committer’s verified signature.
Julesssss Jules
…above the last committed leader's gc.

Verified

This commit was signed with the committer’s verified signature.
Julesssss Jules

Verified

This commit was signed with the committer’s verified signature.
Julesssss Jules
@akichidis akichidis temporarily deployed to sui-typescript-aws-kms-test-env February 19, 2025 18:24 — with GitHub Actions Inactive
@akichidis akichidis merged commit 5c1fdc3 into main Feb 21, 2025
47 checks passed
@akichidis akichidis deleted the akichidis/commit-syncer-process-blocks branch February 21, 2025 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants