-
Notifications
You must be signed in to change notification settings - Fork 11.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Consensus] process sync committed sub dags #20956
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
2 Skipped Deployments
|
39f1a9b
to
fd30e73
Compare
if decided_leaders.is_empty() { | ||
// TODO: limit commits by commits_until_update, which may be needed when leader schedule length is reduced. | ||
decided_leaders = self.committer.try_decide(self.last_decided_leader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to note here; we could avoid even linearizing the DAG and just produce the necessary outputs, persist etc. However right now:
- uitlising the same path saves us from having to specially handle the storing of committed sub dags, move GC rounds etc
- provides some extra layer of confirmation as we can - and we do in the end - make sure that the same sub dag is produced with the current state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The high level strategy lgtm. This should be simpler than skipping & rewriting commit_observer.handle_commit()
.
@@ -724,6 +726,7 @@ mod tests { | |||
} | |||
} | |||
|
|||
#[async_trait] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
consensus/core/src/block_manager.rs
Outdated
@@ -84,17 +84,15 @@ impl BlockManager { | |||
/// Tries to accept the provided blocks assuming that all their causal history exists. The method | |||
/// returns all the blocks that have been successfully processed in round ascending order, that includes also previously | |||
/// suspended blocks that have now been able to get accepted. Method also returns a set with the missing ancestor blocks. | |||
/// When the `commit_sync_gc_round_override` is > 0 then the method will skip any missing ancestors that are <= `commit_sync_gc_round_override` round. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove?
pub(crate) fn try_accept_blocks( | ||
&mut self, | ||
mut blocks: Vec<VerifiedBlock>, | ||
) -> (Vec<VerifiedBlock>, BTreeSet<BlockRef>) { | ||
let _s = monitored_scope("BlockManager::try_accept_blocks"); | ||
|
||
blocks.sort_by_key(|b| b.round()); | ||
debug!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this moved somewhere?
consensus/core/src/commit_syncer.rs
Outdated
match self | ||
.inner | ||
.core_thread_dispatcher | ||
.add_commits(commits, blocks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change here is minimal which is nice.
.first() | ||
.expect("Synced commits should not be empty"); | ||
if synced_commit.index() <= last_commit_index { | ||
info!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info seems too much logging here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direction of the PR LGTM!
// The function returns the list of decided leaders and updates in place the remaining synced commits. If empty vector is returned, it means that | ||
// there are no synced commits to be committed as `synced_commits` is either empty or all of the synced commits are already committed. | ||
#[tracing::instrument(skip_all)] | ||
pub(crate) fn try_decide_synced( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool! It feels like this is in general how we should handle the trusted commits regardless of GC. Did you notice a performance boost for catchup doing this instead of trying to run the commit rule for the synced commits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't benchmarked this but I'll do. So far didn't see any difference on the opposite direction at least.
) -> ConsensusResult<BTreeSet<BlockRef>> { | ||
let _scope = monitored_scope("Core::add_commit"); | ||
|
||
// We want to enable the commit process logic when GC is enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just do this even if GC is not enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well to be fair we could enable this without GC. Not sure if I wouldn't still feature flag it though. Let me think a bit about it.
consensus/core/src/core.rs
Outdated
// is reduced. | ||
let decided_leaders = self.committer.try_decide(self.last_decided_leader); | ||
// Always try to process the synced commits first | ||
let mut decided_leaders = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing to consider here is that you moved some of the logic of leader schedule into universal committer where try_decide does not work like that and we kept the commit logic as a black box that just spits out decided leaders. What do you think about continuing to keep that separation of logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah my initial iteration was without moving part of that logic in universal committer and keeping it in Core. We can certainly do that. It's just that the truncation of the leaders is a bit more complex and decided to do everything in the try_decide_synced
method. I am open though to suggestions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion is to remove try_decide_certified
from universal committer and keep it as a separate method in core i.e. get_certified_commit_leaders
. We are actually bypassing the committer with your new logic so it makes sense to not use the component at all.
.try_decide_synced(&mut synced_commits, commits_until_update); | ||
|
||
// If the synced `decided_leaders` is empty then try to run the decision rule. | ||
if decided_leaders.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you update self.last_decided_leader from the synced decided leaders and then run the commit rule to fill whatever remaining space is left for the schedule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to run the commit rule until the synced leaders have been committed and GC rounds have been moved. That's important to make sure that when we try to run the commit rule we have the most updated GC and there is no possibility to try retrieve causal history (ex when we figure out the certificate support) that doesn't exist.
eb9c357
to
e0d776a
Compare
consensus/core/src/core.rs
Outdated
certified_commit.index(), | ||
last_commit_index | ||
); | ||
certified_commits.remove(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems inefficient in a loop. We may as well filter certified_commits
, then check the 1st element for gap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about the inefficiency. Let me refactor this.
consensus/core/src/core.rs
Outdated
certified_commits.remove(0); | ||
} else { | ||
// Make sure that the first commit we find is the next one in line and there is no gap. | ||
if certified_commit.index() != last_commit_index + 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is pretty critical to the correctness of Core::add_certified_commits(). Personally I would be comfortable having it called first thing in Core::add_certified_commits()
or included in the function itself. Otherwise we may loose or reorder this check after refactoring. I think try_decide_certified() can assume the CertifiedCommits have been filtered and validated, and can just assert_eq(...). Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the point on the Core::add_certified_commits
. It's just that the try_commit
can be practically called from anywhere within the Core
and the caller should always be mindful to do the validation. But given the limit scope we can continue with the assumption here and I can move the checks. Left me refactor this a bit.
return Vec::new(); | ||
} | ||
|
||
let to_commit = if certified_commits.len() >= limit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For logic above this line, I think they are better to be around core.rs. For logic below, they are indeed closer to commit rule. But tbh I'm not sure if we should actually move leader schedule commit loop to core as well instead. There are more refactor and we don't have to make things perfect in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'll keep the logic for now in Core.rs. I agree that overall we could work out some refactoring with the leader schedule loop etc. Let's do that on upcoming PR.
8baad55
to
6209b58
Compare
6209b58
to
76c6717
Compare
76c6717
to
c6b4dfb
Compare
…above the last committed leader's gc.
c6b4dfb
to
537f02b
Compare
Description
The PR is changing the way we are processing the synchronized committed sub dags when GC is enabled. This was a necessity given that it's possible during GC for committed sub dags to drop dependencies that are necessary to accept vote and decision round blocks, effectively leading us to having to fetch the missing blocks via the synchronizer. By empirical examination there is a correlation between the missing blocks and the gc depth - the lower the gc depth the higher the number of missing blocks that need to be fetched.
With this new approach we attempt to commit the synchronized committed sub dag without waiting for the commit rule to run (hence wait to have accepted blocks to form the voting & decision rounds). Attention needs to be given that to avoid edge cases we need to make sure that the synchronized sub dags needs to be processed first on their own , and then run on a next iteration the commit rule for any other leaders. That guarantees that any accepted blocks due to commit sync will not run into "missing ancestor" issues.
Test plan
CI/PT
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.