-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QD Reconfig] 3. add reconfig observer #7024
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
ef60bc0
to
df4d920
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will review after rebasing and existing comments are addressed! (please re-request review from me when you're ready)
d8571ae
to
4a91c30
Compare
df4d920
to
3641a60
Compare
warn!("Ignored non-newer from reconfig channel: {}", committee); | ||
} | ||
} | ||
// Neither closed channel nor lagged shall happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just handle the lagged state? it should be recoverable. its not clear to me why it can't possibly happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly for closed channel. Is there a reason for this assumption? Probably best to handle it in case we change the way executor works. As an example, we were toying with having executor restart at epoch end to make strong consistency of epoch store easier. This would create problems for you if you assume that this channel cannot close from underneath you
wait_for_nodes_transition_to_epoch(authorities.iter().chain(fullnodes.iter()), 1).await; | ||
|
||
// Give it some time for the update to happen | ||
tokio::time::sleep(tokio::time::Duration::from_secs(3)).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't wait_for_nodes_transition_to_epoch
make the sleep unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait_for_nodes_transition_to_epoch
blocks until it gets a reconfig message from the channel. But it may take a bit more time for the reconfig observer to get the reconfig message itself and create/swap the auth-agg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few comments, overall looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from the last comment on handling the closed channel. Although it sounds like we may be moving the sender to SuiNode
, in which case the channel should never close.
4a91c30
to
b94d1f3
Compare
b94d1f3
to
6fe88b2
Compare
3641a60
to
3c6f11a
Compare
6fe88b2
to
d5cda8b
Compare
f21c8b9
to
6cd59f0
Compare
OnsiteReconfigObserver
is the RO that lives inFullnode
/TransactionOrchestrator
that subscribes to the checkpoint executor reconfig channel. Note the integration ofOnsiteReconfigObserver
andTransactionOrchestrator
happens in the follow-up PR