[QD Reconfig] 3. add reconfig observer #7024

longbowlu · 2022-12-25T04:01:17Z

This PR adds the ReconfigObserver trait. A ReconfigObserver (RO) detects reconfigurations and updates the quorum driver with the new committee.
OnsiteReconfigObserver is the RO that lives in the Fullnode/TransactionOrchestrator and subscribes to the checkpoint executor's reconfig channel. Note that the integration of OnsiteReconfigObserver with TransactionOrchestrator happens in the follow-up PR.
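
As a rough illustration of the observer idea described above, here is a minimal std-only sketch. Everything in it is an assumption made for illustration: `Committee`, `QuorumDriver`, `update_if_newer`, `ChannelObserver`, and the shape of the `ReconfigObserver` trait are hypothetical stand-ins, not the types or signatures from this PR, and a `std::sync::mpsc` channel stands in for the checkpoint executor's reconfig channel.

```rust
use std::sync::{mpsc, Arc, Mutex};

// Hypothetical stand-in for the real committee type; only the epoch matters here.
#[derive(Clone, Debug, PartialEq)]
struct Committee {
    epoch: u64,
}

// Hypothetical stand-in for the quorum driver: it only holds the current committee.
struct QuorumDriver {
    committee: Mutex<Committee>,
}

impl QuorumDriver {
    // Install `new` only if its epoch is strictly newer; report whether it was applied.
    fn update_if_newer(&self, new: Committee) -> bool {
        let mut current = self.committee.lock().unwrap();
        if new.epoch > current.epoch {
            *current = new;
            true
        } else {
            false
        }
    }
}

// Illustrative trait shape (an assumption, not the signature used in the PR).
trait ReconfigObserver {
    fn run(&mut self, driver: Arc<QuorumDriver>);
}

// Observer that reads committees from a channel and forwards newer ones to the driver.
struct ChannelObserver {
    rx: mpsc::Receiver<Committee>,
}

impl ReconfigObserver for ChannelObserver {
    fn run(&mut self, driver: Arc<QuorumDriver>) {
        // The loop ends once every sender has been dropped.
        while let Ok(committee) = self.rx.recv() {
            let epoch = committee.epoch;
            if !driver.update_if_newer(committee) {
                eprintln!("ignored non-newer committee for epoch {}", epoch);
            }
        }
    }
}

// Small driver function: feeds a few committees through the observer
// and returns the epoch the driver ends up with.
fn demo() -> u64 {
    let (tx, rx) = mpsc::channel();
    let driver = Arc::new(QuorumDriver {
        committee: Mutex::new(Committee { epoch: 0 }),
    });
    for epoch in [1, 1, 2, 1] {
        tx.send(Committee { epoch }).unwrap();
    }
    drop(tx); // close the channel so `run` returns
    let mut observer = ChannelObserver { rx };
    observer.run(driver.clone());
    let final_epoch = driver.committee.lock().unwrap().epoch;
    final_epoch
}
```

Calling `demo()` from a `main` function or a test exercises the full path: the duplicate and stale committees are logged and skipped, and only strictly newer epochs reach the driver.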

crates/sui-core/src/authority_aggregator.rs

crates/sui-core/src/checkpoints/checkpoint_executor/mod.rs

crates/sui-core/src/quorum_driver/reconfig_observer.rs

crates/sui-node/src/lib.rs

mystenmark

will review after rebasing and existing comments are addressed! (please re-request review from me when you're ready)

crates/sui-core/src/quorum_driver/reconfig_observer.rs

mystenmark · 2023-01-04T19:12:45Z

crates/sui-core/src/quorum_driver/reconfig_observer.rs

+                        warn!("Ignored non-newer from reconfig channel: {}", committee);
+                    }
+                }
+                // Neither closed channel nor lagged shall happen


Why not just handle the lagged state? It should be recoverable, and it's not clear to me why it can't possibly happen.

Similarly for the closed channel: is there a reason for this assumption? It's probably best to handle it in case we change the way the executor works. As an example, we were toying with having the executor restart at epoch end to make strong consistency of the epoch store easier. That would create problems for you if you assume that this channel cannot close from underneath you.
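
To make the suggestion concrete, here is a minimal sketch of handling both error states, assuming the reconfig channel is a tokio broadcast channel (the word "lagged" suggests this, but the thread does not say so). The `RecvError` enum below is a local mirror of tokio's `broadcast::error::RecvError` so that the sketch compiles without external crates; `Action` and `next_action` are hypothetical names, and epochs stand in for full committees.

```rust
// Local mirror of tokio's `broadcast::error::RecvError` (assumption: the real
// channel is a tokio broadcast channel). Defined here to avoid external crates.
#[derive(Debug, Clone, PartialEq)]
enum RecvError {
    Closed,
    Lagged(u64),
}

// Hypothetical decision type for one iteration of the observer loop.
#[derive(Debug, PartialEq)]
enum Action {
    Apply(u64),    // install the committee for this epoch
    Ignore,        // committee is not newer than the current one
    KeepReceiving, // some messages were missed; the next recv yields the oldest retained one
    Stop,          // all senders are gone; leave the loop instead of panicking
}

// Decide what to do with the result of one receive call.
fn next_action(current_epoch: u64, received: Result<u64, RecvError>) -> Action {
    match received {
        Ok(epoch) if epoch > current_epoch => Action::Apply(epoch),
        Ok(_) => Action::Ignore,
        Err(RecvError::Lagged(skipped)) => {
            eprintln!("reconfig channel lagged, skipped {} messages", skipped);
            Action::KeepReceiving
        }
        Err(RecvError::Closed) => Action::Stop,
    }
}
```

With a real tokio broadcast receiver, a lagged error is recoverable because the following `recv` call returns the oldest message still retained, so continuing the loop is enough. What to do on `Stop` (resubscribe, shut down, or surface an error) depends on how the executor ends up owning the sender.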

crates/test-utils/src/network.rs

crates/test-utils/src/authority.rs

mystenmark · 2023-01-04T19:19:21Z

crates/sui/tests/onsite_reconfig_observer_tests.rs

+    wait_for_nodes_transition_to_epoch(authorities.iter().chain(fullnodes.iter()), 1).await;
+
+    // Give it some time for the update to happen
+    tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;


Shouldn't wait_for_nodes_transition_to_epoch make the sleep unnecessary?

wait_for_nodes_transition_to_epoch blocks until it gets a reconfig message from the channel, but it may take a bit more time for the reconfig observer to receive the reconfig message itself and create/swap the auth-agg.
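
One possible alternative to a fixed sleep is to poll for the observable effect with a deadline. The sketch below is a synchronous, std-only illustration; `wait_until` is a hypothetical helper that does not exist in the test utilities, and in the async test from this PR the blocking `std::thread::sleep` would be replaced with `tokio::time::sleep(...).await`.

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: poll `condition` every `interval` until it returns true
// or `timeout` elapses. Returns whether the condition was observed in time.
fn wait_until<F: FnMut() -> bool>(mut condition: F, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

In the test, the condition could check that the quorum driver's committee has reached epoch 1, which would let the test finish as soon as the swap happens rather than always waiting the full three seconds.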

crates/sui/tests/onsite_reconfig_observer_tests.rs

mystenmark

just a few comments, overall looks good

williampsmith

LGTM aside from the last comment on handling the closed channel. Although it sounds like we may be moving the sender to SuiNode, in which case the channel should never close.

longbowlu requested review from lxfind, bmwill, mystenmark and williampsmith December 27, 2022 18:42

longbowlu commented Dec 27, 2022

crates/sui-core/src/authority_aggregator.rs

longbowlu marked this pull request as ready for review December 27, 2022 18:43

williampsmith reviewed Dec 28, 2022

longbowlu force-pushed the add-reconfig-observer branch from ef60bc0 to df4d920 Compare December 28, 2022 23:01

longbowlu requested review from asonnino, akichidis, randall-Mysten and ronny-mysten as code owners December 28, 2022 23:01

mystenmark reviewed Dec 29, 2022

longbowlu mentioned this pull request Dec 29, 2022

allow to create AuthorityAggregator with existing metrics #7023

Merged

longbowlu force-pushed the remove-auth-active branch from d8571ae to 4a91c30 Compare December 29, 2022 20:23

longbowlu force-pushed the add-reconfig-observer branch from df4d920 to 3641a60 Compare December 30, 2022 01:31

longbowlu requested review from mystenmark and williampsmith December 30, 2022 01:55