
improved gossip topology #3270

Merged
merged 16 commits into from
Jun 18, 2021

Conversation

ordian
Member

@ordian ordian commented Jun 16, 2021

Fixes #3239.

Changes the parachain gossip topology to reduce the number of hops and the amount of duplicate messages for approval-assignment distribution (see the topology sketch after the checklist below).

  • resolve TODO (ordian)
  • update the guide
  • fix the tests
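Not part of the original description, but for readers unfamiliar with the approach: a minimal sketch of the kind of grid topology this PR moves toward, assuming validators are laid out row-major in a roughly sqrt(n)-wide grid and each node gossips to everyone sharing its row or column, which is what bounds the gossip diameter at two hops. Function and variable names are illustrative only.

```rust
// Illustrative sketch only, not the actual implementation in this PR:
// lay the n validators out row-major in a ceil(sqrt(n))-wide grid and
// treat everyone sharing a row or column as a gossip neighbor.
fn grid_neighbors(our_index: usize, n: usize) -> Vec<usize> {
    let width = (n as f64).sqrt().ceil() as usize;
    let (our_row, our_col) = (our_index / width, our_index % width);

    (0..n)
        .filter(|&i| i != our_index && (i / width == our_row || i % width == our_col))
        .collect()
}

fn main() {
    // With 900 validators each node has about 2 * (30 - 1) = 58 neighbors,
    // and any two nodes share a row/column peer, so two hops suffice.
    println!("{} neighbors", grid_neighbors(0, 900).len());
}
```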

@ordian ordian added B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Jun 16, 2021
@github-actions github-actions bot added the A3-in_progress Pull request is in progress. No review needed at this stage. label Jun 16, 2021
@@ -350,7 +356,11 @@ where
}
})
.collect::<Vec<PeerId>>();
let interested_peers = util::choose_random_sqrt_subset(interested_peers, MIN_GOSSIP_PEERS);
let interested_peers = util::choose_random_subset(
Member

Why do we only use a subset?

Member Author

@ordian ordian Jun 16, 2021

I expect that the subset will be limited to MIN_GOSSIP_PEERS in two cases:

  1. Small network (2 * sqrt(N) < MIN_GOSSIP_PEERS).
  2. The node has just restarted and its DHT is not yet populated. It would go back up at the next session, so there is not much point in making this logic more complex than it needs to be (see the sketch below).
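A hedged sketch of the selection logic described above (the names and signature are illustrative, not the real `util::choose_random_subset` API): always keep the peers from the topology and pad with random peers so that at least MIN_GOSSIP_PEERS end up selected on small or freshly restarted nodes.

```rust
use rand::seq::SliceRandom;

const MIN_GOSSIP_PEERS: usize = 25; // illustrative value, not necessarily the real constant

// Keep all topology ("required") peers and top up with random other peers
// until at least `min` are selected, covering the two cases above.
fn choose_subset_with_floor<T>(required: Vec<T>, mut others: Vec<T>, min: usize) -> Vec<T> {
    others.shuffle(&mut rand::thread_rng());

    let mut selected = required;
    while selected.len() < min {
        match others.pop() {
            Some(peer) => selected.push(peer),
            None => break,
        }
    }
    selected
}

fn main() {
    let grid_peers: Vec<u32> = (0..10).collect();        // e.g. 2 * sqrt(N) on a small network
    let other_known_peers: Vec<u32> = (10..100).collect();
    let selected = choose_subset_with_floor(grid_peers, other_known_peers, MIN_GOSSIP_PEERS);
    assert!(selected.len() >= MIN_GOSSIP_PEERS);
}
```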

@@ -532,6 +542,9 @@ where
// get rid of superfluous data
state.peer_views.remove(&peerid);
}
NetworkBridgeEvent::NewGossipTopology(peers) => {
Member

Shouldn't this be per relay chain head? So at session boundaries, we would gossip to different peers depending on the current head? I think that would be good for smooth session changes.

Member Author

Are you saying that during the session change we should gossip to two gossip groups (from the previous and the current topology)?

Member

No, usually we gossip per head - right? So if we are sending messages with regard to head x, we should be using gossip peers that belong to the session of head x. Otherwise we lose the guarantee that we are going to reach everybody in two hops. I mean in practice it will be fine, just because of the huge amount of redundancy.

Member Author

@ordian ordian Jun 16, 2021

  • newly added gossip peers should receive their updates, as in 12de015 (a rough sketch of this handling follows below)
  • peers should receive NewGossipTopology on the first block that has a new session index for its child, which should happen at roughly the same time for all peers. But if not, as you mentioned, redundancy helps.
  • we accept all valid incoming messages (not just from gossip peers)
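A rough, purely illustrative sketch of the first point (types and helpers are simplified stand-ins, not the actual approval-distribution code): on NewGossipTopology, remember the new gossip peers and catch any newly added neighbor up on the messages we already know, skipping ones the peer has already seen.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins for the real network and protocol types.
type PeerId = u64;
type MessageId = u64;
struct Message { id: MessageId }

struct State {
    gossip_peers: HashSet<PeerId>,
    known_messages: Vec<Message>,
    peer_knowledge: HashMap<PeerId, HashSet<MessageId>>,
}

fn send_to_peer(_peer: &PeerId, _msg: &Message) {
    // Stub: the real code would go through the network bridge here.
}

fn handle_new_gossip_topology(state: &mut State, new_peers: HashSet<PeerId>) {
    // Peers that are gossip neighbors now but were not before.
    let newly_added: Vec<PeerId> =
        new_peers.difference(&state.gossip_peers).copied().collect();
    state.gossip_peers = new_peers;

    // Catch new neighbors up on everything we already know,
    // skipping messages a peer is already known to have.
    for peer in newly_added {
        for msg in &state.known_messages {
            let already_known = state
                .peer_knowledge
                .get(&peer)
                .map_or(false, |known| known.contains(&msg.id));
            if !already_known {
                send_to_peer(&peer, msg);
            }
        }
    }
}

fn main() {}
```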

@ordian ordian marked this pull request as ready for review June 17, 2021 15:51
@ordian ordian added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Jun 17, 2021
ordian added 2 commits June 17, 2021 17:54
* master:
  Companion #9019 (max rpc payload override) (#3276)
  Implementers' Guide: Chain Selection (#3262)
  CLI: Add missing feature checking and check if someone passes a file (#3283)
  Export 'TakeRevenue' trait. (#3278)
  Add XCM Decode Limit (#3273)
  Allow Council to Use Scheduler (#3237)
  fix xcm pallet origin (#3272)
  extract determine_new_blocks into a separate utility (#3261)
  Approval checking unit tests (#3252)
  bridges: update finality-grandpa to 0.14.1 (#3266)
  malus - mockable overseer mvp (#3224)
  use safe math (#3249)
  Companion for #8920 (Control Staking) (#3260)
  Companion for #8949 (#3216)
@rphmeier rphmeier added B1-releasenotes and removed B0-silent Changes should not be mentioned in any release notes labels Jun 17, 2021
@rphmeier
Contributor

@ordian I added the releasenotes label.

Can you add something to the PR description about the reduced bandwidth? In expectation it drops to 2·sqrt(n)/n, so for 900 validators roughly 6.5% of the current bandwidth.
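(For reference, the arithmetic behind that estimate: each message is forwarded to roughly 2·sqrt(n) peers instead of all n, so 2·sqrt(900)/900 = 60/900 ≈ 6.7%, in the ballpark of the 6.5% quoted.)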

@ordian
Member Author

ordian commented Jun 17, 2021

> @ordian I added the releasenotes label.
>
> Can you add something to the PR description about the reduced bandwidth? In expectation it drops to 2·sqrt(n)/n, so for 900 validators roughly 6.5% of the current bandwidth.

We'd send more messages for bitfield distribution and small statement distribution (2x), but fewer for approval distribution (because it's not based on active leaves, but rather on peers' known messages, and the sqrt selection is now static for the duration of the session). But the real benefit would be a guaranteed network diameter of 2, I think. Will update the description.

@ordian
Member Author

ordian commented Jun 17, 2021

> @ordian I added the releasenotes label.
> Can you add something to the PR description about the reduced bandwidth? In expectation it drops to 2·sqrt(n)/n, so for 900 validators roughly 6.5% of the current bandwidth.
>
> We'd send more messages for bitfield distribution and small statement distribution (2x), but fewer for approval distribution (because it's not based on active leaves, but rather on peers' known messages, and the sqrt selection is now static for the duration of the session). But the real benefit would be a guaranteed network diameter of 2, I think. Will update the description.

Confirmed by metrics:
the number of /polkadot/validation/1 notifications went up by less than 2x, but the bandwidth went down by more than 2x, and so did the average size of notifications.

@@ -303,3 +303,17 @@ impl Network for Arc<NetworkService<Block, Hash>> {
);
}
}

/// We assume one peer_id per authority_id.
pub async fn get_peer_id_by_authority_id<AD: AuthorityDiscovery>(
Contributor

So this would start to break if many nodes changed their PeerIds at once, without rotating their session keys.
I believe this is acceptable, as honest nodes will practically always rotate their keys before changing the machine / VM their node is running on, due to the risk of equivocation / slashing.

Member Author

If a node changes its PeerId, I'd assume it restarts, connects under the new PeerId, and disconnects under the old one. It would also publish a new record on the DHT with the new PeerId. It's true that our cache might contain the old record when that happens.
But the list of our neighbors is not necessarily connected to us; we'll use random peers if there are not enough.
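To illustrate the exchange above, a hedged sketch of what resolving a PeerId from an authority id could look like, assuming an authority-discovery service that hands back the multiaddresses published on the DHT. The trait and types below are simplified stand-ins (the real function in this diff is async and uses the actual network types).

```rust
// Simplified stand-ins; the real code uses the network's AuthorityDiscovery
// service and proper multiaddr/PeerId types.
type AuthorityDiscoveryId = [u8; 32];
type PeerId = String;
type Multiaddr = String;

trait AuthorityDiscovery {
    // Multiaddresses published on the DHT for this authority, if cached.
    fn addresses_of(&self, authority: &AuthorityDiscoveryId) -> Option<Vec<Multiaddr>>;
}

// We assume one peer id per authority id: take the `/p2p/<peer-id>` suffix
// of the first known address. If the cache is empty or stale (freshly
// restarted node, or a peer that changed its PeerId without publishing a
// new DHT record), this returns None and the caller falls back to random peers.
fn get_peer_id_by_authority_id<AD: AuthorityDiscovery>(
    discovery: &AD,
    authority: &AuthorityDiscoveryId,
) -> Option<PeerId> {
    discovery.addresses_of(authority)?.into_iter().find_map(|addr| {
        addr.rsplit_once("/p2p/").map(|(_, peer_id)| peer_id.to_string())
    })
}

fn main() {}
```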

* master:
  Set new staking limits (#3299)
  Bump derive_more from 0.99.11 to 0.99.14 (#3248)
  add revert consensus log (#3275)
  Add bridge team as codeowners of `bridges` Subtree (#3291)
  Extract and test count_no_shows method for approval voting (#3264)