Cache known votes in gossip #227

tomusdrw · 2021-07-02T13:11:35Z

This PR introduces a cache for known votes (ideally I think we should be avoiding the RwLock stuff). Each time we receive a valid vote for a particular round, we keep it's hash in gossip to quickly approve such votes in the future if they are coming from other peers. AFAICT the sc_network_gossip does de-duplicate incoming messages, but only if they are coming from the same peer, not necessarily if they are sent by multiple peers.

My guess is that receiving & verifying the same votes from multiple peers might be causing the high CPU usage we are observing on Rococo, so this cache should alleviate the issue before we implement #212 and #182.

Also in validate function we quickly reject messages that are for old rounds. Previously we were always verifying and accepting such messages, but later on they were considered "expired" by the gossip subsystem, hence removed.

adoerr

Looks reasonable for a stop gap solution.

adoerr · 2021-07-02T16:44:41Z

beefy-gadget/src/gossip.rs


 			expired
 		})
 	}

 	#[allow(clippy::type_complexity)]
 	fn message_allowed<'a>(&'a self) -> Box<dyn FnMut(&PeerId, MessageIntent, &B::Hash, &[u8]) -> bool + 'a> {
-		let live_rounds = self.live_rounds.read();
+		let known_votes = self.known_votes.read();
 		Box::new(move |_who, _intent, _topic, mut data| {


According to how propagate() handles MessageIntent::PeriodicRebroadcast, I'm inclined to just drop rebroadcasts. Something like

Box::new(move |_who, intent, _topic, mut data| { // We drop rebroadcasts. If a peer doesn't know a message, the gossip state machine will // treat a rebroadcast as a broadcast. In other words, if we do see a rebroadcast, it is // *alwayas* an already known message. if intent == MessageIntent::PeriodicRebroadcast { return false; }

I tested this with rococo-local and it worked well. Theoretically this should give us a pretty minimal amount of gossip messages.

Hmm, but this doesn't guarantee that the round is concluded for everyone really. Until we have nodes requesting missing BEEFY justifications (I guess part of import queue stuff) I think we should be on the safe side of gossiping more, but without risking some sessions not being concluded.

I guess it is a potential workaround for now as well, but given this cache receiving re-broadcasted votes should be cheap.

The gossip state machine seems to assure, that every peer sees every message at least once as a Broadcast. That should be good enough to conclude a round. Especially since we only need 2/3 + 1 votes and not every vote from every validator.

Whether rebroadcasting is cheap totally depends on the connectivity of the nodes. If every nodes sees every vote, we do rebroadcast n * (n - 1) times.

tomusdrw and others added 4 commits July 2, 2021 14:31

Implement known messages cache.

db9934c

Add tests.

200511c

Appease clippy.

0976c20

More clippy

05f2d7c

adoerr self-requested a review July 2, 2021 16:36

adoerr approved these changes Jul 2, 2021

View reviewed changes

adoerr merged commit d7bf55c into master Jul 5, 2021

adoerr deleted the td-cpu branch July 5, 2021 03:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache known votes in gossip #227

Cache known votes in gossip #227

tomusdrw commented Jul 2, 2021

adoerr left a comment

adoerr Jul 2, 2021 •

edited

Loading

tomusdrw Jul 2, 2021

adoerr Jul 2, 2021 •

edited

Loading

Cache known votes in gossip #227

Cache known votes in gossip #227

Conversation

tomusdrw commented Jul 2, 2021

adoerr left a comment

Choose a reason for hiding this comment

adoerr Jul 2, 2021 • edited Loading

Choose a reason for hiding this comment

tomusdrw Jul 2, 2021

Choose a reason for hiding this comment

adoerr Jul 2, 2021 • edited Loading

Choose a reason for hiding this comment

adoerr Jul 2, 2021 •

edited

Loading

adoerr Jul 2, 2021 •

edited

Loading