This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Cache known votes in gossip #227

Merged
merged 4 commits into from
Jul 5, 2021

Conversation

@tomusdrw tomusdrw (Contributor) commented Jul 2, 2021

This PR introduces a cache for known votes (ideally, I think we should be avoiding the RwLock stuff). Each time we receive a valid vote for a particular round, we keep its hash in gossip so that the same vote can be quickly approved in the future if it arrives from other peers. AFAICT sc_network_gossip does de-duplicate incoming messages, but only per peer: it does not de-duplicate the same message when it is sent by multiple peers.

My guess is that receiving and verifying the same votes from multiple peers might be causing the high CPU usage we are observing on Rococo, so this cache should alleviate the issue until we implement #212 and #182.

Also, in the validate function we now quickly reject messages for old rounds. Previously we always verified and accepted such messages, only for the gossip subsystem to later consider them "expired" and remove them.
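The cache described above can be sketched with standard-library types. This is an illustrative sketch only, not the PR's actual code: the names (`KnownVotes`, `RoundNumber`, `MessageHash`) and the use of plain `std::collections` are assumptions, and in the validator this state sits behind the RwLocks mentioned above.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical stand-ins for the real round-number and message-hash types.
type RoundNumber = u64;
type MessageHash = [u8; 32];

/// Per-round cache of vote hashes that have already passed full verification.
struct KnownVotes {
    rounds: BTreeMap<RoundNumber, HashSet<MessageHash>>,
}

impl KnownVotes {
    fn new() -> Self {
        Self { rounds: BTreeMap::new() }
    }

    /// Start tracking a live round so votes for it can be cached.
    fn insert_round(&mut self, round: RoundNumber) {
        self.rounds.entry(round).or_default();
    }

    /// Record a vote that passed signature verification.
    /// Votes for untracked (old or unknown) rounds are ignored.
    fn add_known(&mut self, round: RoundNumber, hash: MessageHash) {
        if let Some(votes) = self.rounds.get_mut(&round) {
            votes.insert(hash);
        }
    }

    /// A vote seen before can be accepted without re-verifying its signature.
    fn is_known(&self, round: RoundNumber, hash: MessageHash) -> bool {
        self.rounds
            .get(&round)
            .map_or(false, |votes| votes.contains(&hash))
    }

    /// Drop rounds older than `oldest_live`, so that in `validate` messages
    /// for concluded rounds can be rejected immediately.
    fn conclude_up_to(&mut self, oldest_live: RoundNumber) {
        // `split_off` keeps everything >= `oldest_live`.
        self.rounds = self.rounds.split_off(&oldest_live);
    }
}
```

In `validate`, the fast path would then be: if the round is no longer tracked, reject as expired; if the hash is already known, accept without signature verification; otherwise verify and, on success, `add_known`.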

@adoerr adoerr self-requested a review July 2, 2021 16:36
@adoerr adoerr (Contributor) left a comment

Looks reasonable for a stop-gap solution.


expired
})
}

#[allow(clippy::type_complexity)]
fn message_allowed<'a>(&'a self) -> Box<dyn FnMut(&PeerId, MessageIntent, &B::Hash, &[u8]) -> bool + 'a> {
let live_rounds = self.live_rounds.read();
let known_votes = self.known_votes.read();
Box::new(move |_who, _intent, _topic, mut data| {
@adoerr adoerr (Contributor) commented Jul 2, 2021
According to how propagate() handles MessageIntent::PeriodicRebroadcast, I'm inclined to just drop rebroadcasts. Something like

		Box::new(move |_who, intent, _topic, mut data| {
			// We drop rebroadcasts. If a peer doesn't know a message, the gossip state machine will
			// treat a rebroadcast as a broadcast. In other words, if we do see a rebroadcast, it is
			// *always* an already known message.
			if intent == MessageIntent::PeriodicRebroadcast {
				return false;
			}

I tested this with rococo-local and it worked well. Theoretically this should give us a pretty minimal amount of gossip messages.

@tomusdrw tomusdrw (Contributor, Author) replied

Hmm, but this doesn't really guarantee that the round is concluded for everyone. Until we have nodes requesting missing BEEFY justifications (part of the import-queue work, I guess), I think we should err on the side of gossiping more rather than risk some sessions not being concluded.

I guess it is a potential workaround for now as well, but given this cache, receiving re-broadcasted votes should be cheap.

@adoerr adoerr (Contributor) commented Jul 2, 2021
The gossip state machine seems to ensure that every peer sees every message at least once as a Broadcast. That should be good enough to conclude a round, especially since we only need 2/3 + 1 votes, not every vote from every validator.

Whether rebroadcasting is cheap totally depends on the connectivity of the nodes. If every node sees every vote, we rebroadcast n * (n - 1) times.
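The quadratic cost mentioned here is easy to make concrete. A hypothetical helper (not from the codebase) counting rebroadcast transmissions under the assumption of a fully connected network:

```rust
/// In a fully connected network of `n` nodes, a vote that every node sees is
/// rebroadcast by each of the other `n - 1` nodes; with one vote per node,
/// that is on the order of `n * (n - 1)` rebroadcast transmissions.
fn rebroadcast_transmissions(n: u64) -> u64 {
    n * (n - 1)
}
```

Even at modest validator-set sizes this grows quickly: 10 nodes already mean 90 transmissions, 100 nodes mean 9900.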

@adoerr adoerr merged commit d7bf55c into master Jul 5, 2021
@adoerr adoerr deleted the td-cpu branch July 5, 2021 03:34