Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossipsub mesh does not converge #215

Closed
aschmahmann opened this issue Sep 26, 2019 · 6 comments
Closed

Gossipsub mesh does not converge #215

aschmahmann opened this issue Sep 26, 2019 · 6 comments

Comments

@aschmahmann
Copy link
Contributor

The rules for gossipsub's GRAFT and PRUNE messages are:

  • If mesh size less than D_low, GRAFT peers until we're at D
  • If mesh size greater than D_high PRUNE peers until we're at D

If the mesh is a star with a single node connected to D_high + 1 nodes then the center node will be fighting to PRUNE nodes while the edge nodes GRAFT. This will go on indefinitely.

While it's not the end of the world to send these extra messages 1/second (current heartbeat), it's definitely a waste of bandwidth for basically no reason. The "correct" answer is probably to have some tree forming algorithm as part of gossipsub. However, for now it'd be pretty straightforward to implement a backoff system where if A sends GRAFT to B and B sends PRUNE to A that A waits a while before trying to send GRAFT again.

Thoughts @raulk @Stebalien @vyzo ?

@raulk
Copy link
Member

raulk commented Oct 2, 2019

My off-the-cuff thought here is that it's difficult to create peer-to-peer gossip algorithms that operate optimally for all possible topologies.

In this particular case, the hub is deliberately acting like a broker, and therefore you may want to increase D for it (it should be configurable). Ideally you'd maintain D in lockstep with the peer count the hub observes, but I don't know if it's adjustable during runtime.

Essentially what you're getting by using gossipsub in this design is the topic-based messaging functionality. You're not interested in the routing, because that's always static.

@aschmahmann
Copy link
Contributor Author

it's difficult to create peer-to-peer gossip algorithms that operate optimally for all possible topologies.

Sure, although if you implemented a backoff so peers stopped GRAFTing peers that just PRUNEd them that would at least give some convergence.

I don't know if it's adjustable during runtime

In Go it's a global variable, so it's situational (e.g. I can't run two gossipsub topics, or even instances, with different D)

You're not interested in the routing, because that's always static.

I suspect many users will not know their exact network topology ahead of time and will rely on various libp2p Discovery mechanisms to find peers to connect to. The chaotic nature of just discovering peers instead of carefully GRAFTing them into an optimal topology can certainly result in this GRAFT-PRUNE loop emerging (especially when you take into account NAT'd nodes connecting to the rest of the network).

@vyzo
Copy link
Contributor

vyzo commented Nov 25, 2019

This is resolved by gossipsub v1.1, which is the first iteration moving to episub.
See libp2p/go-libp2p-pubsub#234

@aschmahmann
Copy link
Contributor Author

@vyzo this is fixed now, right?

@vyzo
Copy link
Contributor

vyzo commented May 7, 2020

Yes, if you enable peer exchange (WithPeerExchange(true)), then the mesh will converge.

@mxinden
Copy link
Member

mxinden commented Apr 7, 2021

Considering this fixed with libp2p/go-libp2p-pubsub#234 and gossipsub v1.1. Closing. Please comment in case I am missing something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants