pubsub: TestSimpleDiscovery is hanging with go-libp2p 0.36.1 #2910

Closed
Stebalien opened this issue Aug 6, 2024 · 4 comments

Comments

@Stebalien
Member

See libp2p/go-libp2p-pubsub#572. This can be readily reproduced locally with go test -run TestSimpleDiscovery (it takes a few tries, but it happens pretty regularly). On prior libp2p versions, this test passes in less than a second (very reliably).

I tried disabling all transports except the QUIC and/or TCP transport and still ran into the issue, so it's not the new webrtc transport. The tests also pass on 0.35.0.

@sukunrt
Member

sukunrt commented Aug 7, 2024

It should work on QUIC or TCP only. This patch works for me.

diff --git a/pubsub_test.go b/pubsub_test.go
index 245a69d..d874e20 100644
--- a/pubsub_test.go
+++ b/pubsub_test.go
@@ -8,13 +8,16 @@ import (
 	"github.com/libp2p/go-libp2p"
 	"github.com/libp2p/go-libp2p/core/host"
 	"github.com/libp2p/go-libp2p/core/network"
+	libp2pquic "github.com/libp2p/go-libp2p/p2p/transport/quic"
 )
 
 func getDefaultHosts(t *testing.T, n int) []host.Host {
 	var out []host.Host
 
 	for i := 0; i < n; i++ {
-		h, err := libp2p.New(libp2p.ResourceManager(&network.NullResourceManager{}))
+		h, err := libp2p.New(libp2p.ResourceManager(&network.NullResourceManager{}),
+			libp2p.Transport(libp2pquic.NewTransport),
+			libp2p.ListenAddrStrings("/ip4/0.0.0.0/udp/0/quic-v1"))
 		if err != nil {
 			t.Fatal(err)
 		}

This is a WebRTC problem. The connection first succeeds on the listener and then on the dialer. In our case, the listener thinks it has two connections and the dialer has just one.

  1. The listening side gets two connections (one for the 127.0.0.1 addr and one for the 192.168.X.X addr)
  2. The dialer closes one of these connections
  3. WebRTC has no provision for signaling to the peer that the connection has closed, so the listener still thinks it has two connections
  4. When we call host.NewStream, opening a stream on this now-closed connection fails on the listener side. This causes an issue somewhere within gossipsub. I think it doesn't get the initial RPC to add the peer to the topic's peer list.
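
The four steps above can be modeled with a small, stdlib-only sketch. This is a toy model, not go-libp2p or pion code: the `conn`, `peerView`, `newStream`, and `simulate` names are hypothetical, and "pick the first known connection" is an assumed policy chosen only to make the stale-connection failure visible.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a toy stand-in for one connection between the two peers.
// closed records the real state; each side may or may not know about it.
type conn struct {
	addr   string
	closed bool
}

// errStale is the error this model returns when a stream is opened on a
// connection the remote side has already closed.
var errStale = errors.New("stream open failed: connection closed by peer")

// peerView is the list of connections one side believes it still has.
type peerView struct {
	conns []*conn
}

// newStream uses the first connection in the table (an assumed policy)
// and fails if that connection is actually dead.
func (p *peerView) newStream() error {
	if len(p.conns) == 0 {
		return errors.New("no connections")
	}
	if p.conns[0].closed {
		return errStale
	}
	return nil
}

// simulate walks through steps 1-4 and reports how many connections each
// side believes it has, plus the result of the listener opening a stream.
func simulate() (listenerConns, dialerConns int, err error) {
	// Step 1: two connections exist, one per address.
	loopback := &conn{addr: "127.0.0.1"}
	lan := &conn{addr: "192.168.X.X"}
	listener := &peerView{conns: []*conn{loopback, lan}}
	dialer := &peerView{conns: []*conn{loopback, lan}}

	// Step 2: the dialer closes one connection and forgets it.
	loopback.closed = true
	dialer.conns = dialer.conns[1:]

	// Step 3: no close signal reaches the listener, so its table is unchanged.

	// Step 4: the listener opens a stream and may pick the dead connection.
	err = listener.newStream()
	return len(listener.conns), len(dialer.conns), err
}

func main() {
	l, d, err := simulate()
	fmt.Printf("listener sees %d, dialer sees %d, newStream error: %v\n", l, d, err)
}
```

In this model the two sides disagree on the connection count, and the listener's stream open fails only because its table was never updated.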

I assumed that there was no way of knowing when a WebRTC connection closes, but it turns out there is. The SCTP association is closed (either via an SCTP Abort or a DTLS abort), but this signal is not propagated by pion yet. We will need to plumb this through pion.

For the time being, this PR does something equivalent: #2914
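
As a rough illustration of what propagating the close signal buys, here is a stdlib-only sketch in which the connection table registers a close callback on each connection. Everything here is hypothetical (`conn`, `notifyOnClose`, `abort`, `connTable`); it is not pion's API or the code in the PR, only a model of the idea that an abort observed on the listener side should remove the connection from its table.

```go
package main

import "fmt"

// conn is a toy connection that can notify interested parties when the
// underlying association is torn down.
type conn struct {
	id      string
	onClose []func()
}

// notifyOnClose registers a callback to run when the connection is aborted.
func (c *conn) notifyOnClose(f func()) {
	c.onClose = append(c.onClose, f)
}

// abort models the remote side's abort being observed locally. It runs
// every registered callback once and then clears them.
func (c *conn) abort() {
	for _, f := range c.onClose {
		f()
	}
	c.onClose = nil
}

// connTable is the listener's view of its live connections.
type connTable struct {
	conns map[string]*conn
}

func newConnTable() *connTable {
	return &connTable{conns: make(map[string]*conn)}
}

// add stores the connection and arranges for it to be removed from the
// table as soon as the close signal arrives.
func (t *connTable) add(c *conn) {
	t.conns[c.id] = c
	c.notifyOnClose(func() { delete(t.conns, c.id) })
}

func (t *connTable) len() int { return len(t.conns) }

func main() {
	table := newConnTable()
	loopback := &conn{id: "loopback"}
	lan := &conn{id: "lan"}
	table.add(loopback)
	table.add(lan)

	// The dialer closes the loopback connection; with the signal
	// propagated, the listener's table shrinks to match.
	loopback.abort()
	fmt.Println("connections after abort:", table.len())
}
```

With the callback in place, a later stream open can only pick connections that are still alive, which avoids the stale-connection failure described earlier in this comment.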

@Stebalien
Member Author

Hm. Maybe the issue was that I didn't change the default listen addresses?

@sukunrt
Member

sukunrt commented Aug 20, 2024

I think not explicitly mentioning the listen address was the problem.

@sukunrt
Member

sukunrt commented Aug 20, 2024

Fixed by v0.36.2

@sukunrt sukunrt closed this as completed Aug 20, 2024