quic: make the reuseport feature optional #1428

Closed
gfanton opened this issue Apr 14, 2022 · 3 comments · Fixed by #1476
Labels
effort/hours Estimated to take one or several hours exp/beginner Can be confidently tackled by newcomers good first issue Good issue for new contributors help wanted Seeking public contribution on this issue P3 Low: Not priority right now

Comments

@gfanton
Contributor

gfanton commented Apr 14, 2022

Hi from @berty,
We recently discovered that enabling the reuseport feature on the libp2p-tcp-transport while using a cellular connection can lead to network errors with some mobile operators. This is the case with the French operator Free Mobile, which is widely used in France, and we suspect that the problem occurs with several other operators around the world.

It seems that the QUIC transport has exactly the same issue but does not have an option to disable reuseport. At the moment, this prevents us from using the QUIC transport on cellular connections, even though it is particularly well suited to mobile use.

It would be nice to make the feature optional, as in the libp2p-tcp-transport. I checked whether it was possible to simply add an option to toggle it off, but the feature seems too tightly coupled to the rest of the code.

@gfanton gfanton changed the title Disable reuseport option Make the reuseport feature optional Apr 14, 2022
@marten-seemann
Contributor

That would imply that they've built special logic to identify QUIC connections. QUIC connections rotate their Connection IDs on a regular basis to make exactly that more difficult.
What's the error you're seeing? Do you have any qlog traces?

@gfanton
Contributor Author

gfanton commented Apr 15, 2022

The main reason it took us a while to figure out the cause is that no explicit error is raised, since the error actually occurs on the operator's side. The only thing we can observe as a user of their network is that a libp2p node with go-libp2p-tcp-transport and the reuseport feature enabled will, once it has opened several connections, start hanging endlessly, unable to read from its connections.

The same behavior seems to occur with go-libp2p-quic-transport, so I don't think it's a QUIC-related problem, but rather something on the operator side: their network seems unable, for some reason, to communicate with different peers using the same port.

To demonstrate this, I wrote a simple golang script that starts several hosts on a remote server and a single host on the local machine. The latter then starts pinging each remote host in a loop and if a ping takes more than 10 seconds to complete, the connection is considered dead. The test succeeds if after a given period (e.g. 1 minute) all connections are still active.
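The liveness check described above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked script: `pingFunc`, `isAlive`, and `sleepPing` are hypothetical names, and the fake ping merely sleeps instead of doing a real libp2p round trip. A connection counts as dead when one ping does not complete within the timeout (10 seconds in the test described above).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pingFunc is a hypothetical stand-in for one ping round trip to a remote host.
type pingFunc func(ctx context.Context) error

// isAlive runs a single ping and reports whether it completed
// successfully before the timeout expired.
func isAlive(ping pingFunc, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() { done <- ping(ctx) }()

	select {
	case err := <-done:
		return err == nil
	case <-ctx.Done():
		return false // the ping took too long: consider the connection dead
	}
}

// sleepPing returns a fake ping that takes d to complete (illustration only).
func sleepPing(d time.Duration) pingFunc {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	// A fast ping against a generous timeout, then a slow ping against a short one.
	fmt.Println("fast ping alive:", isAlive(sleepPing(10*time.Millisecond), time.Second))
	fmt.Println("slow ping alive:", isAlive(sleepPing(time.Second), 50*time.Millisecond))
}
```

In a real test, `isAlive` would be called in a loop for each remote host for the whole test period, with the timeout set to 10 seconds.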

When the test is run on a Free Mobile cellular connection using tcp-transport with reuseport enabled or quic-transport, all connections will eventually enter a dead state. And the more connections that use the same port, the faster they die. However, the test always succeeds in all other cases:

  • On a Free Mobile cellular connection using a tcp-transport with reuseport disabled.
  • On a Free Mobile cellular connection using a tcp-transport with reuseport enabled or a quic-transport BUT through a VPN, so that the Free Mobile network only sees a single connection.
  • On any other connection tested so far (cellular or not) using quic-transport or tcp-transport whether the reuseport option is enabled or not.

You can find the script here: https://github.com/gfanton/libp2p-reuseport-test
I wrote another golang script that only uses the TCP standard library (without any libp2p import) and produces exactly the same behavior as described above: https://gist.github.com/gfanton/83ea66cb1dfaa7c5a6c623c288d52fa4

I tried to find relevant information in the qlog traces using QLOGDIR, without success, since the connection only hangs on read. If you need to do further tests, we can provide you with an ssh endpoint on a machine connected to the internet via a Free Mobile cellular connection.

@marten-seemann
Contributor

Interesting. And an impressive amount of research to locate the problem. Have you ever tried getting in touch with Free Mobile? This very much sounds like a problem on their side, and they might be able to fix it for good.

Regarding disabling reuseport in this repo, I think that shouldn't be too hard. It would probably (haven't tried it yet myself though) be enough to just use quic.Dial here: https://github.com/libp2p/go-libp2p-quic-transport/blob/36ab344ec0a4b2cb17b2a0bbd7904b357aaa0bc6/transport.go#L91-L97. Contributions welcome!
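To illustrate what such a toggle changes at the socket level, here is a rough sketch using plain UDP sockets from the standard library. It does not use quic-go or the transport's real code, and `dialer`, `newDialer`, `socket`, and `localPorts` are hypothetical names. It assumes that reuse means every outgoing connection shares one UDP socket (and therefore one local port), while disabling it gives each connection a fresh socket on an ephemeral port.

```go
package main

import (
	"fmt"
	"net"
)

// dialer hands out the local UDP socket used for each outgoing connection.
// Hypothetical type for illustration, not the go-libp2p-quic-transport API.
type dialer struct {
	shared *net.UDPConn // non-nil only when socket reuse is enabled
}

// loopback returns a loopback address with port 0, so the OS picks the port.
func loopback() *net.UDPAddr {
	return &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)}
}

func newDialer(reuse bool) (*dialer, error) {
	d := &dialer{}
	if reuse {
		conn, err := net.ListenUDP("udp4", loopback())
		if err != nil {
			return nil, err
		}
		d.shared = conn
	}
	return d, nil
}

// socket returns the UDP socket to use for a new outgoing connection:
// the shared one when reuse is enabled, a fresh one otherwise.
func (d *dialer) socket() (*net.UDPConn, error) {
	if d.shared != nil {
		return d.shared, nil
	}
	return net.ListenUDP("udp4", loopback())
}

// localPorts asks the dialer for n sockets and returns their local ports.
func localPorts(reuse bool, n int) ([]int, error) {
	d, err := newDialer(reuse)
	if err != nil {
		return nil, err
	}
	var conns []*net.UDPConn
	defer func() {
		for _, c := range conns {
			c.Close() // closing the shared socket more than once only returns an error
		}
	}()
	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		c, err := d.socket()
		if err != nil {
			return nil, err
		}
		conns = append(conns, c)
		ports = append(ports, c.LocalAddr().(*net.UDPAddr).Port)
	}
	return ports, nil
}

func main() {
	for _, reuse := range []bool{true, false} {
		ports, err := localPorts(reuse, 3)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("reuse=%v local ports=%v\n", reuse, ports)
	}
}
```

With reuse enabled, all three local ports are identical; with reuse disabled, each socket gets its own port, which is the situation that avoided the problem in the TCP case described above.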

@marten-seemann marten-seemann transferred this issue from libp2p/go-libp2p-quic-transport Apr 22, 2022
@marten-seemann marten-seemann changed the title Make the reuseport feature optional quic: make the reuseport feature optional Apr 22, 2022
@marten-seemann marten-seemann added help wanted Seeking public contribution on this issue good first issue Good issue for new contributors exp/beginner Can be confidently tackled by newcomers effort/hours Estimated to take one or several hours labels Apr 22, 2022
@BigLep BigLep added the P3 Low: Not priority right now label Apr 27, 2022
3 participants