-
Notifications
You must be signed in to change notification settings - Fork 28
Simultaneous open #21
Comments
Better TCP solutionUse a poison byte. That is, when connecting as a client:
If both sides do the same thing (both clients), they'll both receive a zero byte and their initial handshake will be poisoned. |
Why kill the connection? It's a perfectly valid established connection (that has potentially traversed a NAT), and we should try to keep it. We could see the |
As a user which sometimes hits this, ideally the issue would be automatically dealt with by libp2p so that in the end I have a usable connection, rather than a failed one (failing right away is only marginally better than timing out). |
I think we should rework the multistream handler to handle simultaneous open. A simple way is to have one of the two peers act as the initiator deterministically (say by lexicographic comparison of peer ids). |
I'd love too but I don't want to introduce any too-horrible hacks to get there. Also, this case is absurdly rare (and is only causing problems because we have a test for it).
First of all, famous last words 😆. There is nothing simple about this issue. I considered doing that but this happens before secio so we don't necessarily know their peer ID. However, we do know what it should be so we could do one best-effort reconnect, I guess. (I'm also not a fan of adding an extra message to every multistream negotiation for the absurdly rare cases where we have a simultaneous connect.) To get on the same page, I currently have the following pipelines:
This fix would require the dial pipeline switch to an "inbound" negotiation in the middle of the security multistream negotiation. There are a few ways to do this:
An alternative to the "iamclient" solution is to use the poison byte solution but, instead of treating it as poison, treat it as a multistream "oops" message. This would allow us to completely bail out of the first multistream handshake before it starts (making it easier to bail all the way to the UpgradeOutbound pipeline) and would avoid sending an additional message with every multistream handshake. However, this would still require weird custom logic...
Have you ever hit this outside of a test? If we do error out, we should do so with an error indicating that we should retry however, renegotiating a TCP connection isn't really all that expensive (given how unlikely this case is, at least). |
that might actually work best if it's too complicated to handle |
But mind that If I have a cluster, and I happen to restart all libp2p hosts at the same time, there are increased chances that this happens. Restarting all peers at once is a pretty common operation when maintaining a cluster of things (i.e. after an upgrade). So while rare, it can happen during "normal operations" and cause impact. |
@hsanjuan your concerns will be fixed by QUIC (which doesn't have this problem). So... I've decided to punt on this for now (too many things in flight). It came up because some of my changes cause go-libp2p-swarm to more reliably fail the simultaneous connect test cases but the problem isn't actually new (it just looks like the timing has changed). |
@Stebalien SGTM |
We don't actually handle simultaneous connects correctly (TCP transport issue). See libp2p/go-tcp-transport#21
We don't actually handle simultaneous connects correctly (TCP transport issue). See libp2p/go-tcp-transport#21
@raulk said (libp2p/go-libp2p-swarm#79 (comment)):
|
I've looked into this (quite extensively, reading quite a bit of kernel code...) and couldn't find a single even semi-reliable solution.
Maybe? Actually, I'm pretty sure the iamclient solution wouldn't add a round trip.
Unfortunately, that solution would break existing systems. That's why why this is so tricky. The "poison byte" solution above tries to do something like this without breaking anything but, well, it's very hacky. |
some background for those of us (like me) who weren't familiar with this corner of the TCP spec |
@Stebalien I researched We'd have to pair it with some kind of random backoff to reduce the risk of a simultaneous open happening again on the retry. Overall, this dance would increase the latency to establish the connection, but given how infrequent this situation is, I'd argue it's a fair tradeoff. Looking at the link that @bigs posted, it is evident that TCP supports peer-to-peer communications. It's the upper level protocols (security, multiplexing, etc.) that define roles like initiator/responder, client/server, with specifically ordered handshakes, that become the partypoopers. |
Yep. Note: We could try to support this in yamux but, when we switch to TLS, that work will go out the window. |
yeah, i've just cooked up an idea i think would work for yamux, need to dig
into TLS more to get a better understanding here.
…On Wed, Sep 19, 2018 at 2:36 PM Steven Allen ***@***.***> wrote:
Yep. Note: We *could* try to support this in yamux but, when we switch to
TLS, that work will go out the window.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AANBWi4GjaJ2uMUiOAohINuIRpnTnFq8ks5uco6vgaJpZM4Ro3Hs>
.
|
Closing this issue, as
|
We need a way to deal with and detect simultaneous open. When two computers open a TCP connection with the same 5 tuple at the same time, they both get the same connection and each ends up thinking that it's the initiator. This breaks multistream-select, stream muxers, tls (eventually), etc. Worse, it causes the current go-multistream protocol negotiation to hang (because we expect the server to send us the
/multistream/1.0.0
header first and both sides think they're the client).Multistream solution
To fix this in multistream, clients could send (without waiting for the server as they currently do):
If the server sends back:
We know that we're actually the client (keep the connection).
If, instead, they send back anything other than
na
, we know that we both think we're the client and we kill the connection.Unfortunately, this means we'd need to build this into multistream select but, really, we only need it for TCP.
TCP solution
We could fix this by having each side that thinks its the client send an out-of-band message (yes, these exist) on the TCP channel stating "I am the client". If we receive an "I am the client" message while we think we're the client, we kill the connection. That will allow us to avoid waiting for the connection itself to timeout.
Unfortunately, this is shitty.
The text was updated successfully, but these errors were encountered: