Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Simultaneous open #21

Closed
Stebalien opened this issue Jan 23, 2018 · 16 comments
Closed

Simultaneous open #21

Stebalien opened this issue Jan 23, 2018 · 16 comments

Comments

@Stebalien
Copy link
Member

We need a way to deal with and detect simultaneous open. When two computers open a TCP connection with the same 5 tuple at the same time, they both get the same connection and each ends up thinking that it's the initiator. This breaks multistream-select, stream muxers, tls (eventually), etc. Worse, it causes the current go-multistream protocol negotiation to hang (because we expect the server to send us the /multistream/1.0.0 header first and both sides think they're the client).

Multistream solution

To fix this in multistream, clients could send (without waiting for the server as they currently do):

/multistream/1.0.0
iamclient
/myprotocol/1.0.0

If the server sends back:

/multistream/1.0.0
na
/myprotocol/1.0.0

We know that we're actually the client (keep the connection).

If, instead, they send back anything other than na, we know that we both think we're the client and we kill the connection.

Unfortunately, this means we'd need to build this into multistream select but, really, we only need it for TCP.

TCP solution

We could fix this by having each side that thinks its the client send an out-of-band message (yes, these exist) on the TCP channel stating "I am the client". If we receive an "I am the client" message while we think we're the client, we kill the connection. That will allow us to avoid waiting for the connection itself to timeout.

Unfortunately, this is shitty.

@Stebalien
Copy link
Member Author

Better TCP solution

Use a poison byte.

That is, when connecting as a client:

  1. Set OOB_INLINE.
  2. Send an OOB_BYTE 0.
  3. Start using the connection.

If both sides do the same thing (both clients), they'll both receive a zero byte and their initial handshake will be poisoned.

@vyzo
Copy link

vyzo commented Jan 23, 2018

If, instead, they send back anything other than na, we know that we both think we're the client and we kill the connection.

Why kill the connection? It's a perfectly valid established connection (that has potentially traversed a NAT), and we should try to keep it.

We could see the iamclient instead of na and proceed to a code path that handles it accordingly.

@hsanjuan
Copy link
Contributor

As a user which sometimes hits this, ideally the issue would be automatically dealt with by libp2p so that in the end I have a usable connection, rather than a failed one (failing right away is only marginally better than timing out).

@vyzo
Copy link

vyzo commented Jan 23, 2018

I think we should rework the multistream handler to handle simultaneous open.

A simple way is to have one of the two peers act as the initiator deterministically (say by lexicographic comparison of peer ids).
So when we detect we are opening simultaneously because we got iamaclient on both sides, the handler can decide which one is the initiator and proceed to establish the connection.

@Stebalien
Copy link
Member Author

Stebalien commented Jan 23, 2018

Why kill the connection? It's a perfectly valid established connection (that has potentially traversed a NAT), and we should try to keep it.

I'd love too but I don't want to introduce any too-horrible hacks to get there. Also, this case is absurdly rare (and is only causing problems because we have a test for it).

A simple way is to have one of the two peers act as the initiator deterministically (say by lexicographic comparison of peer ids).

First of all, famous last words 😆. There is nothing simple about this issue.

I considered doing that but this happens before secio so we don't necessarily know their peer ID. However, we do know what it should be so we could do one best-effort reconnect, I guess.

(I'm also not a fan of adding an extra message to every multistream negotiation for the absurdly rare cases where we have a simultaneous connect.)

To get on the same page, I currently have the following pipelines:

-------------------------------------TCP Transport Accept------------------------------------
 ⤷------------------------------------UpgradeInbound--------------------------------------⤴
   ⤷--------------SecurityInbound------------ 🡒 ------------MuxerInbound----------------⤴
     ⤷-----MultistreamSecurityInbound------⤴    ⤷-----MultistreamMuxerInbound---------⤴
       ⤷MultistreamInbound 🡒 {TLS, SECIO}⤴       ⤷MultistreamInbound 🡒 {TLS, SECIO}⤴


--------------------------------------TCP Transport Dial----------------------------------------
 ⤷-------------------------------------UpgradeOutbound---------------------------------------⤴
   ⤷--------------SecurityOutbound------------ 🡒 ------------MuxerOutbound-----------------⤴
     ⤷-----MultistreamSecurityOutbound------⤴    ⤷-----MultistreamMuxerOutbound----------⤴
       ⤷MultistreamOutbound 🡒 {TLS, SECIO}⤴       ⤷MultistreamOutbound 🡒 {Yamu, SECIO}⤴

This fix would require the dial pipeline switch to an "inbound" negotiation in the middle of the security multistream negotiation. There are a few ways to do this:

  1. Have the multistream handshake bail with an error telling us to retry with an inbound handshake. This will suck because we'll be in the middle of the handshake. We could make it work but we'd have to read not only the "iamclient" message from the other end but then we'd have to read and throw away the next message (because, to avoid a round trip, the other side will have sent us its intended security protocol immediately). Even if we manage this, we'd have to bubble this information all the way back up to the UpgradeOutbound pipeline so it can start over.
  2. Have the multistream handshake complete but tell us that it switched to a server-side negotiation. Unfortunately, multistream knows nothing about the peer ID so it has no good way to figure out which side should stay as the initiator. I really don't want to have to tell it the peer ID. Worse, we'd now end up returning an "inbound" connection to the UpgradeOutbound pipeline and we'll have the same problem when we get to the muxer stage. However, unlike the security transport, the muxer transport doesn't know the peer ID (and has no business knowing it).

An alternative to the "iamclient" solution is to use the poison byte solution but, instead of treating it as poison, treat it as a multistream "oops" message. This would allow us to completely bail out of the first multistream handshake before it starts (making it easier to bail all the way to the UpgradeOutbound pipeline) and would avoid sending an additional message with every multistream handshake.

However, this would still require weird custom logic...

As a user which sometimes hits this, ideally the issue would be automatically dealt with by libp2p so that in the end I have a usable connection, rather than a failed one (failing right away is only marginally better than timing out).

Have you ever hit this outside of a test? If we do error out, we should do so with an error indicating that we should retry however, renegotiating a TCP connection isn't really all that expensive (given how unlikely this case is, at least).

@vyzo
Copy link

vyzo commented Jan 23, 2018

An alternative to the "iamclient" solution is to use the poison byte solution but, instead of treating it as poison, treat it as a multistream "oops" message. This would allow us to completely bail out of the first multistream handshake before it starts (making it easier to bail all the way to the UpgradeOutbound pipeline) and would avoid sending an additional message with every multistream handshake.

that might actually work best if it's too complicated to handle iamaclient.

@hsanjuan
Copy link
Contributor

Also, this case is absurdly rare (and is only causing problems because we have a test for it).

But mind that If I have a cluster, and I happen to restart all libp2p hosts at the same time, there are increased chances that this happens. Restarting all peers at once is a pretty common operation when maintaining a cluster of things (i.e. after an upgrade). So while rare, it can happen during "normal operations" and cause impact.

@Stebalien
Copy link
Member Author

@hsanjuan your concerns will be fixed by QUIC (which doesn't have this problem).

So... I've decided to punt on this for now (too many things in flight). It came up because some of my changes cause go-libp2p-swarm to more reliably fail the simultaneous connect test cases but the problem isn't actually new (it just looks like the timing has changed).

@hsanjuan
Copy link
Contributor

@Stebalien SGTM

Stebalien added a commit to libp2p/go-libp2p-kad-dht that referenced this issue Feb 17, 2018
We don't actually handle simultaneous connects correctly (TCP transport issue).
See libp2p/go-tcp-transport#21
Stebalien added a commit to libp2p/go-libp2p-kad-dht that referenced this issue Mar 13, 2018
We don't actually handle simultaneous connects correctly (TCP transport issue).
See libp2p/go-tcp-transport#21
@Stebalien
Copy link
Member Author

@raulk said (libp2p/go-libp2p-swarm#79 (comment)):

Been reading through the issues referenced here; quite an elusive scenario. IIUC, both sides of the connection end up thinking they're the dialer, and carrying out the handshake ceremonies (security, muxing) under the assumption they're the initiator/client.

Couple of questions here:

  • is there some trick we can play with syscalls after we get the TCP socket to check if it was indeed opened as a result of our connect, or if it pre-dated our request? e.g. some kind of flag, or even checking timestamps, or sequences, or something?
  • how do other systems solve this issue? I would imagine it being pretty common across TCP applications.
  • can we add a phase in the upgrader that does not incur in an extra exchange of packets (e.g. like the "iamclient" solution described in Simultaneous open #21 would do), but instead piggybacks on the next message that would be sent anyway as part of the handshake? Instead of explicitly setting the direction of the connection, we would rely on that mechanism to report back to us.

For example, if this is the first message in the security sequence: "HANDSHAKE_INIT", we could prefix it with a 0x00 or a 0xFF reporting what the sender thinks they are, e.g. 0x00 initiator, 0xFF receiver, e.g. 0x00HANDSHAKE_INIT. This mechanism would act as a proxy in the Read(), stripping off the extra byte, and informing the next layer (security, muxer) to change their initial assumption, or handing off the message as-is if the assumption stood.

If both parties believe they're the initiator (0x00) they would fall back to a transport-specific heuristic to resolve the conflict, like the highest [IP address, port] tuple becoming the initiator if we're using TCP/IP (where the problem is seen).

Just some brainstorming, I hope I'm not adding noise.

@Stebalien
Copy link
Member Author

is there some trick we can play with syscalls after we get the TCP socket to check if it was indeed opened as a result of our connect, or if it pre-dated our request? e.g. some kind of flag, or even checking timestamps, or sequences, or something?

I've looked into this (quite extensively, reading quite a bit of kernel code...) and couldn't find a single even semi-reliable solution.

how do other systems solve this issue? I would imagine it being pretty common across TCP applications.

  1. Most TCP applications use the client-server model. The client always establishes the connection to the server so they never get a collision.
  2. This collision only happens when the client's source port matches the server's dest port. Unfortunately, in libp2p, we use something called reuseport to always reuse (when possible) the port we're listening on as our source port. We do this to help with NAT traversal (some NATs will consistently map the same internal port to the same external port).

can we add a phase in the upgrader that does not incur in an extra exchange of packets (e.g. like the "iamclient" solution described in #21 would do), but instead piggybacks on the next message that would be sent anyway as part of the handshake? Instead of explicitly setting the direction of the connection, we would rely on that mechanism to report back to us.

Maybe? Actually, I'm pretty sure the iamclient solution wouldn't add a round trip.

For example, if this is the first message in the security sequence: "HANDSHAKE_INIT", we could prefix it with a 0x00 or a 0xFF reporting what the sender thinks they are, e.g. 0x00 initiator, 0xFF receiver, e.g. 0x00HANDSHAKE_INIT.

Unfortunately, that solution would break existing systems. That's why why this is so tricky. The "poison byte" solution above tries to do something like this without breaking anything but, well, it's very hacky.

@bigs
Copy link

bigs commented Sep 19, 2018

some background for those of us (like me) who weren't familiar with this corner of the TCP spec

@raulk
Copy link
Member

raulk commented Sep 19, 2018

@Stebalien I researched SO_OOBINLINE and now I understand it! It's essentially what I was suggesting, but done "right" (or as right as this could be...).

We'd have to pair it with some kind of random backoff to reduce the risk of a simultaneous open happening again on the retry.

Overall, this dance would increase the latency to establish the connection, but given how infrequent this situation is, I'd argue it's a fair tradeoff.

Looking at the link that @bigs posted, it is evident that TCP supports peer-to-peer communications. It's the upper level protocols (security, multiplexing, etc.) that define roles like initiator/responder, client/server, with specifically ordered handshakes, that become the partypoopers.

@Stebalien
Copy link
Member Author

Yep. Note: We could try to support this in yamux but, when we switch to TLS, that work will go out the window.

@bigs
Copy link

bigs commented Sep 24, 2018 via email

@marten-seemann
Copy link
Contributor

Closing this issue, as

  1. we now have the simopen protocol: https://github.com/libp2p/specs/blob/master/connections/simopen.md
  2. DCUtR coordinates the roles when using hole punching: https://github.com/libp2p/specs/blob/master/relay/DCUtR.md
  3. Protocol Select uses a backoff logic for uncoordinated simultaneous opens: protocol-select/: Add Protocol Select specification specs#349

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants