[META] UDP Support #6481

Closed
Stebalien opened this issue Jul 2, 2019 · 22 comments
Labels
topic/libp2p Topic libp2p topic/meta Topic meta


@Stebalien
Member

Stebalien commented Jul 2, 2019

This issue tracks all things related to "UDP" support. Why UDP?

  1. Fewer file descriptors. A constant number instead of one per connection.
  2. Userspace congestion control.
  3. Fewer round-trips when initializing connections (no separate TCP handshake).
  4. Easier NAT hole punching.
  5. Message-based protocols.
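To make point 1 concrete, here is a minimal Python sketch (standard `socket` module only, not go-libp2p code) of one UDP socket serving several peers. The sender's address, rather than a dedicated file descriptor, identifies each peer. Real transports such as QUIC demultiplex on connection IDs rather than raw addresses, so the address-keyed dict is a simplifying assumption.

```python
import socket

def serve_peers(sock: socket.socket, max_datagrams: int) -> dict:
    """Read datagrams from ONE socket and group them by remote peer.

    The sender's (ip, port) is the key to per-peer state, where a TCP
    server would instead hold one file descriptor per connection.
    """
    sessions = {}
    for _ in range(max_datagrams):
        data, addr = sock.recvfrom(2048)
        sessions.setdefault(addr, []).append(data)
    return sessions

# Demo on localhost: one listening socket, two "peers".
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)  # fail instead of hanging if a datagram goes missing

peers = []
for _ in range(2):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    peers.append(s)
peer_a, peer_b = peers
a_addr, b_addr = peer_a.getsockname(), peer_b.getsockname()

peer_a.sendto(b"hello from a", server.getsockname())
peer_b.sendto(b"hello from b", server.getsockname())
peer_a.sendto(b"again from a", server.getsockname())

sessions = serve_peers(server, 3)
for s in (server, peer_a, peer_b):
    s.close()
```

However many peers talk to it, the server holds a single descriptor; only the dict grows.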

The current status is:

  1. Full QUIC support, enabled by default.
  2. Broken UTP support (https://github.com/libp2p/go-utp-transport). We dropped support due to a misattributed bug. Patches welcome to bring this package back up to date.
  3. No message/packet transport support (unreliable or reliable). This requires some significant interface design work. See Messaging layer libp2p/specs#71.
  4. Hole punching is in progress: Project Flare (decentralised hole punching), Phase 1 meta issue libp2p/go-libp2p#1039
@RubenKelevra
Contributor

@Stebalien

I was wondering why running QUIC over UDP/IP is favorable compared to SCTP/IP, which has no additional layer between the protocol and IP.

Additionally, SCTP supports multihoming and can bundle multiple IPs into one connection (multipath) out of the box. For QUIC, on the other hand, this is still under development AFAIR.

Additionally, UDP doesn't support ECN, which is a major drawback: when bandwidth is exceeded, packets need to be dropped to notify the sender, instead of being delivered with just a flag set.

SCTP also has the advantage that the OS handles the connection and all necessary retransmissions of lost packets. With QUIC, the application has to handle this low-level work itself, which adds some latency and extra copies between user space and kernel space for the signalling back to the sender; with SCTP, that signalling is done in kernel space.

SCTP also uses heartbeats to confirm that each path is working. If a path fails (for example, when a temporary IPv6 address changes), it switches from the primary path to another one without delay and transparently to the application.

Additionally, SCTP is message-oriented instead of stream-oriented, which would fit a p2p protocol much better. The order of messages inside a communication channel can also be set to unordered. This would allow ordered channels for, e.g., signalling and an unordered channel for, e.g., data transmission.

In QUIC, you either have to create a new stream for each data packet, or use one stream for data, in which case it will stall when there's packet loss and continue in a bursty way once the lost data has been recovered.
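The stall described here is head-of-line blocking inside one ordered channel. A small simulation in plain Python (no QUIC or SCTP library involved; the `OrderedChannel` class is hypothetical) shows a lost data message holding back unrelated control messages when everything shares one ordered sequence, but not when each purpose gets its own channel:

```python
class OrderedChannel:
    """Delivers payloads strictly in sequence order, buffering any gap."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}

    def receive(self, seq, payload):
        """Store one arrival and return everything now deliverable in order."""
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

# Data message d0 is lost and only arrives (retransmitted) at the end.

# Case 1: one shared ordered stream, one sequence space for everything.
stream = OrderedChannel()
single = [stream.receive(seq, msg)
          for seq, msg in [(1, b"c0"), (2, b"d1"), (3, b"c1"), (0, b"d0")]]

# Case 2: separate channels, each with its own sequence space.
channels = {"ctrl": OrderedChannel(), "data": OrderedChannel()}
separate = [channels[name].receive(seq, msg)
            for name, seq, msg in [("ctrl", 0, b"c0"), ("data", 1, b"d1"),
                                   ("ctrl", 1, b"c1"), ("data", 0, b"d0")]]
```

In the first case the control messages `c0` and `c1` only come out together with `d0`, in one burst; in the second they are delivered as soon as they arrive, and only the data channel waits.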

SCTP streams could also be individually tagged with DSCP flags when bundling is disabled. While this is probably not implemented yet and would need to be written either as a firewall rule or as an addition to the SCTP implementation, it might be worth looking into.

Best regards

Ruben

@xaionaro

xaionaro commented Feb 12, 2020

@RubenKelevra: IIRC, SCTP has a handshake, which makes it (still possible, but) very unhandy/unreliable/difficult for hole punching.

@RubenKelevra
Contributor

@xaionaro If we want to support legacy NATs that cannot handle firewall rules properly, we could fall back to encapsulating SCTP in UDP, which allows hole punching.

But since most internet connections already have IPv6 and UPnP, this shouldn't be a widespread issue anymore. :)

@xaionaro

xaionaro commented Feb 12, 2020

But since most internet connections already got IPv6 and UPnP

This is a strong assumption. For example, I had neither of these in any of the 3 countries I've lived in over the last 2 years :)

P.S.: It's funny, but I blindly assumed you are from Germany (because I heard there's a lot of IPv6 in Germany). So I looked into the profile and: "Deutschland, NRW" :). It's not an argument, of course. Just...)

@Stebalien
Member Author

Really, libp2p can support arbitrary transports, so having an SCTP transport in addition to QUIC would be awesome. However, QUIC has a bunch of advantages for us:

  • UDP is already a common protocol so middleboxes, firewalls, etc. are less likely to unilaterally block it.
  • QUIC is on track to become a web standard which, again, means it's more likely to work everywhere. This also means there will be (and has been) a lot of effort to make it fast.
  • QUIC gives 1 (or 0 for reconnects) RTT handshakes that completely bootstrap the connection, including a crypto handshake. That means we can go from no connection to fully secure, multiplexed connection in one round trip.
  • The QUIC working group is already looking into "unreliable" streams and multihoming. This will likely take a very long time but it's "coming".

SCTP has some downsides for us:

  • We'd need to spend an extra round trip to secure the connection after the 4-way SCTP handshake.
  • It's hard to reliably layer encryption/authentication on top of SCTP. Basically, we really shouldn't rely on all the nice SCTP features because they're unencrypted and unauthenticated. QUIC was designed from the ground up to put all the crypto on the outside.

Note: I haven't looked into SCTP extensively.

@RubenKelevra
Contributor

I have used SCTP in a small project where status changes were announced across a large network in a hierarchical fashion, with a fully interconnected redundant group at the top (closed-source/unreleased software which I sadly cannot share).

I found it quite pleasing not to deal with streams but to have individual channels for each purpose, which reduces the length of the messages. You can just send messages from one peer to another, and the receiver can process individual messages from many different clients one by one, regardless of the order in which they are received. Once a message is complete, you can grab it from the socket and put it in a queue for the worker jobs.

That felt much more intuitive than using a stream protocol (TCP) and having to parse where a message starts, where it ends, whether it really ends at this location or the packet is malformed, etc.
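For comparison, this is the parsing work a stream forces on you: a minimal length-prefixed framing sketch in Python. The 4-byte big-endian length prefix and the `MAX_FRAME` limit are illustrative assumptions; many libp2p protocols use an unsigned varint prefix instead.

```python
import struct

MAX_FRAME = 1 << 20  # reject absurd lengths instead of buffering forever

def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload

class FrameDecoder:
    """Turns an arbitrarily split byte stream back into whole messages."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        self.buf += chunk
        frames = []
        while len(self.buf) >= 4:
            (length,) = struct.unpack_from(">I", self.buf, 0)
            if length > MAX_FRAME:
                raise ValueError(f"frame too large: {length} bytes")
            if len(self.buf) < 4 + length:
                break  # message not complete yet; wait for more bytes
            frames.append(bytes(self.buf[4:4 + length]))
            del self.buf[:4 + length]
        return frames

# Two messages, delivered by the "stream" in awkward 5-byte pieces.
wire = encode_frame(b"status:ok") + encode_frame(b"block:42")
decoder = FrameDecoder()
out = []
for i in range(0, len(wire), 5):
    out.extend(decoder.feed(wire[i:i + 5]))
```

With a message-oriented transport, the buffering loop in `feed` disappears: each receive call already yields one complete message.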

There's also no 'stream reading' necessary; you just get your message from the socket when it's done. To get a degree of QoS, you can read the most important channels first until there are no new messages, and only then go down to fetch a data packet. This would lower the response time inside IPFS when there's a large amount of traffic from many different connections.
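A minimal sketch of the "read the most important channels first" policy in Python, assuming each channel is an in-memory queue and the list is ordered from most to least important (channel contents are hypothetical). Strict priority can starve the bulk-data channel if signalling never goes quiet, which is one reason to combine it with fair queueing across peers.

```python
from collections import deque

def next_message(channels):
    """Return the next message, draining higher-priority channels first.

    `channels` is ordered from most to least important; returns None
    when every channel is empty.
    """
    for queue in channels:
        if queue:
            return queue.popleft()
    return None

signalling = deque([b"sig-1", b"sig-2"])
bulk = deque([b"data-1", b"data-2"])

order = []
while (msg := next_message([signalling, bulk])) is not None:
    order.append(msg)
    if msg == b"data-1":
        # A new signalling message arrives mid-transfer and jumps ahead.
        signalling.append(b"sig-3")
```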

To select which connection to read from or write to, you could adopt something like fair queueing (similar to what sch_cake does), which gives every node a fair share of the bandwidth. This prevents many fast streams from increasing the latency of other operations inside IPFS: the fast streams get pushed back in the queue for a while, and other work with a similar amount of traffic gets processed until the next packet of a fast stream comes in.
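As an illustration of fair queueing across connections, here is deficit round robin (DRR), a classic fair-queueing scheme, in Python. It is not sch_cake's actual algorithm, which is considerably more elaborate; the peer names, packet sizes and quantum are made-up values.

```python
from collections import deque

def drr_schedule(queues, quantum):
    """Deficit round robin over per-peer packet queues.

    `queues` maps peer -> deque of packet sizes in bytes (consumed in place).
    Each round, a backlogged peer earns `quantum` bytes of credit and sends
    packets while its credit covers them, so no peer can monopolise the link.
    Returns the transmission order as (peer, size) pairs.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    deficit = {peer: 0 for peer in queues}
    active = deque(peer for peer, q in queues.items() if q)
    sent = []
    while active:
        peer = active.popleft()
        q = queues[peer]
        deficit[peer] += quantum
        while q and q[0] <= deficit[peer]:
            size = q.popleft()
            deficit[peer] -= size
            sent.append((peer, size))
        if q:
            active.append(peer)   # still backlogged: keep leftover credit
        else:
            deficit[peer] = 0     # idle peers do not hoard credit
    return sent

queues = {
    "fast-peer": deque([1500, 1500, 1500]),
    "slow-peer": deque([300, 300, 300]),
}
order = drr_schedule(queues, quantum=1500)
```

With a plain FIFO in arrival order, the slow peer's small packets would wait behind all three large ones; with DRR they go out after the fast peer's first packet.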

This is probably especially interesting for the relay function, where many different connections and hosts compete for processing time and bandwidth.

Running TLS over SCTP is possible, but I dislike the whole TLS idea.

It's a huge library, and we only need a very small subset of its features.

How about using SCTP with the same crypto Wireguard is using or curvetun:

Wireguard

WireGuard uses state-of-the-art cryptography, like the Noise protocol framework, Curve25519, ChaCha20, Poly1305, BLAKE2, SipHash24, HKDF, and secure trusted constructions. It makes conservative and reasonable choices and has been reviewed by cryptographers.

curvetun

For key management, public-key cryptography based on elliptic curves are used and packets are encrypted end-to-end by the symmetric stream cipher Salsa20 and authenticated by the MAC Poly1305, where keys have previously been computed with the ECDH key agreement protocol Curve25519.

Cryptography is based on Daniel J. Bernstein's networking and cryptography library “NaCl”.

It's a very modern design from a reputable source, well proven in practice, and it comes with NaCl, a very small and fast library implementing it. If we depend on SSL for a SHA-256 implementation, we probably want to switch to BLAKE2b, which is stronger and much, much faster.

Each message would be signed and then encrypted before handing it down to SCTP, giving you the speed of ChaCha20/Salsa20, which is basically near 10 Gbit/s line speed and also doesn't degrade performance on mobile phones on a fast 5G network.
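The BLAKE2b part can be illustrated with Python's `hashlib`, which supports keyed BLAKE2b directly. This minimal sketch only authenticates each message with a shared symmetric key (a MAC, not a public-key signature) and does not encrypt; a real design would use an AEAD such as ChaCha20-Poly1305 with keys from a Noise-style handshake. The packet layout (8-byte message id, payload, 32-byte tag) is an assumption.

```python
import hashlib
import hmac
import os

TAG_SIZE = 32

def _tag(key: bytes, header: bytes, payload: bytes) -> bytes:
    # Keyed BLAKE2b acts as a MAC over the header and payload together.
    return hashlib.blake2b(header + payload, key=key, digest_size=TAG_SIZE).digest()

def seal(key: bytes, msg_id: int, payload: bytes) -> bytes:
    """Return header + payload + tag, binding the payload to its message id."""
    header = msg_id.to_bytes(8, "big")
    return header + payload + _tag(key, header, payload)

def open_sealed(key: bytes, packet: bytes):
    """Verify the tag in constant time; raise ValueError if anything changed."""
    if len(packet) < 8 + TAG_SIZE:
        raise ValueError("packet too short")
    header, payload, tag = packet[:8], packet[8:-TAG_SIZE], packet[-TAG_SIZE:]
    if not hmac.compare_digest(tag, _tag(key, header, payload)):
        raise ValueError("authentication failed")
    return int.from_bytes(header, "big"), payload

key = os.urandom(32)  # stand-in for a key agreed during a handshake
packet = seal(key, 7, b"hello peer")
```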

Best Regards

Ruben

@Stebalien
Member Author

I found it quite pleasing not to deal with streams but have individual channels for each purpose, which reduces the length of the message and you can just send messages from one peer to another peer and the receiver can process individual messages from many different clients one by one regardless in which order they are received. If a message is complete, you can grab it from the socket and put it in a queue for the worker jobs.

I agree. We do plan on making libp2p and most of our protocols message oriented instead of stream oriented.

Note: Message orientation versus stream orientation is independent of the underlying transport. Some work was started in libp2p/specs#225.

Basically, we can stream data over messages and send messages over streams. It all comes down to reliability requirements and interfaces.
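To illustrate the "stream data over messages" direction, here is a minimal Python sketch that cuts a byte stream into numbered messages and rebuilds it regardless of arrival order. It assumes every chunk eventually arrives (no retransmission or timeouts), and the `(index, total, chunk)` message format is hypothetical.

```python
from typing import Optional

def to_messages(data: bytes, chunk_size: int):
    """Split a byte stream into (index, total, chunk) messages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [(i, len(chunks), chunk) for i, chunk in enumerate(chunks)]

class Reassembler:
    """Rebuilds the original bytes from messages arriving in any order."""

    def __init__(self):
        self.parts = {}
        self.total: Optional[int] = None

    def add(self, index: int, total: int, chunk: bytes) -> Optional[bytes]:
        self.total = total
        self.parts[index] = chunk  # a duplicate simply overwrites its slot
        if len(self.parts) == self.total:
            return b"".join(self.parts[i] for i in range(self.total))
        return None  # still waiting for missing chunks

original = b"a stream of bytes carried as separate messages"
messages = to_messages(original, 8)
reassembler = Reassembler()
result = None
for message in reversed(messages):  # simulate worst-case reordering
    result = reassembler.add(*message)
```

The other direction, messages over a stream, needs framing instead, i.e. a length prefix in front of each message.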

How about using SCTP with the same crypto Wireguard is using or curvetun:

That would give us reliable messages over SCTP and we could build our own streams over that.

Note: any security transport over SCTP would still take at least three round trips to make a request in the optimal case.

RTT1: First half of the SCTP handshake.
RTT2: Second half of the SCTP handshake, smuggling the wireguard handshake into the packet (SCTP allows this).
RTT3: First request.
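The round-trip counts from this thread can be tabulated in a few lines of Python. The step lists follow the comments here (optimal case, no loss) and are assumptions rather than measurements; the 50 ms RTT is an arbitrary example value.

```python
# Round trips until the first response arrives, as counted in this thread.
STACKS = {
    "SCTP + separate security handshake": [
        ("SCTP 4-way handshake", 2), ("security handshake", 1), ("first request", 1)],
    "SCTP + handshake smuggled into 2nd half": [
        ("first half of SCTP handshake", 1),
        ("second half + security handshake", 1),
        ("first request", 1)],
    "QUIC (new connection)": [("QUIC + crypto handshake", 1), ("first request", 1)],
    "QUIC (0-RTT resumption)": [("request sent as 0-RTT data", 1)],
}

def round_trips(steps):
    return sum(count for _, count in steps)

def time_to_first_response_ms(steps, rtt_ms):
    return round_trips(steps) * rtt_ms

RTT_MS = 50  # arbitrary example RTT
for name, steps in STACKS.items():
    print(f"{name}: {round_trips(steps)} RTTs, "
          f"{time_to_first_response_ms(steps, RTT_MS)} ms")
```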


Really, it's not "which transport should we use", it's "which transport should we implement first". QUIC is still the obvious choice there but, after we ship that, there's no reason not to start adding support for additional transports (which might eventually become the default).

@RubenKelevra
Contributor

Note: any security transport over SCTP would still take at least three round trips to make a request in the optimal case.

RTT1: First half of the SCTP handshake.
RTT2: Second half of the SCTP handshake, smuggling the wireguard handshake into the packet (SCTP allows this).
RTT3: First request.

That's true. But does this really matter in reality?

Currently we listen on TCP for IPv4, TCP for IPv6, UDP for IPv4 and UDP for IPv6. If we have a modem with an LTE connection, we have 4 more. And if the WiFi drops out, we would have to reconnect to our roughly 600 nodes with 600 new connections, since the WiFi sockets now return an error.

After some minutes, the client is back on WiFi, and we probably want to establish new connections over WiFi now, since the WiFi connection is configured as non-metered. So we have to kill all connections again and establish 600 new ones.

And with QUIC, all this low-level management has to be done in userspace.

With SCTP, we would advertise all available IPs to our neighbor nodes, and if the primary path dies, SCTP would just transparently switch over to another one, including keeping track of all dropped packets that need to be resent, while already transmitting new packets.

That's a hell of a lot less overhead in the application than having to request whole large blocks again over the new connection because single chunks were missing and thus dropped by the protocol. Multiply this by 600 active connections transmitting data, and you end up with a much longer recovery time than the few round trips needed to set up a connection.

Since SCTP bundles everything into one connection, we can also tell it to switch its primary path from one IP pair to another: if we join a WiFi network and, after a few seconds, mDNS discovers a node we're already talking to in IPFS, we can add the local IP to the SCTP connection and switch the communication over to this local path. If either node leaves the WiFi, the primary path will transparently switch to a different route.
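A toy model of the failover behaviour described above, in Python. It does not use the SCTP API or follow its exact retransmission rules; `PathManager`, the path labels and the `max_missed` threshold are hypothetical. A path is declared down after a number of consecutive unanswered heartbeats, and traffic moves to the first path still up.

```python
class PathManager:
    """Tracks the address pairs of one connection and picks a primary path."""

    def __init__(self, paths, max_missed=3):
        if not paths:
            raise ValueError("need at least one path")
        self.paths = list(paths)             # in order of preference
        self.missed = {p: 0 for p in self.paths}
        self.max_missed = max_missed
        self.primary = self.paths[0]

    def is_up(self, path):
        return self.missed[path] < self.max_missed

    def heartbeat_result(self, path, acked):
        # Count consecutive unanswered heartbeats; an ack resets the count.
        self.missed[path] = 0 if acked else self.missed[path] + 1
        if path == self.primary and not self.is_up(path):
            self.fail_over()

    def fail_over(self):
        # Move to the most preferred path that is still up, if any.
        for p in self.paths:
            if self.is_up(p):
                self.primary = p
                return

    def add_path(self, path, prefer=False):
        """E.g. a local address learned via mDNS; optionally make it primary."""
        if path not in self.missed:
            self.paths.insert(0 if prefer else len(self.paths), path)
            self.missed[path] = 0
        if prefer:
            self.primary = path

pm = PathManager(["wifi", "lte"])
for _ in range(3):
    pm.heartbeat_result("wifi", acked=False)   # WiFi drops out
after_wifi_loss = pm.primary
pm.add_path("lan", prefer=True)                # peer found on the local network
after_lan_added = pm.primary
for _ in range(3):
    pm.heartbeat_result("lan", acked=False)    # we leave the local network
```

The application above this layer keeps sending on the same connection throughout; only `primary` changes.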

I found it quite pleasing not to deal with streams but have individual channels for each purpose, which reduces the length of the message and you can just send messages from one peer to another peer and the receiver can process individual messages from many different clients one by one regardless in which order they are received. If a message is complete, you can grab it from the socket and put it in a queue for the worker jobs.

I agree. We do plan on making libp2p and most of our protocols message oriented instead of stream oriented.

Cool!

Note: Message orientation versus stream orientation is independent of the underlying transport. Some work was started in libp2p/specs#225.

Basically, we can stream data over messages and send messages over streams. It all comes down to reliability requirements and interfaces.

Really, it's not "which transport should we use", it's "which transport should we implement first". QUIC is still the obvious choice there but, after we ship that, there's no reason not to start adding support for additional transports (which might eventually become the default).

This sounds like unnecessary work to me. I would design the library with minimal complexity: just abstract messages, with different types and different priorities, which can be pushed into and received from a protocol. Specific types can be marked as 'needs to be in order' or not.

An SCTP connection adapter would just receive these abstract messages, find the right connection and channel, and build up a send queue, or wake up a waiting reader (e.g. via a mutex) every time a new message is received.

The whole traffic management is done in those protocol adapters: rate limiting and prioritization of channels and queues, as well as fair queueing in both directions to give each communication partner a fair share of processing time.

This way, even with a very complex protocol like SCTP, a rate limit for LTE could be applied while WLAN (to the internet) and the local LAN are unaffected.
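Per-interface rate limiting like this is commonly done with a token bucket. Here is a minimal Python sketch, assuming a hypothetical policy table where only the metered `lte` interface has a limit. Time is passed in explicitly to keep the example deterministic; real code would read `time.monotonic()`.

```python
class TokenBucket:
    """Allows bursts up to `capacity` bytes, refilled at `rate` bytes per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, size, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False

# Hypothetical policy: only the metered LTE interface is limited.
LIMITS = {"lte": TokenBucket(rate=50_000, capacity=100_000)}

def may_send(interface, size, now):
    bucket = LIMITS.get(interface)
    return bucket is None or bucket.allow(size, now)

decisions = [
    may_send("lte", 100_000, now=0.0),     # uses the whole burst allowance
    may_send("lte", 1, now=0.0),           # bucket empty: must wait
    may_send("lan", 10_000_000, now=0.0),  # unlimited interface
    may_send("lte", 40_000, now=1.0),      # one second refilled 50,000 bytes
]
```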

This also allows prioritizing specific important data transfers. For example, a cluster can notify IPFS that a pinning operation is important because the redundancy dropped below the minimum and needs to be restored.

In this case, up to xx% of the data transmissions can be prioritized for this operation, with the rest of the operations filling the remaining bandwidth.
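One way to read "up to xx% for this operation" is a guaranteed share that other traffic can reuse when the prioritized operation doesn't need it. Here is a minimal integer-arithmetic sketch in Python; the 30% in the examples is an arbitrary illustrative value, not a figure from the comment.

```python
def split_bandwidth(capacity, priority_demand, other_demand, max_priority_percent):
    """Split `capacity` between a prioritized operation and everything else.

    The prioritized operation is guaranteed up to max_priority_percent of the
    capacity; whatever either side leaves unused goes to the other side.
    All values are integers in the same unit (e.g. KiB/s).
    """
    guaranteed = min(priority_demand, capacity * max_priority_percent // 100)
    other = min(other_demand, capacity - guaranteed)
    priority = min(priority_demand, capacity - other)
    return priority, other

busy = split_bandwidth(100, 80, 100, 30)            # both sides saturated
quiet_others = split_bandwidth(100, 80, 20, 30)     # priority may exceed its share
small_priority = split_bandwidth(100, 10, 100, 30)  # unused share goes to others
```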

Overall, adding traffic optimizations inside IPFS like the ones used in sch_cake could significantly reduce the response time for network operations on nodes under heavy load, since a lot of work can be shifted to background or bulk operations that only run when nothing more important is going on.

How about using SCTP with the same crypto Wireguard is using or curvetun:

That would give us reliable messages over SCTP and we could build our own streams over that.

I don't think you want streams over SCTP, but you can transfer a stream in chunks on one SCTP channel.

Hope some of my brainstorming helped.

Best regards

Ruben

@Stebalien
Member Author

That's true. But does this really matter in reality?

Yes. Every single modern transport (QUIC, HTTP2, TLS1.3, etc.) is built around reducing round trips.

@RubenKelevra
Contributor

RubenKelevra commented Feb 15, 2020

That's true. But does this really matter in reality?

Yes. Every single modern transport (QUIC, HTTP2, TLS1.3, etc.) is built around reducing round trips.

But those protocols are mainly made for requesting a website.

IPFS connections are long term data exchange connections.

For example, I don't care how long my notebook takes to boot, since I reboot it about once a week. But I do care about the latency when opening applications, and about how fast I can copy files.

@Stebalien
Member Author

  1. The eventual goal is to replace the current web stack with IPFS and protocols built on top of IPFS. We still have quite a way to go, but we need equivalent performance for that to be realistic.
  2. When you want to download content, there's a very good chance you won't be connected to the peers that have it. Actually finding the content usually involves querying multiple peers.

@RubenKelevra
Contributor

RubenKelevra commented Feb 21, 2020

True for this application. But remember that most websites still use TCP plus a TLS handshake and HTTP/1.x, and they work fine.

So doing a 4-way handshake, and mitigating some DDoS scenarios by doing so, doesn't sound THAT bad, especially if there are multiple providers for the content and you can pick the most local one.

Doing a 4-way handshake on a 10 ms connection is faster than doing the web equivalent over a transatlantic connection when the web server is on the other side.
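The arithmetic behind this comparison, as a Python sketch. All numbers are illustrative assumptions: 4 round trips for SCTP (4-way handshake, security handshake, request), 3 for the web (TCP, TLS 1.3, HTTP/1.x request), 10 ms RTT to a nearby peer and roughly 80 ms across the Atlantic.

```python
def setup_and_first_response_ms(round_trips: int, rtt_ms: int) -> int:
    """Latency until the first response, ignoring processing time and loss."""
    return round_trips * rtt_ms

# Illustrative assumptions, not measurements:
NEARBY_RTT_MS = 10         # IPFS peer in the same region
TRANSATLANTIC_RTT_MS = 80  # web server on the other side of the Atlantic
SCTP_RTTS = 4              # 4-way handshake (2) + security handshake (1) + request (1)
WEB_RTTS = 3               # TCP (1) + TLS 1.3 (1) + HTTP/1.x request (1)

ipfs_ms = setup_and_first_response_ms(SCTP_RTTS, NEARBY_RTT_MS)
web_ms = setup_and_first_response_ms(WEB_RTTS, TRANSATLANTIC_RTT_MS)
# Peer RTT at which the 4-round-trip setup stops being faster than the web path.
break_even_rtt_ms = web_ms / SCTP_RTTS
```

Under these assumptions the nearby SCTP peer answers after 40 ms versus 240 ms for the distant web server, and stays ahead as long as the peer's RTT is below 60 ms.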

This comparison obviously doesn't hold up when comparing IPFS over SCTP to an HTTP/3 CDN that is conveniently placed nearby with an anycast IP.

I think that for this, a very simple query protocol with asymmetric encryption (without a handshake, session keys or Perfect Forward Secrecy) over plain UDP packets, like DNS, could replace the current approach for pinning down the right peer. This would remove the overhead of QUIC and the complete TLS stack.
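A sketch of what a DNS-like, handshake-free query message could look like, in Python with `struct`. The wire format (2-byte request id, 1-byte kind, 2-byte key length, key) is entirely hypothetical, and the asymmetric encryption proposed above is deliberately left out.

```python
import os
import struct

# Hypothetical wire format (not an existing libp2p or IPFS protocol):
#   2-byte request id | 1-byte kind | 2-byte key length | key bytes
QUERY, ANSWER = 1, 2
HEADER = struct.Struct(">HBH")

def encode(request_id: int, kind: int, key: bytes) -> bytes:
    return HEADER.pack(request_id, kind, len(key)) + key

def decode(packet: bytes):
    """Parse one datagram; raise ValueError if it is truncated."""
    if len(packet) < HEADER.size:
        raise ValueError("truncated header")
    request_id, kind, length = HEADER.unpack_from(packet)
    key = packet[HEADER.size:HEADER.size + length]
    if len(key) != length:
        raise ValueError("truncated key")
    return request_id, kind, key

def new_request_id() -> int:
    """Random id so answers can be matched to outstanding queries, as in DNS."""
    return int.from_bytes(os.urandom(2), "big")

query = encode(0x1234, QUERY, b"who-has:example-key")
```

Because every exchange is a single datagram each way, there is no per-peer connection state to set up or tear down, which is the property the comment is after.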

Since we close the UDP socket, the operating system can also clean up the connection state pretty quickly. A conntracking firewall might run into limits this way, but I think this would be no different for QUIC, since connection tracking is probably quite a while away from being able to detect a closing QUIC connection (and this might then be limited to HTTP/3-specific details).

@ROBERT-MCDOWELL

any news on this topic?

@RubenKelevra
Contributor

RubenKelevra commented Aug 17, 2021

any news on this topic?

UDP, Hole Punching and QUIC are features now and enabled by default. :)

@ROBERT-MCDOWELL

@RubenKelevra
Great! But how is that possible, since all the listening sockets are TCP-based?
Also, QUIC is deprecated and WebTransport is going to replace it. Will go-ipfs keep QUIC?

@marten-seemann
Member

Also, QUIC is deprecated and WebTransport is going to replace it. Will go-ipfs keep QUIC?

That's not true; nothing about QUIC is deprecated.
In fact, WebTransport is a protocol built on top of QUIC.

@ROBERT-MCDOWELL

@marten-seemann
Yes, indeed, but HTTP/3 lets the developer use whatever protocol, which can be more interesting than being stuck with one protocol.

@RubenKelevra
Contributor

@RubenKelevra
Great! But how is that possible, since all the listening sockets are TCP-based?

Where did you get that info from? That documentation obviously needs an update, since it is outdated.

Also, QUIC is deprecated and WebTransport is going to replace it. Will go-ipfs keep QUIC?

That's also not true.

@marten-seemann
Member

Yes, indeed, but HTTP/3 lets the developer use whatever protocol

That's not correct. HTTP/3 is tied to QUIC v1: https://datatracker.ietf.org/doc/html/draft-ietf-quic-http-34#section-3.2

@ROBERT-MCDOWELL

In fact, HTTP/3 uses some QUIC definitions (among its own), but I'm not sure it's fully QUIC.
https://w3c.github.io/webtransport/#biblio-web-transport-http3

@ROBERT-MCDOWELL

@RubenKelevra
I don't see any IPFS settings related to UDP sockets; please correct me if I'm wrong.

@Jorropo
Contributor

Jorropo commented Aug 16, 2023

We have very good QUIC support now, including some hole punching.
The only missing piece is message-based protocols, but with QUIC's 1-RTT and 0-RTT connections we currently see limited benefits and a high cost to implementing them (message-based protocols require duplicating part of the network stack and the protocols, with weaker guarantees).

QUIC is the most used transport by Kubo nodes. 🎉

Jorropo closed this as completed Aug 16, 2023

6 participants