Direct file transfer is slow #6599
Stebalien added the kind/bug and topic/meta labels on Aug 21, 2019
@Stebalien the link in there is for this issue, not the correct issue.
Looks like Github cut you off there 😦
(thanks)
Just curious what is the situation for now?
We have:
However, there's still a lot of room for improvement.
I've updated the issue.
This is a meta-issue explaining why direct file transfer may be slower than tools like SCP and the related issues.
This issue assumes the following setup:
Explicitly, this issue is not about finding peers that have content (#6383).
There may or may not be other connected nodes that have the content.
Background
IPFS chunks files/directories into smaller pieces and links them together into a merkledag (a generalized merkle-tree). We call these pieces "blocks".
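To make the chunk-and-link idea concrete, here is a minimal sketch. It is not how go-ipfs actually builds its DAG: the `block` type, the `chunk` helper, and the use of a hex SHA-256 digest in place of a real CID are all assumptions made for illustration, and it builds a single root over flat leaves rather than a balanced tree.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// block is a hypothetical, simplified node: leaves carry raw data,
// the root carries links (IDs) to its children.
type block struct {
	data  []byte
	links []string
}

// id hashes a block's data and links. A hex SHA-256 digest stands in
// for a real CID here (assumption for illustration only).
func id(b block) string {
	h := sha256.New()
	h.Write(b.data)
	for _, l := range b.links {
		h.Write([]byte(l))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// chunk splits data into leaves of at most size bytes, stores each leaf
// under its ID, and returns the ID of a root block that links to them.
func chunk(data []byte, size int) (string, map[string]block) {
	store := make(map[string]block)
	var root block
	for off := 0; off < len(data); off += size {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		leaf := block{data: data[off:end]}
		leafID := id(leaf)
		store[leafID] = leaf
		root.links = append(root.links, leafID)
	}
	rootID := id(root)
	store[rootID] = root
	return rootID, store
}

func main() {
	rootID, store := chunk([]byte("abcdefghij"), 4)
	fmt.Println("root:", rootID)
	fmt.Println("leaves:", len(store[rootID].links), "blocks stored:", len(store))
}
```

Because every block is addressed by a hash of its contents, a fetcher that knows only the root ID can verify each block it receives and discover the IDs of the blocks it still needs.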
To fetch a file or a directory tree, IPFS currently uses a protocol called bitswap to ask for each block in the file or directory tree. Specifically, IPFS (currently) uses bitswap to tell a subset of our connected peers when we "want" a block. The set of blocks we're looking for is called our "wantlist". If the peer has the block, they send it to us. At the moment, peers that don't have the block stay silent.
Dag Traversal
We have two issues with dag traversal that may limit throughput.
ipfs get
When calling ipfs get to fetch a file, go-ipfs will traverse the merkledag in-order (approximately). We do a bit of pre-fetching but our prefetching algorithm isn't optimal: ipfs/go-ipld-format#53
ipfs pin
When calling ipfs pin add, we traverse the graph in parallel with 32 worker goroutines, each fetching one block at a time. Unfortunately, this means we can only request 32 blocks at a time when pinning a file. Given 256KiB blocks, that's a maximum throughput of 8MiB/RTT where RTT is the latency between the two nodes. Assuming an RTT of ~100ms, that's a maximum throughput of 80MiB/s (or ~670Mbps). In other words, it won't max out a gigabit connection.
Unfortunately, this is also an optimistic upper bound. If we're downloading a large directory tree of small files, these blocks will be much smaller (e.g., 1KiB). In that case, our throughput will be more like 2-3Mbps.
Bitswap Issues
Bitswap works on a block-by-block basis. That is, a layer on top of bitswap will ask bitswap for a block, deserialize the block, determine if that block is some kind of intermediate block, and then ask bitswap for that block's children (if any).
Round Trips
One unfortunate side effect is that traversing a path (/ipfs/QmFoobar/a/b/c/d) requires one round trip per path component. The path resolution logic will ask bitswap for the block identified by the CID (content identifier) QmFoobar, deserialize the block into a directory, look up "a" within that directory, find the associated content identifier (let's say QmA), and then ask bitswap for QmA. We then repeat this process for b, c, and d.
A more efficient protocol would simply send the entire path to the remote peer and ask for the entire file all at once. We are currently working on such a protocol (https://github.com/ipfs/go-graphsync) but it may be a while until it's feature complete enough to be included in go-ipfs.
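The round-trip cost of block-by-block path resolution can be illustrated with a small simulation. Everything here is hypothetical: the `remote` type stands in for a peer reached through bitswap, `dir` stands in for a deserialized directory block, and the CIDs QmB, QmC, and QmD are invented to extend the QmFoobar/QmA example.

```go
package main

import (
	"fmt"
	"strings"
)

// dir maps entry names to child CIDs (a stand-in for a deserialized
// directory block; a nil dir represents a file block).
type dir map[string]string

// remote simulates a peer's block store. Every fetch costs one round trip.
type remote struct {
	blocks     map[string]dir
	roundTrips int
}

func (r *remote) fetch(cid string) (dir, bool) {
	r.roundTrips++
	d, ok := r.blocks[cid]
	return d, ok
}

// fetchPath resolves "/ipfs/<root>/<name>/..." one component at a time,
// fetching each block before it can learn the CID of the next one.
func fetchPath(r *remote, path string) (string, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) < 2 || parts[0] != "ipfs" {
		return "", fmt.Errorf("unsupported path %q", path)
	}
	cid := parts[1]
	for _, name := range parts[2:] {
		d, ok := r.fetch(cid)
		if !ok {
			return "", fmt.Errorf("block %s not found", cid)
		}
		child, ok := d[name]
		if !ok {
			return "", fmt.Errorf("no entry %q in %s", name, cid)
		}
		cid = child
	}
	// Finally fetch the block the path points at.
	if _, ok := r.fetch(cid); !ok {
		return "", fmt.Errorf("block %s not found", cid)
	}
	return cid, nil
}

func main() {
	r := &remote{blocks: map[string]dir{
		"QmFoobar": {"a": "QmA"},
		"QmA":      {"b": "QmB"},
		"QmB":      {"c": "QmC"},
		"QmC":      {"d": "QmD"},
		"QmD":      nil,
	}}
	cid, err := fetchPath(r, "/ipfs/QmFoobar/a/b/c/d")
	fmt.Println(cid, err, "round trips:", r.roundTrips)
}
```

Each fetch depends on the result of the previous one, so the requests cannot be issued in parallel. A protocol that ships the whole path to the remote peer would let the peer do this walk locally and collapse the sequence into a single request.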
Duplicate Blocks
STATUS: We have now fixed the protocol limitations here. We can now ask if a peer has a block and ask peers to tell us immediately if they don't have a block. Now, when we try to download a block, we ask:
This means:
Previous Text:
When fetching blocks from more than one peer, bitswap will occasionally fetch the same block from multiple peers in case one doesn't respond. We do this because:
Bitswap doesn't currently have any form of acknowledgement. That is, if peer A tells peer B that it "wants" a block, peer B will only respond (by sending the block) if and when it gets the block. It won't tell peer A that it doesn't have the block. This means that we can't currently tell the difference between a peer that's taking a while to respond and a peer that doesn't have the block and will never respond.
When asking a peer for a block, we have no way to know if the peer actually has it. We can guess based on whether or not the peer had a related block but that peer may only have part of the directory tree/file that we're trying to download.
We're currently considering augmenting the protocol with additional messages to solve both of these issues.
Finally, the current logic that decides how many peers we should request a block from will over-optimize for asking multiple peers (ipfs/go-bitswap#120). If multiple peers have the block we're looking for, we'll have a 50% duplicate block overhead in the best case scenario.
Wantlist Size Limitations
When asking a single peer for a set of blocks, we currently only ask for at most 32 at a time. We do this for several reasons:
We're considering adding a protocol extension (insert link here when we finish the writeup) to better determine the correct wantlist size per peer.
Worker Limitations
Bitswap currently sends out 6 blocks in parallel using 6 workers to avoid consuming too much memory.
We should probably bump this up, possibly scaling with available memory (ipfs/boxo#116).
Libp2p
Another issue is overhead in libp2p itself.
Small Packets
At the moment, go-libp2p doesn't do a good job of combining small writes into larger TCP/IP packets. This can significantly limit the available network throughput.
We have fixes for this issue but had to temporarily revert them as they exposed a bug in secio (libp2p/go-libp2p#644) that would have led to network interoperability issues. Once the network has upgraded, we'll turn this feature back on.
STATUS: We've tried to revive these fixes but ran into other performance issues in go's scheduler.
Slow Stream Multiplexers
The default go-libp2p stream multiplexer, yamux, is a bit slow. This comes from the fact that it has a fixed send window per stream so bandwidth is pretty limited on a per-stream basis.
However, due to how bitswap works, I'm doubtful that this is an issue:
This means the fixed window shouldn't matter when sending blocks.
It's also unlikely that this is preventing us from sending wantlist entries fast enough. On connections with a 100ms latency, we should be able to request ~1,677,720 blocks per second. At 1KiB per block (a lower bound), that's ~1GiB per second (8x a gigabit link).
UPDATE 1: Yamux is an issue, at least on a 10gbps link.
UPDATE 2: