Slow transfer over LAN #5037

Closed
piedar opened this issue May 26, 2018 · 21 comments

@piedar

piedar commented May 26, 2018

It takes over 1 minute to transfer 21 MB between two machines on a local network.

For most of this test, the receiver's ipfs daemon uses 150% CPU - suggesting it bottlenecks the old dual-core hardware. However, ipfs add hashes new 30 MB files in under 10 seconds. I don't yet understand why the network transfer adds so much overhead.

Workaround: run the slow node with ipfs daemon --routing=none.

Sender

# ipfs version 0.4.15
ipfs daemon &
ipfs add vlc-2.2.8.tar.xz

Receiver

# ipfs version 0.4.15
ipfs daemon &
ipfs repo gc
time ipfs cat QmaSSwsS2nAjExnxrqwKtmK5rLLhmqpju1HCsnPSigtHmV > /dev/null

After a few seconds, the transfer starts but it stutters at 1.38 MB, 3.75 MB, 7.50 MB, etc.

real    1m17.238s
user    0m0.652s
sys     0m1.255s

Without ipfs repo gc it takes about 5 seconds. As iperf shows, the network connection is fine.

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   112 MBytes  93.8 Mbits/sec
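
As a rough sanity check on the figures above, the effective throughput of the cold transfer can be compared with the iperf result. This is only a sketch: the helper name `throughput_mbits` is made up for illustration, and the file size is taken as 21 × 10^6 bytes, which is an assumption about what "21 MB" means in the report.

```shell
# Effective throughput in Mbit/s for a transfer of $1 megabytes
# (1 MB assumed to be 10^6 bytes) that took $2 seconds.
throughput_mbits() {
    LC_ALL=C awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb * 8 / s }'
}

# Cold repo: 21 MB in 1m17.238s (figures from the report).
throughput_mbits 21 77.238    # prints 2.2
# Warm repo: about 5 seconds.
throughput_mbits 21 5         # prints 33.6
```

Against the 93.8 Mbit/s that iperf measured, the cold transfer uses only a few percent of the link.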

X-Post discuss.ipfs.io

@bonedaddy
Contributor

bonedaddy commented May 27, 2018

are you sure that you're requesting it off the LAN and not the internet? there's a chance your machine could be fetching it off the internet? why are you running the garbage collect before attempting to download the file?

@piedar
Author

piedar commented May 27, 2018

It could be fetching off the internet, but I would still expect it to be fast since ipfs swarm peers finds the LAN node. I'm only running ipfs repo gc for the benefit of the performance test.

My hunch right now is it's CPU-bound due to older hardware, with ipfs daemon at about 150%. Could the hash verification be the bottleneck? But ipfs add is faster and it calculates hashes too...

[edit: Report of 5000% CPU usage might be just a display bug with htop over ssh.]

@bonedaddy
Copy link
Contributor

Ah yeah, if swarm can find the peer then it should be connecting to the peer.
Holy crap! That is insane CPU usage. Given the symptoms you mention, stuttering at specific points, along with that intense CPU utilization, I think your hunch is right.

@whyrusleeping
Member

@piedar hrm... could you see how long an ipfs pin add takes to do the same transfer? It uses a different codepath under the hood to do the transfer, a more optimized graph traversal.

Another thing to try that would help us debug is to run the daemon fetching the file with --routing=none and ensure the two nodes get connected before starting the transfer. If this noticeably improves the performance then it's likely the DHT is interfering (your node sends out notifications that you are now hosting this new content as you receive it, and we've been seeing issues around that lately)
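
The two experiments suggested above might be scripted roughly as follows. This is a sketch under assumptions: the sender address `192.168.0.2` and `SENDER_PEER_ID` are placeholders, the helper names `run` and `timed_fetch` are made up for illustration, and by default the script only prints the commands instead of executing them.

```shell
# Placeholders (assumptions): replace with the sender's real LAN address and
# peer ID, which the sender can print with `ipfs id`.
SENDER_ADDR="${SENDER_ADDR:-/ip4/192.168.0.2/tcp/4001/ipfs/SENDER_PEER_ID}"
CID="QmaSSwsS2nAjExnxrqwKtmK5rLLhmqpju1HCsnPSigtHmV"   # hash from the report

# Print each command to stderr; also execute it only when DRY_RUN=0.
run() {
    echo "+ $*" >&2
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Clear the local copy, connect to the LAN peer explicitly, then report how
# many whole seconds the given fetch command took.
timed_fetch() {
    run ipfs repo gc > /dev/null
    run ipfs swarm connect "$SENDER_ADDR"
    start=$(date +%s)
    run "$@" > /dev/null
    echo "elapsed: $(( $(date +%s) - start ))s"
}

# Experiment 1: pin add instead of cat (different code path).
#   timed_fetch ipfs pin add "$CID"
# Experiment 2: the same cat as before, with the daemon started separately
# as `ipfs daemon --routing=none`.
#   timed_fetch ipfs cat "$CID"
```

Set `DRY_RUN=0` to actually run the commands. Note that after the pin add experiment the content stays pinned, so `ipfs repo gc` will no longer remove it until it is unpinned with `ipfs pin rm`.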

@whyrusleeping whyrusleeping added the topic/perf Performance label May 31, 2018
@piedar
Author

piedar commented Jun 4, 2018

Yes @whyrusleeping, it's certainly faster without the routing!

ipfs daemon --routing=none &
ipfs repo gc
time ipfs cat QmaSSwsS2nAjExnxrqwKtmK5rLLhmqpju1HCsnPSigtHmV > /dev/null

real    0m8.123s
user    0m0.366s
sys     0m0.509s

And --routing=dhtclient is in the middle, clocking in around 30 seconds.

@whyrusleeping
Member

That's very interesting... It would appear then that getting this PR: #4333 merged should help transfer speeds overall.

@whyrusleeping
Member

(well, that PR and its followups)

@piedar
Author

piedar commented Jun 9, 2018

Could DHT announce be adjusted to run in the background, only when there is no active transfer in progress? Though maybe that's too much complexity if the root can be solved by speeding up the operation in general.

Anyway, I'll run this test every couple versions and report back if the results change significantly.

@whyrusleeping
Member

@piedar running the DHT announce in the background is pretty much what we want to do. The main sticking point for why that hasn't happened yet is that, technically, that's what's happening right now. The reason it's slowing things down is that there is backpressure from the DHT provide process slowing down anything that sends hashes to it. Since bitswap fetches each block of a graph independently, it sends one provide call per hash (which can be millions of calls). The change we need to make is to make the DHT providing process a bit smarter, so we can tell it 'here are the objects/pins we care about, make sure the world knows' and it can enumerate hashes on demand (and entirely separate from the process of us receiving them).
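
To get a feel for how many provide calls that implies, here is a back-of-the-envelope sketch. It assumes the default fixed-size chunker of 262144 bytes (256 KiB) and counts only leaf blocks, ignoring the few intermediate DAG nodes; the helper name `leaf_blocks` is made up for illustration, and files added with other chunker settings will give different numbers.

```shell
# Number of leaf blocks, and so roughly the number of per-block provide
# calls, for a file of $1 bytes split into chunks of $2 bytes.
leaf_blocks() {
    size_bytes="$1"
    chunk_bytes="${2:-262144}"   # assumed default chunk size (256 KiB)
    echo $(( (size_bytes + chunk_bytes - 1) / chunk_bytes ))
}

leaf_blocks 21000000       # the 21 MB test file: prints 81
leaf_blocks 10000000000    # a 10 GB set of pinned files: prints 38147
```

With the timings reported in this thread (about 77 s with the DHT and about 8 s with --routing=none), the extra delay would average a little under a second per block if it were spread evenly, though nothing in the thread shows that it is.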

@whyrusleeping
Member

> Anyway, I'll run this test every couple versions and report back if the results change significantly.

:) Please do! This is so helpful to us

@dilshat

dilshat commented Jun 19, 2018

Yes this would be a tremendous improvement. Please include this fix into an upcoming release

@etursunbaev

I also ran into this issue. It seems it does not download from local peers.

@MirceaKitsune

I could confirm this issue today with go-ipfs 0.4.15 under Linux openSUSE x64.

The setup: I have two computers connected to the same home router, mine in one room and my mother's in the hallway. The daemon on mine contains a group of large video files (for DTube) which are pinned, I'd estimate 10GB in total (the size of my ~/.ipfs directory). I ran the bash script I created to pin this list of videos on my mother's computer, with the daemon on mine also running since I thought that would cause the files to be served more quickly.

The result: Despite being directly connected to a 10 MB/s or 100 MB/s cable, the pinning process hasn't finished after over 6 hours. Judging by my network traffic monitor, my computer appeared to only serve content periodically: For roughly 5 seconds, I'd see it sending data at over 1 MB/s... after that the transfer rate would drop to roughly 300 KB/s or less and stay there. I know the two daemons were exchanging data over LAN because one of them was posting generic errors and they were all about an IP of the format 192.168.0.1 (the local IPs assigned to our machines by the router).

I immediately found that surprising but thought I must be missing something else. I asked on the IRC channel and someone pointed me to this bug. I figured sharing this experience might help.

@Stebalien
Member

So, make sure you're not confusing megabits and megabytes. Those cables are probably 10 Mbps and 100 Mbps, 8x slower than 10 MB/s and 100 MB/s.
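
The conversion is a division by 8 (bits per byte). A small sketch, with the helper name `mbps_to_mbytes` made up for illustration:

```shell
# Convert a link speed in megabits per second to megabytes per second.
mbps_to_mbytes() {
    LC_ALL=C awk -v mbps="$1" 'BEGIN { printf "%.2f\n", mbps / 8 }'
}

mbps_to_mbytes 10     # prints 1.25
mbps_to_mbytes 100    # prints 12.50
```

Even at the 12.5 MB/s ceiling of a 100 Mbps link (before protocol overhead), roughly 10 GB would need at least 800 seconds, about 13 minutes, which is far short of the 6+ hours reported.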

For comparison, I'd try connecting the two machines with netcat and piping data directly over that connection to measure the actual transfer speed.
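
A minimal sketch of such a netcat test follows. Port 9000, the 100 MiB payload and the address 192.168.0.2 are arbitrary placeholders, and the helper names are made up for illustration. Netcat's listen syntax also differs between variants (`nc -l 9000` for OpenBSD netcat, `nc -l -p 9000` for traditional netcat), so check `man nc` on each machine first. The helpers only print the command to run on each side.

```shell
PORT=9000                 # arbitrary free port (assumption)
RECEIVER_IP=192.168.0.2   # placeholder for the receiving machine's LAN address

# Command for the receiving machine: listen and discard whatever arrives.
# (OpenBSD netcat syntax; traditional netcat needs `nc -l -p PORT`.)
receiver_cmd() {
    echo "nc -l $PORT > /dev/null"
}

# Command for the sending machine: push 100 MiB of zeros through netcat.
# dd prints the achieved transfer rate when it finishes.
sender_cmd() {
    echo "dd if=/dev/zero bs=1M count=100 | nc $RECEIVER_IP $PORT"
}

receiver_cmd
sender_cmd
```

Start the receiver command first, then the sender command, and compare the rate dd reports with what IPFS achieves between the same two machines.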

However, that still looks wrong.

  1. Is either machine using a hard disk (not an SSD)?
  2. When you run the test, how does IPFS's CPU usage look? Is it pegged at 100%?
  3. Try running ipfs bitswap wantlist on the machine downloading and report the results.
  4. Try running ipfs bitswap stat ...
  5. Post the error messages you're seeing concerning 192.168.0.1.
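
The command output requested in the list above could be gathered into one file for posting, for example with a sketch like this. The function name `collect_bitswap_diag` and the default file name `ipfs-diag.txt` are made up for illustration; the script needs the ipfs CLI on the PATH and a running daemon on the downloading machine.

```shell
# Collect bitswap and swarm diagnostics into a single file.
# Returns 1 without writing anything if the ipfs CLI is not installed.
collect_bitswap_diag() {
    out="${1:-ipfs-diag.txt}"
    if ! command -v ipfs >/dev/null 2>&1; then
        echo "ipfs not found in PATH" >&2
        return 1
    fi
    {
        echo "== ipfs version ==";          ipfs version
        echo "== ipfs bitswap wantlist =="; ipfs bitswap wantlist
        echo "== ipfs bitswap stat ==";     ipfs bitswap stat
        echo "== ipfs swarm peers ==";      ipfs swarm peers
    } > "$out" 2>&1
    echo "wrote $out"
}
```

Run `collect_bitswap_diag` while the slow transfer is in progress, so the wantlist and stats reflect the stalled state rather than an idle node.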

@MirceaKitsune

I was referring to Megabytes of course. Both machines have classic hard drives, no SSD yet.

Resource usage: On my mother's old computer (slow single core CPU), the IPFS process kept using roughly 40% CPU. Memory wise it was over 350 MB.

I used Netcat in an unrelated test weeks ago, kinda forgot how to use it since but I could look into it again. Those ipfs stat commands seem like a better test, I might look into them first.

@MirceaKitsune

This is a screenshot from KSysGuard on my mother's computer showing the network transfer rate. This is all go-ipfs, no other process should have been sending or receiving any significant amount of data.

Sharing it here because the transfer rate is abnormally erratic: One moment it's receiving at over 1 MB/s, the other at 100 KB/s. I see no explanation as to why it wouldn't be at +1 MB/s all the time.

[Screenshot: screenshot_20180629_145000]

@Stebalien
Member

Hm. Yeah, that doesn't look right at all. I'd expect it to be a bit erratic (known issues) but not that slow.

@piedar
Author

piedar commented Jun 30, 2018

@MirceaKitsune Have you tried running the receiver with ipfs daemon --routing=none? It drastically improved the speed for me. Also, check the hard drive activity light! The IO hurts even my old SSD, so I imagine an HDD could be brutally slow.

@ItalyPaleAle

I have the same issue. Freshly installed go-ipfs on macOS 10.13 (High Sierra). I tried it by requesting just a few pages (less than 10). 15 minutes with the daemon running, it was using 168% CPU and significant energy.

@djdv
Contributor

djdv commented May 14, 2019

@hannahhoward
I believe your work on bitswap, graphsync, etc. is likely relevant here.
If not feel free to un/re-assign.

@Stebalien
Member

This issue is pre-bitswap refactor. There are still known issues but nothing here will likely be relevant.
