CPU Utilization Issues #5613
Is it IPFS that's spiking, or the web browser? Unfortunately, the webui's peers view is known to be extremely inefficient. Fortunately, that's fixed; unfortunately, the fix isn't merged/released yet. I've just submitted a PR to use the in-development version (#5614).
So, it looks like this is due to the geoip database. For each peer we're connected to, we look up that peer in an IPLD-based geoip database. Unfortunately, each of these requests is independent and is likely forcing us to search for providers for each object separately (due to ipfs/go-bitswap#16). That, in turn, connects us to more peers, which in turn causes us to download more of this graph; rinse, repeat. I'll see if I can delay finding providers. We shouldn't need that, as we're likely already connected to peers with the blocks we need.
So, this is even worse than I thought. I have the first half of a fix in ipfs/go-bitswap#17. However, not only are we fetching each of these blocks, we're also announcing that we have them. This announcement also forces us to connect to a bunch of peers, leading to this runaway CPU issue.
@olizilla So now we know the reason behind my little trick of using the WebUI to boost my peer count 😛 @Stebalien I'm in no way suggesting that you not fix this, but is there another way to force my peer count up?
Personally, I just use
Yeah... that's because it's downloading a bunch of tiny blocks and wasting a bunch of traffic just telling the network about them/finding them. @keks will be working on this issue this quarter.
I'll just chime in with a rather useless "me too" comment. I first ran into this a few years ago, and a few days ago I tried IPFS out again. I immediately noticed it again: high CPU usage. That's just with the stock go-ipfs package as it comes on Arch Linux. Nothing special turned on, not a thousand users on it... Nope, just plain and simple sitting idle, eating away my CPU. In my mind this project has the potential to really make a difference to how we use the internet, and to make it much more efficient: the more users join, the more stable the whole network becomes. You don't need a monstrous server setup for a highly popular website; the internet as a whole is that monstrous giant server, with each user contributing a tiny piece. I quite like that idea! But not if it's eating my desktop and server CPU. For a CPU usage graph, see https://image.ibb.co/eDPtHA/ipfs-cpuusabe.png (and that's on a quite decent VPS!). This CPU usage thing is a real showstopper for me at the moment. I wish I could help profiling and fixing stuff, but Go isn't quite my language. I would've if it were C++ :)
Please try running the daemon with
I will give that a shot. I have no problem passing that argument on my local machine.
You should be able to pass that on the docker command line. That is:

docker run -d --name ipfs_host ... ipfs/go-ipfs:latest daemon --routing=dhtclient
I've now tried that (locally and on a VPS). I think it helped somewhat, even though CPU usage is still high and now really erratic: https://i.imgur.com/bUdbFv6.png Also, due to the sheer number of peers it connects to (I'm using the default config), it triggers my provider's hacking detection: I get notified by my provider that the IPFS IP is possibly making hacking attempts... Right. What is the proper way to limit IPFS to, say, 100 peers or so? That alone will probably also reduce CPU usage significantly.
To limit the number of connections, use the connection manager: https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#connmgr. To avoid connecting to private IP addresses (which is likely what's triggering the port-scan detection), apply the
@Stebalien I do have more questions, though, sorry :) Regarding the connmgr: that is really vaguely worded! "LowWater" and "HighWater"... seriously? Even then, it still doesn't tell me whether those are the maximum number of connections it opens, or whether it opens a thousand and keeps open only what is specified.
Profiles patch/transform your config itself.
You're right, that documentation is pretty terrible. Fixed by: #5839. New version: https://github.com/ipfs/go-ipfs/blob/716f69b8f8a3abbaa2fdcacc7827eba00e3470de/docs/config.md#connmgr
New release, new screenshot! If anything, it got even worse! I am running it in stock mode, though. No Please make this CPU load issue the No. 1 priority. For you folks, the IPFS devs, it's only annoying that you get complaints about it. For us, the users, it's also annoying, as having an app that constantly uses much of the CPU (even on high-end CPUs!) is bound to get us into trouble if hosted at some provider.
Performance and resource utilization is a very high priority issue and we're really doing the best we can. This looks like a regression we've also noticed in the gateways; we'll try to get a fix out as soon as possible. Can you give us a CPU profile? That is, Also, what kind of load are you putting on this node? Is it doing nothing? Is it fetching content? Acting as a gateway? Serving content over bitswap?
Hi, sorry for the rather late reply. It's doing nothing at all! Just a plain and simple docker run to let it run. Nothing more. This is on a Hetzner VPS instance. If you want, I can give you a VPS instance to play with for a month or so. Yes, I'll pay for it; see it as my little contribution to this project ;) As of a few days ago, it even decided to take up 100% CPU. I had to kill my VPS just to get it responding again and run the command you asked for. Lastly, I really don't understand why others apparently aren't running into this CPU insanity, along with bug #5977. That bug gives me freakishly unstable swarm connections: most of them don't work, though some sometimes do. If you have commands for me to run to help with debugging, I'd be happy to help :) But... no binary! Right now I'm shutting off IPFS, as it's already using between 20 and 30% CPU. Please, let's figure this out! Last note: I'd really consider revoking this last IPFS release (0.4.19). I know it sucks. Especially as a fellow developer, I know that's about the last thing you'd want to do! But this CPU issue really is getting out of control IMHO, and together with #5977 it just doesn't give a good IPFS experience at all. Something nobody involved wants.
@markg85, I've also been running into this issue since at least mid-February (v0.4.18 I guess, but it also happens with v0.4.19). The daemon quietly but steadily eats nearly all remaining system memory and maxes the CPU out at 100%.
The CPU profile is generated by pprof. The If you don't want to share the binary blob, you can generate a list of the top 20 consumers by running:

> go tool pprof /path/to/your/ipfs/binary /path/to/the/profile

At the prompt, type
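For readers unfamiliar with pprof: a CPU profile is a sampled record of which functions were on the stack, stored as a gzip-compressed protocol buffer that `go tool pprof` reads. The sketch below shows the standard-library mechanism (`runtime/pprof`) in a standalone program; it is not how go-ipfs itself exposes profiles, and `busyWork` and `profileCPU` are hypothetical helpers. A file produced this way can be opened with the `go tool pprof <binary> <profile>` form quoted in the comment above.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound workload for the profiler to sample.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileCPU runs fn while the CPU profiler is active and returns the
// gzip-compressed protobuf profile that `go tool pprof` understands.
func profileCPU(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // returns only after the profile is fully written
	return buf.Bytes(), nil
}

func main() {
	prof, err := profileCPU(func() { busyWork(50_000_000) })
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Save it so it can be inspected with: go tool pprof <binary> <path>
	path := filepath.Join(os.TempDir(), "cpu.prof")
	if err := os.WriteFile(path, prof, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote", len(prof), "bytes to", path)
}
```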
I've seen unstable swarm connections (issues with the connection manager), but I have absolutely no idea why your CPU pegged at 100%. My best guess is that you ran out of memory.
It has 4GB available. How much more does IPFS need? If memory were the issue, there would be a massive leak in IPFS somewhere.
Just as a side comment, I switched off QUIC support and swarm addresses in case it had something to do with quic-go/quic-go#1811, but the behaviour stayed the same, so it may indeed be related to running out of memory. The VPS the daemon is running on doesn't have a lot of memory, but the 100% CPU usage issue didn't happen until recently; it did fine until then.
@ivilata could I get a CPU profile (https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md#beginning)? @markg85 if that works for you, my SSH keys are at https://github.com/Stebalien.keys. I'm also happy to receive the CPU profile by email (encrypted if you'd like).
You can also use: https://github.com/ipfs/go-ipfs/blob/master/bin/collect-profiles.sh to collect profiles.
Just to pile on here... :-) I don't think my issue has anything to do with lack of RAM, as I have 32GB on my server with 25GB+ available, but the CPU and system load climb steadily over time to the point where all data transfer is halted. CPU jumps to over 100% utilization and no clients can access anything. A quick restart of the ipfs daemon brings it back to being able to transfer data again. I have to do this numerous times a day, to the point where I'm thinking of creating a crontab entry to just restart the daemon every hour. I do have upwards of 500GB worth of blocks being shared, but I'm not sure whether that matters, since after a restart things work fine and the size of the data store hasn't changed. My application also does an hourly addition of content synced from other sources. This job takes about 10-15 minutes to run, during which time data access is also halted or very slow. Seems like running I don't know yet whether this hourly addition of content is what triggers the non-responsiveness that forces the restart, or whether it's just an additional issue the daemon eventually recovers from. I am still attempting to isolate the issues. FYI, I do have numerous other services running on the same server, and during the times when IPFS is non-responsive, all other services are super responsive. The server has plenty of resources and the ability to execute other requests, just not IPFS.
Sent to @Stebalien by email. The script linked by Kubuxu failed with:
It might be an old Go version (which Go version are you running?), or a profile incompatibility between Go versions.
@markg85 It looks like the issue is with storing provider records. TL;DR: your node's peer ID is probably "close" to some very popular content, so everyone is telling you about that content. The actual issue here is garbage collection: it looks like the process of walking through and removing expired provider records is eating 50% of your CPU. Issue filed as: libp2p/go-libp2p-kad-dht#316
@Stebalien that would be a response to @ivilata :) Also, CPU usage is always high with IPFS. That's true on a crappy low-end machine, but also on a high-end multi-core power beast. You just notice it much less there, as it's often a single core that is being used most, which you hardly notice if you have 16. That doesn't make the issue smaller; it merely "masks" it. This is the case for me when hosting it locally (lots of cores) and on a remote VPS (just 2 cores). The explanation of being close to a popular source would be very troublesome, as IPFS is most certainly a niche product at this point in time. If I'm close to a popular source (both locally in The Netherlands and on the VPS in Germany), then there is a big issue lurking right around the corner for when IPFS does become popular. You also reference a DHT issue, whereas earlier (months ago...) it was suggested to use DHT as it could lighten the CPU stress. And it does (or did) reduce CPU load somewhat compared to not using DHT. But it's just always high. Also, sorry for not having sent the information you requested yet. I will do that later today.
(oops)
By "close to" I mean in terms of the DHT keyspace, not physical space. His peer ID is such that the And yes, this is an issue; that's why I filed one. Note: I run go-ipfs on my laptop all the time and barely notice it (albeit with the
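Here, "close" means XOR distance in the DHT keyspace. A minimal sketch, assuming (as go-libp2p's Kademlia implementation does) that content keys and peer IDs are hashed with SHA-256 before being compared; the helper names are illustrative, and the inputs below are arbitrary strings rather than real multihash keys or peer IDs. Because hash output is effectively random, where a peer ID lands relative to a popular key has nothing to do with the node's physical location.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// toKeyspace maps an identifier into the 256-bit keyspace via SHA-256.
func toKeyspace(id []byte) [32]byte {
	return sha256.Sum256(id)
}

// xorDistance returns the bytewise XOR of two keyspace points.
func xorDistance(a, b [32]byte) [32]byte {
	var d [32]byte
	for i := range d {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// commonPrefixLen counts the leading zero bits of the XOR distance; a longer
// shared prefix means the two points are "closer" in the keyspace.
func commonPrefixLen(a, b [32]byte) int {
	d := xorDistance(a, b)
	for i, x := range d {
		if x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return 256
}

// closer reports whether a is strictly closer to key than b, comparing the
// XOR distances as big-endian unsigned integers.
func closer(key, a, b [32]byte) bool {
	da, db := xorDistance(key, a), xorDistance(key, b)
	for i := range da {
		if da[i] != db[i] {
			return da[i] < db[i]
		}
	}
	return false
}

func main() {
	key := toKeyspace([]byte("some popular content"))
	p1 := toKeyspace([]byte("peer-1"))
	p2 := toKeyspace([]byte("peer-2"))
	fmt.Println("shared prefix bits:", commonPrefixLen(key, p1), commonPrefixLen(key, p2))
	fmt.Println("peer-1 closer than peer-2:", closer(key, p1, p2))
}
```

Provider records for a key are stored on the peers whose IDs are closest to it, so a node whose ID happens to sit near a very popular key receives a disproportionate share of those records.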
Now that's (literally) quite unfortunate. I hope the issue you opened gets fixed fast… In the worst case, I guess I can just replace keys… Thanks!
@ivan386 for now, I recommend running go-ipfs with the
@Stebalien ok
@Stebalien I guess this was addressed to me… 😛 I upgraded the daemon to Do you suggest anything else I could test?
* fixes #5613 (comment)
* fixes some relay perf issues
* improves datastore query performance

License: MIT
Signed-off-by: Steven Allen
@ivilata, could you try the latest master? Everyone else, please open new issues if you're still noticing high CPU usage in the current master. This issue has several separate bug reports that are becoming hard to untangle. WRT the original issue (opening the peers page leading to a bunch of CPU usage), I believe we've mostly fixed the issue:
Altogether, this has significantly reduced the issue.
@Stebalien: After some hours with 0.4.20, CPU usage went back to 15%-30% and it's stayed like that. I upgraded to 0.4.21 and CPU usage continued to stay in the same range (though IPv4 traffic doubled and IPv6 was cut to a quarter, but that's another story). Thanks a lot again for taking care of this! 😄
Thanks for the report! (although I'm not sure what would have caused the IP traffic changes)
First, great job to all involved!
I am super excited about this project and am close to releasing a new project of my own built on IPFS. My project will encourage users to operate an IPFS node, but I'm concerned about a CPU utilization issue I'm seeing that could seriously hamper their willingness to run one.
Version information:
go-ipfs version: 0.4.17-
Repo version: 7
System version: amd64/linux
Golang version: go1.10.3
$ cat /etc/issue
Ubuntu 18.04.1 LTS \n \l
The same is seen on my MacBook Pro (Sierra 10.12.6) running the latest ipfs-desktop (not sure which version of core comes bundled; I'm having a difficult time finding the version info).
Type:
Bug
Description:
When IPFS core (go) is launched and running, the CPU utilization is generally fine. Idles around 5% with momentary spikes up to 10-15%. When I launch the web GUI (or the desktop GUI) the CPU utilization jumps to idling at over 100% utilization with spikes above 200%.
This is seen on both Ubuntu and Mac OS X. I have managed to narrow it down on the desktop version to the peers tab. If any other view is accessed, the utilization remains fine. As soon as the peers view is accessed, the CPU jumps.
In both Ubuntu and Mac OS X, if I close out either management interface, the CPU utilization eventually calms down, but this takes quite a bit of time as well. I usually just kill the daemon and restart it to recover.
I am submitting this here first, as it seems consistent across OS and client, which suggested to me a core issue, but I can file this with each client if it is felt to be an issue on that side. Apologies for not providing more info; I have not had a decent chance to dig into it more myself, but can after next week if needed.