ipfs add stalls on directory with a very large number of files #7596
---
Try enabling directory sharding:
```
ipfs config --json Experimental.ShardingEnabled true
```
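To verify the flag took effect (a quick check; `ipfs config <key>` prints the current value, and the daemon generally needs a restart to pick up config changes):
```
ipfs config Experimental.ShardingEnabled
```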
---
Thanks, I actually have sharding and the filestore enabled already.
If I remember correctly, when I started using sharding, it fixed another issue where the add would die at the end, but I don't think it had an effect on the speed. Not sure if it's relevant, but this is the kernel I'm using, and the filesystem is ext4
---
Hm.
Once the add starts hanging, could you run https://github.com/ipfs/go-ipfs/blob/master/bin/collect-profiles.sh? This will take a snapshot of IPFS stack traces, a CPU usage sample, etc. and help us figure out where it's stuck.
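For anyone following along, an invocation might look like the sketch below, assuming the script still lives at that path in the repo and the daemon is running locally; it bundles goroutine dumps, a CPU profile, and heap data into a tarball:
```
wget https://raw.githubusercontent.com/ipfs/go-ipfs/master/bin/collect-profiles.sh
chmod +x collect-profiles.sh
./collect-profiles.sh
```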
---
Yes, the datastore is flatfs and the filesystem is ext4.
I took a snapshot at 9% complete, the first time the add stalled; it was stopped for about a minute.
Snapshot: ipfs-profile-mega-ipfs-node-2020-08-20T15:36:55-0400.tar.gz
Took another one around 50%.
Snapshot: ipfs-profile-mega-ipfs-node-2020-08-20T15:56:15-0400.tar.gz
The full add took about 1.5 hours, which is faster than previous adds, but the time estimate displayed 20 minutes while it was running. The files were already added previously, but that was also the case the last time I added this directory. Right before running, I updated golang and did
---
I'd recommend trying badger. While the filestore will prevent you from writing blocks for files, you'll still have to write blocks for directory chunks. Given the current sharding algorithm, this'll lead to hundreds of thousands of blocks. On flatfs, this equates to hundreds of thousands of synchronously written files.
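If you do try the conversion, a rough sketch of the usual route (my summary, not verified against this setup: the `badgerds` config profile switches the datastore spec, and the separate ipfs-ds-convert tool from github.com/ipfs/ipfs-ds-convert migrates the existing data):
```
ipfs shutdown                      # stop the daemon before touching the repo
ipfs config profile apply badgerds # switch the datastore spec to badger
ipfs-ds-convert convert            # migrate existing blocks to the new datastore
```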
---
The CPU profile agrees. We're spending a lot of time writing files.
---
Ok, thank you for looking into it and checking the CPU profile. I'm on board with converting to badgerds. I may not get to it for a little while, so I'll close this for now and report back later with how it affects the add operation.
---
Version information:
```
go-ipfs version: 0.6.0
Repo version: 10
System version: amd64/linux
Golang version: go1.14.4
```
Description:
When adding a very large directory of over 3 million files, the add operation stalls around 15% of the way through and seems to hang for a while before eventually continuing. The time estimate starts at around 10 minutes, and after stalling, the estimate increases to about 35 minutes. This happens intermittently throughout the add, and in the end it takes between 2.5 and 3 hours to add everything. The files are each around 700 bytes, and an rsync of the same files takes 10-15 minutes.
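For a rough sense of scale (my arithmetic, not figures from the report), the payload itself is tiny, which suggests per-file and per-block overhead, not raw data volume, dominates the add time:
```
3,000,000 files × ~700 B ≈ 2.1 GB of payload
```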
I'm running add with --nocopy and --offline, and I have file sharding enabled:
```
$ ipfs add -r --nocopy --offline data/
```
I've read this related issue, but I'm not sure it's the same problem.
This is a bit separate, but to add some more information: after the directory is added, retrieving the directory over the local gateway sometimes works after a long time and sometimes dies with a 502 Proxy Error. By comparison, Apache will return the files, although it takes a long time to list them. Retrieving a single file in the directory works normally.
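One way to check the gateway behavior in isolation (a sketch; assumes the default local gateway address 127.0.0.1:8080, with <dir-hash> standing in for the directory's CID):
```
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080/ipfs/<dir-hash>/
```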
The hash I'm referring to is QmXDZ3KzdW9DnuCHvFpttaZSWokAs42ZBayLSCbPeha7B6, and the metadata file for the set can be viewed at https://ipfs.io/ipfs/QmXDZ3KzdW9DnuCHvFpttaZSWokAs42ZBayLSCbPeha7B6/metadata.json; the rest of the files are gzipped CSVs.