Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload Large File bug #762

Closed
gitGksgk opened this issue Aug 24, 2018 · 29 comments
Closed

Upload Large File bug #762

gitGksgk opened this issue Aug 24, 2018 · 29 comments
Labels
area/screen/files Issues related to Files screen kind/bug A bug in existing code (including security flaws) status/blocked Unable to be worked further until needs are met

Comments

@gitGksgk
Copy link

in components\files\action-bar.js OnFilesChange: It reads the entire file first and then do things. This cause a huge memory consumption when uploading large file. In Chrome it will crash the tab and in Firefox it almost causes my system to crash..... Any suggestions? thanks.

@gitGksgk
Copy link
Author

weired thing is, the online ipfs release using the same version doesn't have crash problem. although still consume a lot of memory.

@gitGksgk
Copy link
Author

I test the problem using 40MB and 700MB file, 40 works fine and 700 crashs. it might differ because of the amount of total memory

@hacdias
Copy link
Member

hacdias commented Aug 24, 2018

Hello @gitGksgk! The current version of WebUI is going to be replaced soon. Could you try and see if the error persists on revamp branch? We know there are some issues with large files.

@gitGksgk
Copy link
Author

Sure, many thanks for your reply. Unfortunately the error persists. I suspect it's because
const { streams, totalSize } = await filesToStreams(rawFiles)
in \src\bundles\file.js
not sure if there is a way to transfer file and simultaneously read file so that it consumes just tiny memory.

@gitGksgk
Copy link
Author

it turns out that it's probably because of some kind of memory leak. In chrome dev tools a lot of 1MB JSArrayBufferData is observed when uploading. The last sentence traceable is In node_modules\pull-file-reader\index.js . It seems to be designed sending while reading the uploading file but the buffer persists so that the memory consumption becomes larger and larger until it crashes.
Any ideas?

@hacdias
Copy link
Member

hacdias commented Aug 27, 2018

@gitGksgk I was searching and this seems to be a known issue. Not sure if it's the same one, but I'll take a look at WebUI's code to see if something might be missing. Although, the most probable thing is that this is the same issue as ipfs-inactive/js-ipfs-http-client#654.

@gitGksgk
Copy link
Author

gitGksgk commented Aug 27, 2018

hi @hacdias Thanks for your reply. yeah I've reached that issue before. I don't know if this project uses some part of js-ipfs-api component implicitly, but I can't find http-api in this project, which that issue has modified to fix it's uploading problem.

@hacdias
Copy link
Member

hacdias commented Aug 27, 2018

@gitGksgk we're using ipfs-react-bundle, which required window.ipfs-fallback. So, we use window.ipfs if available or connect through js-ipfs-api otherwise. On either way, we use the js-ipfs-api because that's what window.ipfs also uses.

@gitGksgk
Copy link
Author

gitGksgk commented Aug 28, 2018

hi @hacdias Thanks for your reply. Further investigation shows that in src\bundles\files.js , function

  const res = await ipfs.add(file.content, {
          pin: false,
          // eslint-disable-next-line
          progress: (bytes) => {
            sent = sent + bytes - alreadySent
            alreadySent = bytes
            updateProgress(sent / totalSize * 100)
          }
        })

reads the whole file into buffer first, and then send to ipfs over stream. Thus the "progress" item won't start until end of reading file. This is the same way as the old version does.

@hacdias
Copy link
Member

hacdias commented Aug 28, 2018

@gitGksgk basically you're saying that if we removed the progress, it would not consume so much memory?

@gitGksgk
Copy link
Author

gitGksgk commented Aug 28, 2018

nope, I am saying that the memory consumption ended before the progress starts. It's file.content that costs the consumption. in src\bundle\files.js there's a function named filesToStreams, defined in src\lib\files.js, however it didn't convert file to stream. Rather than split a large file into small pieces, it just split possibly multiple uploaded files into single file...

@gitGksgk
Copy link
Author

gitGksgk commented Aug 28, 2018

file.content is a function reading the whole file content into memory. Read buffer is not piped to send program. That's why it consumes so much memory.

@hacdias
Copy link
Member

hacdias commented Aug 28, 2018

@gitGksgk so you're saying the issue is on filesToStreams?

@lidel
Copy link
Member

lidel commented Aug 28, 2018

@hacdias My understanding is that this additional iterating over streams should be removed / simplified.

Instead of multiple calls to:

const res = await ipfs.add(file.content, {

you should pass streams once:

const res = await ipfs.add(streams, {

Similar use case in Companion uses a single call: https://github.com/ipfs-shipyard/ipfs-companion/blob/b149f99425067affec4eede838b51f598e5fc373/add-on/src/popup/quick-upload.js#L84

I think multiple calls to ipfs.files.mkdir probably can be removed too. When uploading multiple files just call files.add with {wrapWithDirectory: true} parameter, it should create necessary tree under a root CID

In WebUI case you probably want to unwrap files after upload, so after a single files.add(streams, succeeds then you do files.cp <root cid>/every-direct-child /mfs_destination.

Hope this helps :)

@hacdias
Copy link
Member

hacdias commented Aug 28, 2018

@lidel but then, wouldn't we lose directory structure? Directories inside directories?

@lidel
Copy link
Member

lidel commented Aug 28, 2018

@hacdias I think ipfs.add with wrapWithDirectory should respect paths passed in name:

https://github.com/ipfs-shipyard/ipfs-webui/blob/7ef7759f6836032595df2343dee2435646abb9bb/src/lib/files.js#L30-L34

You should be able to pass name: 'root/sub1/sub2/file', and directory tree should be created for you, if not then its a bug we should fix upstream.

@gitGksgk
Copy link
Author

gitGksgk commented Aug 29, 2018

@lidel Yes there's a multiple function call but changing it to stream doesn't seem to change its behavior thus not solving the memory problem.
@hacdias Yes, It reads streams without piping them.

besides, when uploading file of 1.6GB the crush doesn't happen immediately after reading file to buffer though the memory usage grows very high. The crash happens in ipfsApi/./node_modules/buffer/index.js
there's a buffer copy in buf.copy(buffer, pos) ,
which i don't quite understand, after the last execution of fileReader in node_modules\pull-file-reader\index.js, that actually crash it down. (Tested on 400MB, same behavior. maybe it's not because of the size of file)

@hacdias
Copy link
Member

hacdias commented Aug 29, 2018

@gitGksgk could you make a simple script with the only required modules to make it crash please?

@hacdias
Copy link
Member

hacdias commented Aug 29, 2018

@lidel I'll take a look at simplifying the upload too!

@gitGksgk
Copy link
Author

@hacdias I'm glad to, but I don't quite understand. The module has a lot of dependencies. How can i use a simple script to reproduce the crash?

@hacdias
Copy link
Member

hacdias commented Aug 29, 2018

@gitGksgk actually it might not be needed. I was taking a look at pull-file-reader code:

https://github.com/tableflip/pull-file-reader/blob/3c1662c5d9ef8afa316aa0bbb9108f600950ffcf/index.js#L1-L36

And it seems to read the whole file and them split it into chunks. But then, on js-ipfs-api, there are also some related issues: ipfs-inactive/js-ipfs-http-client#654 /cc @alanshaw

@hacdias hacdias added kind/bug A bug in existing code (including security flaws) revamp labels Aug 29, 2018
@alanshaw
Copy link
Member

At the time that module was written I don't think there was a way to stream chunks from FileReader - I don't know if there is a way now. It would be great to get it upgraded to do real streaming.

Similarly for the upload bug - if someone could look into it and submit a fix that would be amazing. I would merge that so fast ;)

@hacdias
Copy link
Member

hacdias commented Aug 29, 2018

@lidel
Copy link
Member

lidel commented Aug 29, 2018

Quick summary for drive-by visitors:

@hacdias fixed "multiple calls to files.add" in #769 and I created a dedicated issue for buffering→the memory problem→crash at https://github.com/ipfs/js-ipfs-api/issues/842

Let's keep this open until https://github.com/ipfs/js-ipfs-api/issues/842 is addressed upstream.

@gitGksgk
Copy link
Author

gitGksgk commented Aug 30, 2018

@alanshaw don't know if it helps. But there's a filereader-stream module. I've tested. the reading process aren't seem to be memory hungry (but not as successful of course, otherwise a pr would have been sent...), while the ipfs.add process still crashes. That uses online file in unpkg.com thus hard to debug.

@hacdias
Copy link
Member

hacdias commented Aug 30, 2018

@gitGksgk we've found out that the issue is not with pull-file-reader, but with a package js-ipfs-api uses to shim Node's http package on the browser. Take a look at https://github.com/ipfs/js-ipfs-api/issues/842 😄

@lidel
Copy link
Member

lidel commented Apr 24, 2020

Note to self: this needs revisiting after we switch webui to the latest js-ipfs-http-client (with async iterators API)

@rafaelramalho19
Copy link
Contributor

Related with #1534

@lidel
Copy link
Member

lidel commented Oct 29, 2020

I believe this was fixed in #1534
@gitGksgk you may want to check the latest webui at https://webui.ipfs.io

@lidel lidel closed this as completed Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/screen/files Issues related to Files screen kind/bug A bug in existing code (including security flaws) status/blocked Unable to be worked further until needs are met
Projects
None yet
Development

No branches or pull requests

6 participants