-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
abort() followed by start() is broken on non-file streams #275
Comments
This allows for upload to start almost immediately without having to first download the file. And it allows for uploading bigger files, because transloadit assembly will not timeout, as it will get upload progress events all the time. No longer need for illusive progress. Also fix eslint warnings and simplify logic Still TODO: TUS pause/resume has a bug: tus/tus-js-client#275
@mifi I will try too look into this in the next week, FYI. |
…ad/download without saving to disk (#3159) * rewrite to async/await * Only fetch size (HEAD) if needed #3034 * Update packages/@uppy/companion/src/server/controllers/url.js Co-authored-by: Antoine du Hamel <[email protected]> * Change HEAD to GET in getURLMeta and abort request immediately upon response headers received #3034 (comment) * fix lint * fix lint * cut off length of file names or else we get "MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3 * try to fix flaky test * remove iife and cleanup code a bit * fix lint by reordering code * rename Uploader to MultipartUploader * Rewrite Uploader to use fs-capacitor #3098 This allows for upload to start almost immediately without having to first download the file. And it allows for uploading bigger files, because transloadit assembly will not timeout, as it will get upload progress events all the time. No longer need for illusive progress. Also fix eslint warnings and simplify logic Still TODO: TUS pause/resume has a bug: tus/tus-js-client#275 * add comment in dev Dashboard and pull out variable * fix a bug where remote xhr upload would ignore progress events in the UI * fix bug where s3 multipart cancel wasn't working * fix also cancel for xhr * Rewrite providers to use streams This removes the need for disk space as data will be buffered in memory and backpressure will be respected #3098 (comment) All providers "download" methods will now return a { stream } which can be consumed by uploader. Also: - Remove capacitor (no longer needed) - Change Provider/SearchProvider API to async (Breaking change for custom companion providers) - Fix the case with unknown length streams (zoom / google drive). Need to be downloaded first - rewrite controllers deauth-callback, thumbnail, list, logout to async - getURLMeta: make sure size is never NaN (NaN gets converted to null in JSON.stringify when sent to client but not when used in backend) - fix purest mock (it wasn't returning statusCode on('response')) - add missing http mock for "request" for THUMBNAIL_URL and http://url.myendpoint.com/file (these request errors were never caught by tests previously) - "upload functions with tus protocol" test: move filename checking to new test where size is null. Fix broken expects - fix some lint * Implement streamingUpload flag COMPANION_STREAMING_UPLOAD Default to false due to backward compatibility If set to true, will start to upload files at the same time as dowlnoading them, by piping the streams - Also implement progress for downloading too - and fix progress duplication logic - fix test that assumed file was fully downloaded after first progress event * rearrange validation logic * add COMPANION_STREAMING_UPLOAD to env.test.sh too * implement maxFileSize option in companion for both unknown length and known length downloads * fix bug * fix memory leak when non 200 status streams were being kept * fix lint * Add backward-compatibility for companion providers Implement a new static field "version" on providers, which when not set to 2, will cause a compatibility layer to be added for supporting old callback style provider api also fix some eslint and rename some vars * document new provider API * remove static as it doesn't work on node 10 * try to fix build issue * degrade to node 14 in github actions due to hitting this error: nodejs/node#40030 https://github.com/transloadit/uppy/pull/3159/checks?check_run_id=3544858518 * pull out duplicated logic into reusable function * fix lint * make methods private * re-add unsplash download_location request got lost in merge * add try/catch as suggested #3159 (comment) * Only set default chunkSize if needed for being more compliant with previous behavior when streamingUpload = false * Improve flaky test Trying to fix this error: FAIL packages/@uppy/utils/src/delay.test.js ● delay › should reject when signal is aborted expect(received).toBeLessThan(expected) Expected: < 70 Received: 107 32 | const time = Date.now() - start 33 | expect(time).toBeGreaterThanOrEqual(30) > 34 | expect(time).toBeLessThan(70) | ^ 35 | }) 36 | }) 37 | at Object.<anonymous> (packages/@uppy/utils/src/delay.test.js:34:18) https://github.com/transloadit/uppy/runs/3984613454?check_suite_focus=true * Apply suggestions from code review Co-authored-by: Antoine du Hamel <[email protected]> * fix review feedback & lint * Apply suggestions from code review Co-authored-by: Merlijn Vos <[email protected]> * remove unneeded ts-ignore * Update packages/@uppy/companion/src/server/controllers/url.js Co-authored-by: Antoine du Hamel <[email protected]> * Update packages/@uppy/companion/src/server/Uploader.js Co-authored-by: Antoine du Hamel <[email protected]> * reduce nesting * fix lint * optimize promisify #3159 (comment) * Update packages/@uppy/companion/test/__tests__/uploader.js Co-authored-by: Antoine du Hamel <[email protected]> Co-authored-by: Antoine du Hamel <[email protected]> Co-authored-by: Merlijn Vos <[email protected]>
Thank you for the detailed report, Mikael. I was able to reproduce the issue when uploading to tusd.tusdemo.net. After aborting and restarting the upload, a few KBs are transferred to the tus server but then it hangs until a timeout. I am quite certain that this is caused by how tus-js-client handles streams. In particular, I am talking about the StreamSource and SlicingStream classes: https://github.com/tus/tus-js-client/blob/master/lib/node/sources/StreamSource.js and https://github.com/tus/tus-js-client/blob/master/lib/node/sources/SlicingStream.js. I suspect there is some buggy behavior in this, but I wasn't able to pinpoint it directly. This is also partly because my experience with streams in Node.js does not reach very deep. I would like to describe the goal behind the StreamSource class and maybe you can help me with finding a better implementation for it. Basically, StreamSource is an implementation of the FileSource interface: Lines 74 to 84 in 708df61
It is used by tus-js-client to turn a file, blob, buffer, etc into chunks that can be sent in a single HTTP request. StreamSource does this for Node's readable streams. The challenge here is that the stream not only must be sliced in potentially multiple parts, but also that a certain amount of data must be buffered, which can be used for resuming the upload if some data got lost on the way to the server. StreamSource has two major functions:
Now, I believe that the current implementation of StreamSource sometimes loses data, as you have experienced in your bug report. I wanted to ask if you know a good way of implementing the above interface using Node's streams which does not leak memory, event listeners or data. I feel like I don't know enough about streams to pull this off. Any help in that regard is appreciated. Maybe also @juliangruber has any idea about this. |
Thanks for explaining the constraints / specification of this class. One of the reasons that I never started working on this is because i didnt really know how it all works. I have some experience with working with streams, but i find working with streams and low level byte handling really mind bending and complicated. However I can try and spend a few hours to see if I can come up with something that works :) |
The SlicingStream and StreamSource are using some outdated or non-existing nodejs stream patterns, I think we could improve this in general by giving them an overhaul, and hopefully also fix the bug in question here. Shall I give it a stab, and document the changes? |
I'm having a try at it now, trying to solve it by removing SlicingStream and instead return a Buffer, because a stream that produces streams complicates a lot |
I think an api like this could work: const slicer = new Slicer(chunkSize)
const seeker = new Seeker(from, to)
source.pipe(slicer).pipe(seeker)
const result = await seeker.find()
source.destroy() |
@juliangruber Thank you very much! Are the |
I found some modules for this on npm, maybe some work for this use case: |
Interesting approach that I haven't thought about before. My concern would be that we first have to buffer the data (e.g. 10MB if that is the chunk size) before sending it off to the server. This would cause a time delay in comparison to the current implementation, but maybe this is acceptable. I am not concerned about memory usage since we are also currently buffering up to a limit of the given chunk size.
Awesome, I will also have a look into these! |
As far as I understand, we always have to buffer every chunk we read from the stream anyways, because the consumer might read (call slice on it) again later? Actually when I look at the current implementation of Anyways I tried to reimplement it without streams, but I'm getting a multitude of test failures no matter what I do, so I'm a bit stuck and not sure if I should spend more time on this. Feel free to take a stab on this @juliangruber. FWIW here's my failing code so far: async function readChunk (stream) {
return new Promise((resolve, reject) => {
stream.once('error', reject)
function tryReadChunk () {
const chunk = this._stream.read()
if (chunk != null) {
resolve(chunk)
return
}
// todo must handle the end case here too
stream.once('readable', () => tryReadChunk)
}
tryReadChunk()
})
}
export default class StreamSource {
constructor (stream, chunkSize) {
this._stream = stream
// Setting the size to null indicates that we have no calculation available
// for how much data this stream will emit requiring the user to specify
// it manually (see the `uploadSize` option).
this.size = null
this._chunkSize = +chunkSize
stream.pause()
this._ended = false
stream.on('end', () => {
this._ended = true
})
stream.on('error', (err) => {
this._error = err
})
this._buf = Buffer.alloc(0)
this._bufPos = 0
}
// See https://github.com/tus/tus-js-client/issues/275#issuecomment-1047304211
async slice (start, end) {
// Fail fast if the caller requests a proportion of the data which is not
// available any more.
if (start < this._bufPos) {
throw new Error('cannot slice from position which we already seeked away')
}
if (this._error) throw this._error
// Always attempt to drain the buffer first, even if this means that we
// return less data than the caller requested.
if (start < this._bufPos + this._buf.length) {
const bufStart = start - this._bufPos
const bufEnd = Math.min(this._buf.length, end - this._bufPos)
const sliced = this._buf.slice(bufStart, bufEnd)
sliced.size = sliced.length
return { value: sliced }
}
// OK, we are outside the range of our stored buffer, and need to read from the stream itself
const bytesToSkip = start - (this._bufPos + this._buf.length)
let bytesRead = 0
while (true) {
if (this._ended) return { value: null, done: true }
const receivedChunk = await readChunk(this._stream)
bytesRead += receivedChunk.length
if (bytesRead > bytesToSkip) {
const bytesToSkipInChunk = bytesToSkip - (bytesRead - receivedChunk.length)
const slicedChunk = receivedChunk.slice(bytesToSkipInChunk)
this._buf = slicedChunk // store in case the consumer wants to read this chunk (or parts of it) again
this._bufPos = start
break
}
}
const requestedLength = end - start
// need to constrain the returned chunk size?
const chunkToReturn = this._buf.slice(0, requestedLength)
chunkToReturn.size = chunkToReturn.length
return { value: chunkToReturn }
}
close () {
this._stream.destroy()
}
} had to also add to .babelrc to makethe code run:
|
Will do 👍 And thanks for sharing your code |
anyone made any progress yet? |
I haven't yet had time for this. If you want to continue on this, the first thing I would change is not to read from the stream manually, but always to use |
@mifi I will pick this issue up again this week! |
Great! My brain almost exploded last time when I tried to fix this. |
Hey @Acconut , are you still on top of this one? |
In #385 I implemented a new stream source, using which a tus uploaded can be |
@mifi In tus-js-client v3.0.0-0 (prerelease for now) I have added a fix. Can you try it out and see if that helps with your original issue? |
Fantastic! I just tested and it looks like it's working nicely with pause/resume in the companion UI now |
by upgrading tus-js-client see tus/tus-js-client#275
…ad/download without saving to disk (transloadit#3159) * rewrite to async/await * Only fetch size (HEAD) if needed transloadit#3034 * Update packages/@uppy/companion/src/server/controllers/url.js Co-authored-by: Antoine du Hamel <[email protected]> * Change HEAD to GET in getURLMeta and abort request immediately upon response headers received transloadit#3034 (comment) * fix lint * fix lint * cut off length of file names or else we get "MetadataTooLarge: Your metadata headers exceed the maximum allowed metadata size" in tus / S3 * try to fix flaky test * remove iife and cleanup code a bit * fix lint by reordering code * rename Uploader to MultipartUploader * Rewrite Uploader to use fs-capacitor transloadit#3098 This allows for upload to start almost immediately without having to first download the file. And it allows for uploading bigger files, because transloadit assembly will not timeout, as it will get upload progress events all the time. No longer need for illusive progress. Also fix eslint warnings and simplify logic Still TODO: TUS pause/resume has a bug: tus/tus-js-client#275 * add comment in dev Dashboard and pull out variable * fix a bug where remote xhr upload would ignore progress events in the UI * fix bug where s3 multipart cancel wasn't working * fix also cancel for xhr * Rewrite providers to use streams This removes the need for disk space as data will be buffered in memory and backpressure will be respected transloadit#3098 (comment) All providers "download" methods will now return a { stream } which can be consumed by uploader. Also: - Remove capacitor (no longer needed) - Change Provider/SearchProvider API to async (Breaking change for custom companion providers) - Fix the case with unknown length streams (zoom / google drive). Need to be downloaded first - rewrite controllers deauth-callback, thumbnail, list, logout to async - getURLMeta: make sure size is never NaN (NaN gets converted to null in JSON.stringify when sent to client but not when used in backend) - fix purest mock (it wasn't returning statusCode on('response')) - add missing http mock for "request" for THUMBNAIL_URL and http://url.myendpoint.com/file (these request errors were never caught by tests previously) - "upload functions with tus protocol" test: move filename checking to new test where size is null. Fix broken expects - fix some lint * Implement streamingUpload flag COMPANION_STREAMING_UPLOAD Default to false due to backward compatibility If set to true, will start to upload files at the same time as dowlnoading them, by piping the streams - Also implement progress for downloading too - and fix progress duplication logic - fix test that assumed file was fully downloaded after first progress event * rearrange validation logic * add COMPANION_STREAMING_UPLOAD to env.test.sh too * implement maxFileSize option in companion for both unknown length and known length downloads * fix bug * fix memory leak when non 200 status streams were being kept * fix lint * Add backward-compatibility for companion providers Implement a new static field "version" on providers, which when not set to 2, will cause a compatibility layer to be added for supporting old callback style provider api also fix some eslint and rename some vars * document new provider API * remove static as it doesn't work on node 10 * try to fix build issue * degrade to node 14 in github actions due to hitting this error: nodejs/node#40030 https://github.com/transloadit/uppy/pull/3159/checks?check_run_id=3544858518 * pull out duplicated logic into reusable function * fix lint * make methods private * re-add unsplash download_location request got lost in merge * add try/catch as suggested transloadit#3159 (comment) * Only set default chunkSize if needed for being more compliant with previous behavior when streamingUpload = false * Improve flaky test Trying to fix this error: FAIL packages/@uppy/utils/src/delay.test.js ● delay › should reject when signal is aborted expect(received).toBeLessThan(expected) Expected: < 70 Received: 107 32 | const time = Date.now() - start 33 | expect(time).toBeGreaterThanOrEqual(30) > 34 | expect(time).toBeLessThan(70) | ^ 35 | }) 36 | }) 37 | at Object.<anonymous> (packages/@uppy/utils/src/delay.test.js:34:18) https://github.com/transloadit/uppy/runs/3984613454?check_suite_focus=true * Apply suggestions from code review Co-authored-by: Antoine du Hamel <[email protected]> * fix review feedback & lint * Apply suggestions from code review Co-authored-by: Merlijn Vos <[email protected]> * remove unneeded ts-ignore * Update packages/@uppy/companion/src/server/controllers/url.js Co-authored-by: Antoine du Hamel <[email protected]> * Update packages/@uppy/companion/src/server/Uploader.js Co-authored-by: Antoine du Hamel <[email protected]> * reduce nesting * fix lint * optimize promisify transloadit#3159 (comment) * Update packages/@uppy/companion/test/__tests__/uploader.js Co-authored-by: Antoine du Hamel <[email protected]> Co-authored-by: Antoine du Hamel <[email protected]> Co-authored-by: Merlijn Vos <[email protected]>
Describe the bug
With a non-file stream, when an upload is
start()
-ed and then after a few seconds of progress, we callabort()
, followed by anotherstart()
a few seconds later, the upload grinds to a halt andtus-js-client
leaks stream event listeners. This does not happen with file based streams (fs.createReadStream
).To Reproduce
create
index.js
:Now create an assembly in the state
UPLOADING
and replaceassemblyUrl
with the assembly's url.Observe the log output:
Note: the file does get uploaded successfully, it just takes a very long time, and it leaks EventEmitters
Alternatively, easier to reproduce but different error/outcome:
endpoint: 'https://tusd.tusdemo.net/files/
. No assemblyUrl needed. The upload completely hangs but crashes with a http 499 after a whileendpoint: 'http://127.0.0.1:1080/files/
and enable/uncomment the tus-node-server code. This also causes the upload to slow to a halt (even slower progress events).Even without Throttle, the same issue happens. I also tried with a
fs.createReadStream
pipe through aPassThrough
stream, and same issue.Expected behavior
It should continue uploading with normal speed after calling
start()
again, just like for file streams.Setup details
Please provide following details, if applicable to your situation:
The text was updated successfully, but these errors were encountered: