Remove end-of-stream blocking #92
Conversation
index.js (Outdated)

```js
// the stream data. It does not include extraneous zeroes to trim at the end.
// The underlying `ArrayBuffer` does allocate a number of bytes that is a power
// of 2, but those bytes are only visible after calling `ArrayBuffer.resize()`.
const resizeArrayBufferFast = (contents, length) => {
```
I would name this `resizeArrayBuffer` and the other one `resizeArrayBufferSlow`, as this one is the one we will keep in the future. It makes it slightly easier to remove the old one when we target Node.js 20.
Done in bcb87e7 👍
Co-authored-by: Sindre Sorhus <[email protected]>
Fixed 👍
Currently, `get-stream` aggregates all chunks in an array. Once the stream is fully consumed, all chunks are concatenated at the end. This final concatenation is slow and CPU-bound. On my machine, for `getStream()`, it is roughly a third of the CPU time spent. It is even half of it for `getStreamAsBuffer()`/`getStreamAsArrayBuffer()`. If that final concatenation happens on a big stream, it can hold up the event loop for a relatively long time. This can result in freezes perceivable by users, which removes some of the benefits of using streams.

Instead, this PR removes that final concatenation and spends that CPU time when each chunk is processed instead. In other words, it spreads that load over the whole time the stream is being consumed. Since streams are almost always I/O-bound (filesystem or network), the additional CPU time spent on each chunk goes unnoticed, because the bottleneck is I/O (i.e. waiting for the next chunk).
This PR implements this by shifting the main strategy from storing each chunk in an array to computing the final value incrementally instead. For example, instead of pushing to an array of strings and then calling `array.join('')`, it keeps only a single string, which is appended to with `+=` on each chunk. This turns out to actually be ~30% faster (on my machine), most likely because `array.push()` needs to allocate more memory at specific size intervals, which is slow. As a side note, this is also how core Node.js implements `text()`.
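As a rough before/after illustration (a minimal sketch with hypothetical helper names, not the actual code of this PR; chunk validation, encodings, and error handling are omitted):

```js
// Before (simplified): accumulate every chunk, then concatenate once at the
// end. That final `join('')` is a single, large, CPU-bound operation.
const getStringByJoining = async stream => {
	const chunks = [];
	for await (const chunk of stream) {
		chunks.push(chunk);
	}

	return chunks.join('');
};

// After (simplified): append to a single string as each chunk arrives,
// spreading the CPU cost over the whole time the stream is consumed.
const getStringIncrementally = async stream => {
	let contents = '';
	for await (const chunk of stream) {
		contents += chunk;
	}

	return contents;
};
```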
For `Buffer`/`ArrayBuffer`, the implementation is a little more involved to make sure it is as fast as possible (i.e. doing as few memory allocations and copies as possible). There are also two strategies, since `ArrayBuffer.resize()` provides a significant performance boost, but is unfortunately only supported in Node.js 20 and a couple of browsers.
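To give an idea of the two strategies (a minimal sketch with hypothetical helper names, not the exact helpers from this PR; the real code also tracks the logical length separately from the allocated capacity and handles maximum lengths):

```js
// Fast path (Node.js 20 and a couple of browsers): grow a resizable
// `ArrayBuffer` in place. This assumes `arrayBuffer` was created with
// `new ArrayBuffer(byteLength, {maxByteLength})`, otherwise `resize()` is
// not available.
const appendFast = (arrayBuffer, length, chunk) => {
	const newLength = length + chunk.byteLength;
	if (newLength > arrayBuffer.byteLength) {
		// Grow to the next power of 2 so resizes stay infrequent.
		arrayBuffer.resize(2 ** Math.ceil(Math.log2(newLength)));
	}

	new Uint8Array(arrayBuffer).set(chunk, length);
	return {arrayBuffer, length: newLength};
};

// Slow path (older engines): when the capacity is exceeded, allocate a bigger
// `ArrayBuffer` and copy the existing contents into it before appending.
const appendSlow = (arrayBuffer, length, chunk) => {
	const newLength = length + chunk.byteLength;
	let newArrayBuffer = arrayBuffer;
	if (newLength > arrayBuffer.byteLength) {
		newArrayBuffer = new ArrayBuffer(2 ** Math.ceil(Math.log2(newLength)));
		new Uint8Array(newArrayBuffer).set(new Uint8Array(arrayBuffer, 0, length));
	}

	new Uint8Array(newArrayBuffer).set(chunk, length);
	return {arrayBuffer: newArrayBuffer, length: newLength};
};
```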
This PR also comes with the following enhancement: when the stream is so big that it exceeds the maximum length of a string/`Buffer`/`ArrayBuffer`, we currently truncate `error.bufferedData` to make sure it fits within that maximum length. The current implementation returns only half of `error.bufferedData`. This PR returns all bytes that have been read so far instead, so users get as much information as possible in `error.bufferedData`.
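For context, partial data is already exposed to callers through that property; only how much of it is kept changes (a minimal usage sketch; `stream` stands for any readable stream):

```js
import getStream from 'get-stream';

try {
	await getStream(stream);
} catch (error) {
	// With this PR, `bufferedData` contains everything read before the failure,
	// instead of being truncated to half of the maximum length.
	console.log(error.bufferedData);
}
```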
Apart from that last point, the user-visible behavior is identical to before this PR; this is just performance tweaking.
Finally, spreading the load as described above seems to decrease the probability of the process crashing on big streams. Two automated tests had previously been disabled on CI for this reason. This PR re-enables them on CI, as they now work correctly.