stream,zlib: performance improvements #13322
d96b10e
to
e6540d1
Compare
Optional, but given the extent of the changes, it would be good to run a test coverage report and make sure this doesn't decrease coverage for
LGTM for the streams parts.
lib/_stream_transform.js
Outdated
@@ -208,18 +206,15 @@ function done(stream, er, data) {
   if (er)
     return stream.emit('error', er);

-  if (data !== null && data !== undefined)
+  if (data != null)
Would you mind adding a comment that this matches both null and undefined, and that it is done on purpose?
Comments added.
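For readers following the thread, here is a minimal sketch of why the loose comparison behaves the same as the two strict checks it replaces. The helper names `hasData` and `hasDataStrict` are hypothetical and do not appear in the PR; they only illustrate that `!= null` rejects exactly `null` and `undefined` and lets other falsy values through.

```javascript
'use strict';

// Hypothetical helper (not from the PR): mirrors the `data != null` check.
function hasData(data) {
  // Loose inequality against null is false only for null and undefined.
  return data != null;
}

// The strict form that the diff replaces.
function hasDataStrict(data) {
  return data !== null && data !== undefined;
}

// A mix of nullish and merely falsy values.
const samples = [null, undefined, 0, '', false, NaN, Buffer.alloc(0)];
for (const value of samples) {
  console.log(String(value), hasData(value), hasDataStrict(value));
}
```

Both helpers agree on every sample, which is the point of the requested comment: the loose form is intentional, not a typo.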
As with any change to streams, a run through CITGM would be nice.
lib/zlib.js
Outdated
  if (opts.encoding || opts.objectMode || opts.writableObjectMode) {
    opts = _extend({}, opts);
Why not Object.assign? Still too slow?
Yes.
any numbers?
I don't have any handy, no. We could always flip it to be a whitelist in the future I suppose if we run into issues. Using a whitelist would allow us to just specify an object inline and avoid any expensive copying-related operations.
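To make the whitelist idea concrete, here is a rough sketch contrasting the two approaches. The function names and the particular option keys are assumptions chosen for illustration; the thread does not say which options zlib would need to carry over, and `Object.assign` stands in here for the internal `_extend` helper.

```javascript
'use strict';

// Copy-everything approach: clone the caller's options, then override.
// Object.assign is used as a stand-in for the internal _extend helper.
function cloneAndOverride(opts) {
  const copy = Object.assign({}, opts);
  copy.encoding = null;
  copy.objectMode = false;
  copy.writableObjectMode = false;
  return copy;
}

// Whitelist approach: build a fresh object literal with only known keys.
// The keys below are illustrative assumptions, not zlib's actual list.
function whitelistOptions(opts) {
  return {
    chunkSize: opts.chunkSize,
    level: opts.level,
    highWaterMark: opts.highWaterMark,
    encoding: null,
    objectMode: false,
    writableObjectMode: false
  };
}

const userOpts = { level: 9, objectMode: true, encoding: 'utf8', extra: 1 };
console.log(cloneAndOverride(userOpts));
console.log(whitelistOptions(userOpts));
```

The trade-off is that the whitelist silently drops any key it does not name (`extra` above), so it would have to be kept in sync with the supported options.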
This needs a rebase.
e6540d1
to
c050ccb
Compare
Rebased.
lib/_stream_transform.js
Outdated
  this._transformState = {
    afterTransform: (er, data) => {
      return afterTransform(this, er, data);
    },
Can we try to use bind here? It might be slightly faster as we don't keep the context alive.
As discussed and checked with @mcollina, this can be really fast using bind, especially once this CL lands in V8. The CL should apply cleanly to 6.0, but also 5.9.
Here's a simple micro-benchmark:
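// Target function shared by the closure and the bound variants.
function bar() {}
// Constructor variants: arrow-function closure vs. bind().
function A() { this.x = (x) => bar(x); }
function Ab() { this.x = bar.bind(undefined); }
function c() { return new A(); }
function cb() { return new Ab(); }
// Object-literal variants: arrow-function closure vs. bind().
function l() { return {x: (x) => bar(x)}; }
function lb() { return {x: bar.bind(undefined)}; }
const N = 10000000;
function test(fn, n) {
  for (var i = 0; i < n; ++i) {
    fn.call(this);
  }
}
const FNS = [c, cb, l, lb];
// Warm up each variant before timing it.
for (const fn of FNS) {
  test(fn, 100);
}
// Time N allocations per variant.
for (const fn of FNS) {
  console.time(fn.name);
  test(fn, N);
  console.timeEnd(fn.name);
}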
I think we should see if you can bring those improvements to _readableState and _writableState as well, since allocating those is on a hot path whenever Node streams are used.
I just copied the state object as-is without really paying attention. I will look into adding bind()...
I've now incorporated bind() and it doesn't seem to have negatively affected performance, so it's fine.
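As a rough illustration of the change discussed in this thread, the sketch below shows the closure form from the outdated diff next to one possible bind() form. The constructor names are hypothetical, `afterTransform` is a stand-in that only records its arguments, and passing the stream as a pre-applied first argument is an assumption; the thread does not show the exact form that landed.

```javascript
'use strict';

// Stand-in for the module-level callback; the real one does more work.
function afterTransform(stream, er, data) {
  stream.calls.push([er, data]);
}

// Closure variant, as in the outdated diff: the arrow function captures `this`.
function ClosureState() {
  this.calls = [];
  this.state = {
    afterTransform: (er, data) => afterTransform(this, er, data)
  };
}

// bind() variant (assumed form): the stream is pre-applied as first argument.
function BoundState() {
  this.calls = [];
  this.state = {
    afterTransform: afterTransform.bind(undefined, this)
  };
}

const a = new ClosureState();
const b = new BoundState();
a.state.afterTransform(null, 'x');
b.state.afterTransform(null, 'x');
console.log(a.calls, b.calls);
```

Both variants end up invoking the same function with the same arguments; the difference under discussion is only how cheaply the engine can allocate and call them.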
7358c93
to
18ad2d7
Compare
Let's let #13374 land first, then get this rebased and landed after that.
:-(
I understand the :-( but I'd like to get #13374 landed and pulled into a quick 8.x patch or minor release next week while letting this PR sit for a few weeks before pulling it in.
When the input to Function.prototype.bind is a known function, we can inline the allocation of the JSBoundFunction into TurboFan, which provides a 2x speed-up for several hot functions in Node streams (as discovered by Matteo Collina). One example of this can be found in nodejs/node#13322, which can be optimized and made more readable using bind instead of closures. [email protected] Review-Url: https://codereview.chromium.org/2916063002 Cr-Commit-Position: refs/heads/master@{#45679}
18ad2d7
to
153f870
Compare
Rebased.
Still LGTM
CITGM with
Benchmark results with V8 5.9 in master:
PR-URL: #13322 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Matteo Collina <[email protected]>
I’ve removed this from the 8.2.0 proposal and labelled it dont-land so we can wait for #14161 to be resolved.
So this should have likely been reverted as soon as we noticed issues in zlib. At this point there is a whole bunch of code that has landed on top of this PR in /cc @nodejs/tsc
@MylesBorins I have add4b0a reverted and all tests running successfully. Most of the conflicts were due to the new internal error stuff. I'll open a PR. Any idea if e5dc934 needs to be reverted too (or perhaps even instead)?
add4b0a made the assumption that compressed data would never lead to an empty decompressed stream. Fix that by explicitly checking the number of read bytes. Fixes: nodejs#17041 Refs: nodejs#13322
add4b0a made the assumption that compressed data would never lead to an empty decompressed stream. Fix that by explicitly checking the number of read bytes. PR-URL: #17042 Fixes: #17041 Refs: #13322 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Evan Lucas <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]>
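The edge case named in that commit message can be reproduced with the public zlib API. This is only a sketch of the scenario, not the fix itself: `describe` is a hypothetical consumer that decides based on the byte count, in the spirit of "explicitly checking the number of read bytes".

```javascript
'use strict';
const zlib = require('zlib');

// Compressing zero bytes still yields a non-empty gzip container
// (header and trailer), but decompressing it yields zero bytes.
const compressed = zlib.gzipSync(Buffer.alloc(0));
const decompressed = zlib.gunzipSync(compressed);

console.log('compressed bytes:', compressed.length);
console.log('decompressed bytes:', decompressed.length);

// Hypothetical consumer: decide based on the byte count, not on whether
// any output chunk happened to be produced.
function describe(buf) {
  return buf.length === 0 ? 'empty' : `${buf.length} bytes`;
}
console.log(describe(decompressed));
```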
This PR brings various performance improvements to stream.Transform and zlib. A good chunk of the performance increase comes from the inlining of stream.Transform state initialization during stream instantiation. The rest of the increase comes from changes to zlib itself.
Here is an overview of the notable zlib-specific changes:
- ... new and without) when calling the zlib.create*() methods
- ... assert() by putting them behind an if containing their conditional
- ... chunk values)
- objectMode/writableObjectMode and/or encoding are now explicitly overridden if set to non-defaults in options passed to constructors
- ... write() callback out of _transform() for better reuse by other instances and removed the sync-specific code
- ... write() callback at the C++ layer for faster invocation (avoiding a dynamic lookup on the handle's JS object for every write())
- ... chunk value when multiple write()s are needed for the same chunk
- ... zlibBuffer() helper function for better reusability
- ... 'data' event listening instead of stream.read()
- ... Buffer.concat() for single Buffer case
- ... Buffer.concat() for single Buffer case
- ... callback() previously shared by both sync and async functionality
Here are some results with the included benchmarks:
CI: https://ci.nodejs.org/job/node-test-pull-request/8378/
Checklist
- make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
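As an aside on the "Buffer.concat() for single Buffer case" items in the overview above, the general shortcut can be sketched as follows. `joinChunks` is a hypothetical helper written for illustration and is not the PR's code; it only shows that when a single chunk was collected, it can be returned directly instead of being copied.

```javascript
'use strict';

// Hypothetical helper (not the PR's code): join collected chunks, skipping
// Buffer.concat() when there is nothing to concatenate.
function joinChunks(chunks, totalLength) {
  if (chunks.length === 0) return Buffer.alloc(0);
  if (chunks.length === 1) return chunks[0]; // same Buffer, no copy
  return Buffer.concat(chunks, totalLength);
}

const single = Buffer.from('abc');
const sameObject = joinChunks([single], single.length) === single;
const joined = joinChunks([Buffer.from('ab'), Buffer.from('cd')], 4);

console.log(sameObject);
console.log(joined.toString());
```

One consequence of the shortcut is that the caller receives the original Buffer rather than a fresh copy, so it should not assume it can mutate the result without affecting the chunk.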