
process: small nextTick cleanup, including removing manual args copying #19153

Closed
wants to merge 3 commits

Conversation

@apapirovski (Member) commented Mar 5, 2018

A couple of small changes to nextTick, mostly dependent on V8 6.4+, so we shouldn't backport these.

  1. Instead of manually copying the arguments into a new Array, use rest syntax to gather up the arguments as it's actually more performant starting with V8 6.4.

  2. As of V8 6.4 it appears to no longer be necessary to preset the callback on TickObject to null to avoid hidden class function tracking. The benchmark perf is now equivalent regardless. (Edit: this doesn't seem to apply universally so it's possible the reasoning here is different. In Timers, I'm still unable to get rid of similar code without tanking the performance 3x. Strangely we have a similar benchmark for both timers & nextTick but clearly something else is going on...)

  3. This is a temporary commit that applies the first change to internalNextTick; it will become redundant once async_hooks,process: remove internalNextTick #19147 lands.
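The first change can be sketched roughly as follows. This is an illustrative reduction, not the actual lib/internal/process/next_tick.js source; `queue`, `nextTickOld`, `nextTickNew` and `drain` are made-up names:

```javascript
// Sketch only: contrast manual `arguments` copying with rest syntax.
const queue = [];

// Before: copy `arguments` into a fresh Array by hand.
function nextTickOld(callback) {
  let args;
  if (arguments.length > 1) {
    args = new Array(arguments.length - 1);
    for (let i = 1; i < arguments.length; i++)
      args[i - 1] = arguments[i];
  }
  queue.push({ callback, args });
}

// After: rest syntax gathers the extra arguments; on V8 6.4+
// this benchmarks at least as well as the manual copy.
function nextTickNew(callback, ...args) {
  queue.push({ callback, args });
}

// Draining works identically for both shapes.
function drain() {
  const results = [];
  while (queue.length) {
    const { callback, args } = queue.shift();
    results.push(Reflect.apply(callback, undefined, args || []));
  }
  return results;
}
```

Both versions store the same `{ callback, args }` shape; only how the extra arguments are gathered differs.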

There's a performance benefit to this change on most of our nextTick benchmarks; two of them regress, but the regression there is smaller than the overall gains. It also slightly improves some of the streams benchmarks, which should be more representative of real usage. Edit: see below for more info.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

Instead of manually copying the arguments into a new Array,
use rest syntax to gather up the arguments as it's actually
more performant starting with V8 6.4.
As of V8 6.4 it appears to no longer be necessary to
preset the callback on TickObject to null to avoid
hidden class function tracking. The benchmark perf
is now equivalent regardless.
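The pattern being removed by the second commit can be illustrated like this. Names are illustrative; the real TickObject in core carries more state (async ids, etc.):

```javascript
// Sketch of the null-preset pattern and its replacement.
class TickObjectBefore {
  constructor(callback, args) {
    // Preset to null, then assign: historically this kept the hidden
    // class from being specialized on the particular function value.
    this.callback = null;
    this.args = args;
    this.callback = callback;
  }
}

// As of V8 6.4 the direct assignment benchmarks equivalently.
class TickObjectAfter {
  constructor(callback, args) {
    this.callback = callback;
    this.args = args;
  }
}
```

Behavior is identical either way; the only question is whether the extra null assignment still buys anything from the optimizer.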
@apapirovski apapirovski added the process, dont-land-on-v4.x, and performance labels Mar 5, 2018
@nodejs-github-bot nodejs-github-bot added the process label Mar 5, 2018
@apapirovski (Member, Author) commented Mar 5, 2018

Mostly what I expected from the benchmark (depth.js & exec-args.js regressions, improvements to breadth-args.js and depth-args.js), although a very different result for process/next-tick-exec.js...

                                              confidence improvement accuracy (*)   (**)  (***)
 process/next-tick-breadth-args.js millions=4        ***     17.45 %       ±1.92% ±2.53% ±3.25%
 process/next-tick-breadth.js millions=4                      0.94 %       ±2.77% ±3.65% ±4.69%
 process/next-tick-depth-args.js millions=12         ***      6.23 %       ±1.53% ±2.02% ±2.60%
 process/next-tick-depth.js millions=12              ***     -5.24 %       ±1.54% ±2.03% ±2.61%
 process/next-tick-exec-args.js millions=5           ***    -16.35 %       ±1.55% ±2.05% ±2.63%
 process/next-tick-exec.js millions=5                ***    -22.90 %       ±1.97% ±2.60% ±3.34%

Here's my local run for next-tick-exec.js (only 30 runs, didn't have time for 100):

                                      confidence improvement accuracy (*)   (**)  (***)
 process/next-tick-exec.js millions=5        ***      6.25 %       ±3.29% ±4.42% ±5.85%

@apapirovski (Member, Author) [comment deleted]

@bmeurer (Member) commented Mar 6, 2018

If you have uses of the rest parameters object that are not apply or friends, then you'll force materialization of the actual JSArray. If Reflect.apply(f, o, args) is the only user of args, then we completely avoid the allocation of the array object.
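A simplified illustration of that distinction (names are made up; the observable behavior of both functions is identical, only what the optimizer is allowed to do differs):

```javascript
// When the rest array is consumed solely by Reflect.apply, V8 6.4+
// can avoid allocating a real JSArray for it.
function callOnly(fn, ...args) {
  return Reflect.apply(fn, undefined, args);
}

const escaped = {};

// Any other use of `args` -- here, storing it on an object so it
// escapes the function -- forces materialization of an actual array.
function callAndStore(fn, ...args) {
  escaped.last = args;
  return Reflect.apply(fn, undefined, args);
}
```

So the allocation-free fast path depends on `Reflect.apply` (or friends) being the only consumer of the rest parameter.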

@apapirovski (Member, Author) commented

No clue what's going on with the perf differences on my Mac vs the benchmark CI:

                                                       confidence improvement accuracy (*)   (**)  (***)
 streams/pipe.js n=5000000                                            -0.90 %       ±1.78% ±2.35% ±3.01%
 streams/pipe-object-mode.js n=5000000                         **     -3.36 %       ±2.24% ±2.95% ±3.79%
 streams/readable-bigread.js n=1000                                    1.26 %       ±1.30% ±1.72% ±2.21%
 streams/readable-bigunevenread.js n=1000                              0.13 %       ±1.20% ±1.58% ±2.03%
 streams/readable-boundaryread.js type='buffer' n=2000                -0.36 %       ±0.85% ±1.13% ±1.45%
 streams/readable-boundaryread.js type='string' n=2000          *     -1.09 %       ±0.90% ±1.19% ±1.52%
 streams/readable-readall.js n=5000                                    0.47 %       ±1.02% ±1.34% ±1.73%
 streams/readable-unevenread.js n=1000                                 4.88 %       ±4.99% ±6.59% ±8.46%
 streams/transform-creation.js n=1000000                               1.10 %       ±2.38% ±3.14% ±4.04%
 streams/writable-manywrites.js n=2000000                     ***     -5.98 %       ±3.11% ±4.11% ±5.28%

My system:

confidence improvement accuracy (*)   (**)  (***)
streams/writable-manywrites.js n=2000000        ***     19.60 %       ±3.26% ±4.38% ±5.79%

Having a quick look at the raw data, it seems like the benchmark CI results have a much wider spread (lots of outliers) vs my local results which are always roughly the same for each run with very few outliers. Perhaps some GC differences or something?

@apapirovski apapirovski added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 6, 2018
@BridgeAR BridgeAR removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 11, 2018
@BridgeAR (Member) commented

I removed the author-ready label again as AFAIK it should only be applied with at least one approval.

@BridgeAR (Member) commented

@nodejs/process @nodejs/timers PTAL

@benjamingr (Member) commented

Can you run the benchmarks again with @bmeurer's suggestion? The code changes LGTM but I trust the CI's benchmarks.

@apapirovski (Member, Author) commented

@benjamingr I don't think there's an applicable suggestion here. It was a response to a question I had deleted. I need to run a benchmark against V8 6.5 now.

@BridgeAR (Member) commented Apr 9, 2018

Ping @apapirovski

@apapirovski (Member, Author) commented

Won't have time to look at this for a while but the rest param performance profile when storing the resulting array is a bit sketchy at the moment. The benchmarks seem mostly fine for nextTick but I see a huge regression with setTimeout and setImmediate for the same change which tells me that the micro-benchmarks might not be fully representative.

@BridgeAR (Member) commented

@apapirovski yes, I fear that a couple of our benchmarks do not yield the results that we expect. If I find the time, I will have a look at some to try to improve them.

@apapirovski (Member, Author) commented

@BridgeAR I think there's also probably more to it that can't easily be tested using micro benchmarks. I know streams use nextTick extensively so I might try testing this against those sometime. It's a bit closer to real usage at least.

@apapirovski apapirovski deleted the patch-nexttick-es branch April 14, 2018 10:06