Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: eliminate performance cost of ESM syntax detection for CommonJS entry points #52093

Merged
merged 16 commits into from
Mar 20, 2024

Conversation

GeoffreyBooth
Copy link
Member

@GeoffreyBooth GeoffreyBooth commented Mar 15, 2024

This PR eliminates the performance penalty incurred by --experimental-detect-module for the common use case of CommonJS entry points:

hyperfine --warmup 10 --runs 30 \
  './node ./test/fixtures/snapshot/typescript.js' \
  './node --experimental-detect-module ./test/fixtures/snapshot/typescript.js'

Benchmark 1: ./node ./test/fixtures/snapshot/typescript.js
  Time (mean ± σ):     135.0 ms ±   3.3 ms    [User: 117.5 ms, System: 13.9 ms]
  Range (min … max):   129.0 ms … 143.5 ms    30 runs

Benchmark 2: ./node --experimental-detect-module ./test/fixtures/snapshot/typescript.js
  Time (mean ± σ):     135.3 ms ±   2.8 ms    [User: 117.3 ms, System: 13.9 ms]
  Range (min … max):   130.0 ms … 143.1 ms    30 runs

Summary
  ./node ./test/fixtures/snapshot/typescript.js ran
    1.00 ± 0.03 times faster than ./node --experimental-detect-module ./test/fixtures/snapshot/typescript.js

It does this by changing the startup flow for “ambiguous” entry points (where there’s no explicit file extension or package.json "type" field). On current main, there’s an initial file load and CommonJS parse to determine whether or not the such ambiguous entry points should be handled by the CommonJS or the ESM loader; and then the chosen loader reloads and reparses the entry point. In this PR, ambiguous files are always sent into the CommonJS loader, and only if it throws an error do we check if we should retry the entry point as ESM via the ESM loader. Because nothing “extra” happens for the common case of CommonJS entry points that parse successfully as CommonJS, there is no performance penalty for incurred for CommonJS entry points by having --experimental-detect-module enabled by default.

I haven’t gone further to investigate if additional optimizations are possible within the ESM loader; there probably are. But this was significant enough that I thought it could probably land on its own.

Additional benchmarks
./node benchmark/compare.js --new ./out/Release/node --old ./node-main --filter startup misc | Rscript benchmark/compare.R
[00:22:32|% 100| 2/2 files | 60/60 runs | 4/4 configs]: Done
                                                                                           confidence improvement accuracy (*)   (**)  (***)
misc/startup-cli-version.js count=30 cli='deps/corepack/dist/corepack.js'                                 0.80 %       ±1.15% ±1.53% ±1.99%
misc/startup-cli-version.js count=30 cli='deps/npm/bin/npm-cli.js'                                        0.72 %       ±1.17% ±1.56% ±2.03%
misc/startup-cli-version.js count=30 cli='deps/npm/bin/npx-cli.js'                                        0.60 %       ±1.32% ±1.76% ±2.29%
misc/startup-cli-version.js count=30 cli='tools/node_modules/eslint/bin/eslint.js'                       -0.24 %       ±1.50% ±1.99% ±2.60%
misc/startup-core.js count=30 mode='process' script='benchmark/fixtures/require-builtins'                 0.40 %       ±1.15% ±1.53% ±2.00%
misc/startup-core.js count=30 mode='process' script='test/fixtures/semicolon'                             0.84 %       ±1.14% ±1.52% ±1.97%
misc/startup-core.js count=30 mode='worker' script='benchmark/fixtures/require-builtins'                  1.55 %       ±2.40% ±3.19% ±4.15%
misc/startup-core.js count=30 mode='worker' script='test/fixtures/semicolon'                             -0.23 %       ±2.56% ±3.40% ±4.43%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 8 comparisons, you can thus
expect the following amount of false-positive results:
  0.40 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.08 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

@nodejs/loaders @nodejs/performance

@GeoffreyBooth GeoffreyBooth added module Issues and PRs related to the module subsystem. performance Issues and PRs related to the performance of Node.js. esm Issues and PRs related to the ECMAScript Modules implementation. labels Mar 15, 2024
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 15, 2024
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Can we flip the flag in v22?

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely prefer this falling back from CJS loader behavior than the “check syntax first then choose the loader” approach before 👍 I think this approach was what I suggested at that time, not sure why it ended up the way it was. Though it would still be nicer if we could handle it via V8 caching but that can be a follow up.

src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
lib/internal/modules/run_main.js Outdated Show resolved Hide resolved
lib/internal/modules/cjs/loader.js Outdated Show resolved Hide resolved
@joyeecheung
Copy link
Member

Can you add the typescript.js bundle into the startup-core or startup-cli-version benchmark and run it in the benchmark CI?

@joyeecheung
Copy link
Member

joyeecheung commented Mar 15, 2024

Can we flip the flag in v22?

I think we should only flip the flag if we print a warning when the fallback is used (at least when it happens in non-entrypoints, or when the entry point is small enough). Users should mark the ESM explicitly wherever possible and be aware of the performance cost of the fallback. You probably don’t want thousands of your implicit .js ESMs to all get parsed as CJS first before falling back to ESM silently and wonder why your module loading is so slow…(and that could’ve been avoided if you just add a type filed to your package.json)

@mcollina
Copy link
Member

Agreed on the warning. It would be great to ship it given it only simplifies dx.

@GeoffreyBooth GeoffreyBooth force-pushed the optimize-cjs-detection branch from 278b15f to 5f7526c Compare March 15, 2024 17:43
@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Mar 15, 2024

Regarding the warning, I think we definitely want to avoid warning for two scenarios:

  • An extensionless file outside of any package.json scope, like /usr/local/bin/foo
  • Any files within node_modules

The user can’t add explicitness for the former without breaking their desired UX, and the user can’t fix the latter without patching dependencies.

The case I’m unsure on is node entry.js where there’s no package.json file (but there is an extension). The user could change the .js to .mjs or .cjs, but I’m not sure we want to push them to do so; they may have great reasons for needing to use .js (their editor, other tools, etc.). But if there’s no package.json present, it’s probably a single-file script or other very small script/app, where the performance overhead of detection is unlikely to be noticeable.

How about we warn only on ambiguous entry points where there is a package.json present? That probably covers most usage of detection, and if the user already has a package.json there shouldn’t be a reason that they can’t add a type field to it (whether that field’s value is module or commonjs).

@GeoffreyBooth GeoffreyBooth force-pushed the optimize-cjs-detection branch from 5f7526c to f4cd945 Compare March 15, 2024 18:48
@joyeecheung
Copy link
Member

joyeecheung commented Mar 16, 2024

the user can’t fix the latter without patching dependencies.

Isn't that already the case? Right now this would not work anyway (it errors), why should we allow it without a warning? It seems to be encouraging package authors to publish their .js code without a type field in package.json. Right now what we have requires them to put "type": "module" field in package.json, I see no reason why we should lure them into stopping that...

I agree we can do that for extension less files. Though I'd still prefer we have something better for indicating ESM for them (e.g. magical comments or directives - the use of magical comments isn't new anyway, e.g. for source maps or V8 compilation hints).

@GeoffreyBooth
Copy link
Member Author

I discovered that the warning was being printed incorrectly, because I was calling the error flow even when we were retrying as ESM. I fixed it so that the error flow is only used when we’re actually going to throw the CommonJS error, because we’re not retrying as ESM. So now the warning behavior is unchanged from before, which I think is fine for this PR. If we want to discuss warnings further that can be a separate PR.

@nodejs-github-bot
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth force-pushed the optimize-cjs-detection branch from 0fd062c to 737e831 Compare March 16, 2024 04:53
@GeoffreyBooth
Copy link
Member Author

Can you add the typescript.js bundle into the startup-core or startup-cli-version benchmark and run it in the benchmark CI?

Done for startup-core: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1495/console

The fixture isn’t the full TypeScript CLI. I don’t think there’s a way to get it to print the TypeScript version, so I didn’t add it to the startup-cli-version one.

@nodejs-github-bot
Copy link
Collaborator

@GeoffreyBooth GeoffreyBooth added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 17, 2024
@GeoffreyBooth
Copy link
Member Author

There were some failing tests until I moved the try/catch inside the check for --experimental-detect-module; so with that change, this PR should be ready to land, unless @joyeecheung you had any other notes (or I haven’t resolved any of your notes)?

The few tests that fail when the flag is enabled by default seem mostly to do with error output, because per this PR with the flag we’re catching and rethrowing the attempt to run the entry point. I think I might have a solution for the error output, though it’ll take some work to get working with source maps; or it might not be necessary at all if we unflag this at the same time as #51977.

lib/internal/modules/run_main.js Outdated Show resolved Hide resolved
lib/internal/modules/helpers.js Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.cc Outdated Show resolved Hide resolved
src/node_contextify.h Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot nodejs-github-bot added the commit-queue-failed An error occurred while landing this pull request using GitHub Actions. label Mar 19, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@richardlau richardlau added commit-queue Add this label to land a pull request using GitHub Actions. and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. labels Mar 20, 2024
@nodejs-github-bot nodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Mar 20, 2024
@nodejs-github-bot
Copy link
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/52093
✔  Done loading data for nodejs/node/pull/52093
----------------------------------- PR info ------------------------------------
Title      module: eliminate performance cost of ESM syntax detection for CommonJS entry points (#52093)
   ⚠  Could not retrieve the email or name of the PR author's from user's GitHub profile!
Branch     GeoffreyBooth:optimize-cjs-detection -> nodejs:main
Labels     c++, module, lib / src, performance, esm, author ready, needs-ci, commit-queue-squash
Commits    16
 - module: eliminate performance cost of detection for cjs entry
 - DRY the C++ functions
 - restore warning to be printed only on error
 - add TypeScript fixture to benchmark
 - put try/catch inside our flag, to postpone for now having to deal wit…
 - Apply suggestions from code review
 - Remove cjsRetryAsESMCache
 - ScriptCompiler::kNoCompileOptions
 - export error messages from c++ as constants
 - USE()
 - format
 - review notes
 - Use string approach
 - simplify array conversion
 - lint
 - nit
Committers 1
 - Geoffrey Booth 
PR-URL: https://github.com/nodejs/node/pull/52093
Reviewed-By: Matteo Collina 
Reviewed-By: Yagiz Nizipli 
Reviewed-By: Joyee Cheung 
Reviewed-By: Jacob Smith 
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/52093
Reviewed-By: Matteo Collina 
Reviewed-By: Yagiz Nizipli 
Reviewed-By: Joyee Cheung 
Reviewed-By: Jacob Smith 
--------------------------------------------------------------------------------
   ⚠  Commits were pushed since the last approving review:
   ⚠  - lint
   ⚠  - nit
   ℹ  This PR was created on Fri, 15 Mar 2024 05:07:50 GMT
   ✔  Approvals: 4
   ✔  - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/52093#pullrequestreview-1938208612
   ✔  - Yagiz Nizipli (@anonrig) (TSC): https://github.com/nodejs/node/pull/52093#pullrequestreview-1944916206
   ✔  - Joyee Cheung (@joyeecheung) (TSC): https://github.com/nodejs/node/pull/52093#pullrequestreview-1946897096
   ✔  - Jacob Smith (@JakobJingleheimer): https://github.com/nodejs/node/pull/52093#pullrequestreview-1946928149
   ✔  Last GitHub CI successful
   ℹ  Last Full PR CI on 2024-03-20T12:55:16Z: https://ci.nodejs.org/job/node-test-pull-request/57855/
   ℹ  Last Benchmark CI on 2024-03-19T22:34:34Z: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1495/console
- Querying data for job/node-test-pull-request/57855/
   ✔  Last Jenkins CI successful
--------------------------------------------------------------------------------
   ✔  Aborted `git node land` session in /home/runner/work/node/node/.ncu
https://github.com/nodejs/node/actions/runs/8361463879

@GeoffreyBooth GeoffreyBooth added commit-queue Add this label to land a pull request using GitHub Actions. and removed commit-queue-failed An error occurred while landing this pull request using GitHub Actions. commit-queue Add this label to land a pull request using GitHub Actions. labels Mar 20, 2024
@GeoffreyBooth GeoffreyBooth added the commit-queue Add this label to land a pull request using GitHub Actions. label Mar 20, 2024
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Mar 20, 2024
@nodejs-github-bot nodejs-github-bot merged commit 8bc7459 into nodejs:main Mar 20, 2024
71 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 8bc7459

@GeoffreyBooth GeoffreyBooth deleted the optimize-cjs-detection branch March 20, 2024 15:48
rdw-msft pushed a commit to rdw-msft/node that referenced this pull request Mar 26, 2024
PR-URL: nodejs#52093
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Yagiz Nizipli <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>
Reviewed-By: Jacob Smith <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
@marco-ippolito marco-ippolito added the backport-requested-v20.x PRs awaiting manual backport to the v20.x-staging branch. label May 21, 2024
@marco-ippolito
Copy link
Member

Can you create a manual backport for v20? It doesnt land cleanly

@targos targos added dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. and removed backport-requested-v20.x PRs awaiting manual backport to the v20.x-staging branch. labels Sep 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. dont-land-on-v20.x PRs that should not land on the v20.x-staging branch and should not be released in v20.x. esm Issues and PRs related to the ECMAScript Modules implementation. lib / src Issues and PRs related to general changes in the lib or src directory. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants