fs: throw errors from sync branches instead of separate implementations #49913

Merged
merged 2 commits into from
Sep 30, 2023

joyeecheung
Member

Previously, to throw errors from C++ land, sync versions of the fs methods were created by copying C++ code from the original implementation and moving JS code to a separate file. This can lead to several problems:

  1. By moving code to a new file for the sake of moving, it would be harder to use git blame to trace changes and harder to backport changes to older branches.
  2. Scattering the async and sync versions of fs methods in different files makes it harder to keep them in sync and share code in the prologues and epilogues.
  3. Having two copies of code doing almost the same thing results in duplication and can be prone to out-of-sync problems when the prologue and epilogue get updated.
  4. There is a minor cost to startup in adding an additional file. This can add up even with the help of snapshots.

This patch moves the JS code back to lib/fs.js to address problems 1, 2 & 4, and introduces the C++ helpers SyncCallAndThrowIf() and SyncCallAndThrowOnError() so that the original implementations can be easily tweaked to allow throwing from C++, which addresses problem 3.
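
To make the idea behind the two helpers concrete, here is a minimal model of the pattern written in JavaScript. It is only a sketch: the real helpers are C++ templates in the Node.js source, and everything below (`fakeUvStat`, `makeUvError`, the predicate names, the error shape) is a hypothetical stand-in rather than the actual implementation.

```javascript
// Illustrative model only; names and error shape are assumptions.

// Stand-in for a libuv-style call: returns a negative errno on failure.
const ENOENT = -2;
function fakeUvStat(path) {
  return path === '/exists' ? 0 : ENOENT;
}

// Build an error roughly in the spirit of what fs methods throw.
function makeUvError(errno, syscall, path) {
  const err = new Error(`${syscall} failed with errno ${errno} for '${path}'`);
  err.errno = errno;
  err.syscall = syscall;
  err.path = path;
  return err;
}

// Call fn and throw only when shouldThrow(result) says so.
function syncCallAndThrowIf(shouldThrow, syscall, fn, path) {
  const result = fn(path);
  if (shouldThrow(result)) throw makeUvError(result, syscall, path);
  return result;
}

// Common case: any negative result is treated as an error.
const isUvError = (result) => result < 0;
function syncCallAndThrowOnError(syscall, fn, path) {
  return syncCallAndThrowIf(isUvError, syscall, fn, path);
}

// Variant predicate: tolerate "no such file", throw on everything else.
const isUvErrorExceptNoEntry = (result) => result < 0 && result !== ENOENT;
```

With these definitions, `syncCallAndThrowOnError('stat', fakeUvStat, '/missing')` throws, while `syncCallAndThrowIf(isUvErrorExceptNoEntry, 'stat', fakeUvStat, '/missing')` returns the negative result and leaves the decision to the caller. Keeping the predicate as a parameter is what lets one call site serve both "always throw" and "throw unless missing" behaviours without a second copy of the function.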

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Sep 28, 2023
@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 28, 2023
@joyeecheung joyeecheung requested a review from anonrig September 28, 2023 01:17
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 28, 2023
Member

@anonrig anonrig left a comment

My thoughts:

  • Moving to sync.js to fs.js seems reasonable and understandable.
  • I don't think Fsreqwrap brings any good. I actually believe it brings unnecessary complexity just to avoid an if statement wrapping a trace function.
  • Even though I like your approach, I'm extremely frustrated with the inevitable toll this pull request brings to the existing pull requests.

I'm fine with merging this, provided the syncfsreq class is removed, but I would prefer to land the existing pull requests first and then land this.

cc @CanadaHonk

src/node_file-inl.h (outdated)
FSReqWrapSync* req_wrap,
Func fn,
Args... args) {
env->PrintSyncTrace();
Member

@anonrig anonrig Sep 28, 2023

If the lack of this call causes a bug, I recommend adding a test to address your concerns in the description

Member Author

@joyeecheung joyeecheung Sep 28, 2023

We do have a test (test/parallel/test-sync-io-option.js), but it only checks fs.statSync. I'm not sure it's worth adding a test for every single fs method. I think the test works under the assumption that the fs methods share this bit of code, so if one is okay, the others are fine.
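
For illustration, a table-driven check covering more than one method could look roughly like the sketch below. It is not the existing test: the `cases` table and the `tracesSyncIo` helper are hypothetical, and it assumes that running with `--trace-sync-io` prints a warning containing "Detected use of sync API" to stderr. The call is deferred with `setImmediate` because the flag only reports synchronous I/O detected after the first turn of the event loop.

```javascript
const { spawnSync } = require('node:child_process');

// Run a snippet in a child process with --trace-sync-io and report whether
// the sync-API warning appears on stderr (warning text is an assumption).
function tracesSyncIo(snippet) {
  const script = `setImmediate(() => { ${snippet} });`;
  const child = spawnSync(process.execPath, ['--trace-sync-io', '-e', script], {
    encoding: 'utf8',
  });
  return child.stderr.includes('Detected use of sync API');
}

// Hypothetical table of calls to check; extend with other fs methods as needed.
const cases = {
  statSync: "require('fs').statSync('.')",
  lstatSync: "require('fs').lstatSync('.')",
  accessSync: "require('fs').accessSync('.')",
};

for (const [name, snippet] of Object.entries(cases)) {
  console.log(`${name}: ${tracesSyncIo(snippet) ? 'traced' : 'NOT traced'}`);
}
```

Spawning one child per method is slow, so a real test would probably batch the calls or sample a few methods, which is in line with the assumption that the methods share the tracing code path.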

Member

I unknowingly broke it. We should eventually add it.

@benjamingr
Member

benjamingr commented Sep 28, 2023

@joyeecheung
Member Author

joyeecheung commented Sep 28, 2023

> I like your approach I'm extremely frustrated with the inevitable toll this pull request brings to the existing pull requests.

Sorry that I noticed this too late, but I think that if we do this soon enough there will be less toll in the future. If we land the existing pull requests, some of the problems mentioned in the OP will be difficult to undo (they will have an irreversible effect on the git history and make git blame and backporting harder). Also, this PR can make future pull requests much simpler: just a few lines changing SyncCall to SyncCallAndThrowOnError in C++, then removing the ctx creation and the handleErrorFromBinding call in JS land. That probably amounts to <10 lines of changes per method, instead of the 20-50 lines of copied code added in C++ and another ~10 lines of code moved in JS land.
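
The JS-land half of that migration can be sketched as follows. This is a simplified model, not the real code: `bindingWithCtx`, `bindingThatThrows` and this `handleErrorFromBinding` are hypothetical stand-ins for the internal binding and helper, and `access` is used only as an example method.

```javascript
// Hypothetical stand-ins; not the real internal binding or helper.
const ENOENT = -2;

// "Before" style: the binding reports failure through a ctx out-parameter.
const bindingWithCtx = {
  access(path, ctx) {
    if (path !== '/exists') {
      ctx.errno = ENOENT;
      ctx.code = 'ENOENT';
      ctx.syscall = 'access';
    }
  },
};

// JS-land helper that turns a filled ctx into a thrown error.
function handleErrorFromBinding(ctx) {
  if (ctx.errno !== undefined) {
    const err = new Error(`${ctx.code}: ${ctx.syscall} '${ctx.path}'`);
    Object.assign(err, ctx);
    throw err;
  }
}

function accessSyncBefore(path) {
  const ctx = { path };             // ctx creation
  bindingWithCtx.access(path, ctx);
  handleErrorFromBinding(ctx);      // error conversion in JS
}

// "After" style: the binding throws by itself, so JS needs no ctx plumbing.
const bindingThatThrows = {
  access(path) {
    if (path !== '/exists') {
      const err = new Error(`ENOENT: access '${path}'`);
      Object.assign(err, { errno: ENOENT, code: 'ENOENT', syscall: 'access', path });
      throw err;
    }
  },
};

function accessSyncAfter(path) {
  bindingThatThrows.access(path);
}
```

Both versions throw an error carrying the same `code`, `syscall` and `path` for a missing file; the "after" version simply drops two lines from the JS caller, which is the kind of small per-method diff described above.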

Note that the previous approach would also create out-of-sync problems if any of the fs methods being optimized gets fixed in another PR (obviously we can't block bug fixes just because things are being optimized). You then need to propagate the fix to the other implementation, and depending on the timing of landing, one of them can land without anyone thinking about the other, with no git conflicts to warn you. That is a bad thing in this case, because the bug now lurks again in the other implementation.

> Moving to sync.js to fs.js seems reasonable and understandable.

I would disagree. The ultimate difference between the sync and async versions is usually just if/when a callback should be invoked. Most of the prologue leading up to the libuv call is, and should be, shared to avoid getting out of sync. If we can't share it, at least we can place the two close enough to each other that people pay attention. It would be easy to accidentally change the prologue of fs.methodSync without doing the same to fs.method if we keep them very separate, especially if we do things in bulk in FS and make the individual changes hard to notice in review (instead of expanding the diff to see the other implementation above or below, you have to look for the same thing in a different file, and doing that for multiple methods can be very tiring).
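
As a rough sketch of that point, the two forms below share a single prologue and sit next to each other. The `touch`/`touchSync` pair, `touchPrologue` and `doTouch` are hypothetical names invented for this example; they are not real fs methods.

```javascript
// Hypothetical example; touch/touchSync are not real fs methods.

// Shared prologue: validate and normalize the arguments once.
function touchPrologue(path, mode = 0o644) {
  if (typeof path !== 'string' || path.length === 0) {
    throw new TypeError('path must be a non-empty string');
  }
  if (!Number.isInteger(mode) || mode < 0 || mode > 0o777) {
    throw new RangeError('mode must be an integer between 0 and 0o777');
  }
  return { path, mode };
}

// Stand-in for the underlying operation.
function doTouch({ path, mode }) {
  return `${path}:${mode.toString(8)}`;
}

// Async form: same prologue, result delivered through a callback.
function touch(path, mode, callback) {
  const args = touchPrologue(path, mode);
  queueMicrotask(() => callback(null, doTouch(args)));
}

// Sync form, defined right next to the async one: same prologue, result returned.
function touchSync(path, mode) {
  const args = touchPrologue(path, mode);
  return doTouch(args);
}
```

Because both forms call `touchPrologue`, a validation fix lands in one place, and a reviewer sees both callers in the same part of the same file.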

@joyeecheung
Member Author

joyeecheung commented Sep 28, 2023

I started another benchmark CI run because the one started by @benjamingr had connection issues (queued in https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1428/). However, I am seeing unexpected performance improvements locally 🤷‍♀️ Perhaps the benchmarks aren't very stable, or maybe my local fs is too fast, so some JS-land factor (e.g. inlining) dominated the numbers.

                                                                                              confidence improvement accuracy (*)    (**)   (***)
fs/bench-accessSync.js n=100000 type='existing'                                                               1.19 %      ±11.82% ±15.72% ±20.46%
fs/bench-accessSync.js n=100000 type='non-existing'                                                          11.42 %      ±11.83% ±15.74% ±20.49%
fs/bench-accessSync.js n=100000 type='non-flat-existing'                                                      2.42 %      ±11.82% ±15.73% ±20.48%
fs/bench-copyFileSync.js n=10000 type='invalid'                                                      ***      8.55 %       ±2.45%  ±3.27%  ±4.25%
fs/bench-copyFileSync.js n=10000 type='valid'                                                                -2.26 %       ±6.10%  ±8.11% ±10.56%
fs/bench-existsSync.js n=1000000 type='existing'                                                             -0.45 %       ±1.57%  ±2.09%  ±2.73%
fs/bench-existsSync.js n=1000000 type='non-existing'                                                         -1.01 %       ±1.32%  ±1.75%  ±2.29%
fs/bench-existsSync.js n=1000000 type='non-flat-existing'                                                     0.29 %       ±1.31%  ±1.74%  ±2.27%
fs/bench-opendirSync.js n=1000 type='existing'                                                                0.37 %       ±0.60%  ±0.79%  ±1.04%
fs/bench-opendirSync.js n=1000 type='non-existing'                                                    **      2.05 %       ±1.53%  ±2.04%  ±2.67%
fs/bench-openSync.js n=100000 type='existing'                                                                 0.14 %       ±0.80%  ±1.07%  ±1.41%
fs/bench-openSync.js n=100000 type='non-existing'                                                    ***      6.63 %       ±1.22%  ±1.63%  ±2.13%
fs/bench-readdirSync.js withFileTypes='false' dir='lib' n=10                                                  0.43 %       ±2.04%  ±2.72%  ±3.54%
fs/bench-readdirSync.js withFileTypes='false' dir='test/parallel' n=10                                       -0.74 %       ±1.86%  ±2.49%  ±3.26%
fs/bench-readdirSync.js withFileTypes='true' dir='lib' n=10                                            *     -3.59 %       ±3.09%  ±4.11%  ±5.36%
fs/bench-readdirSync.js withFileTypes='true' dir='test/parallel' n=10                                         1.01 %       ±2.16%  ±2.88%  ±3.75%
fs/bench-realpathSync.js pathType='relative' n=10000                                                         -0.16 %       ±1.07%  ±1.43%  ±1.86%
fs/bench-realpathSync.js pathType='resolved' n=10000                                                          0.56 %       ±2.24%  ±3.01%  ±3.96%
fs/bench-statSync-failure.js statSyncType='noThrow' n=1000000                                                 0.28 %       ±1.30%  ±1.73%  ±2.25%
fs/bench-statSync-failure.js statSyncType='throw' n=1000000                                          ***      7.36 %       ±0.78%  ±1.04%  ±1.35%
fs/bench-statSync.js statSyncType='fstatSync' n=1000000                                                       0.38 %       ±0.71%  ±0.95%  ±1.24%
fs/bench-statSync.js statSyncType='lstatSync' n=1000000                                                      -0.05 %       ±0.33%  ±0.44%  ±0.57%
fs/bench-statSync.js statSyncType='statSync' n=1000000                                                        0.29 %       ±0.50%  ±0.67%  ±0.87%
fs/bench-unlinkSync.js n=1000 type='existing'                                                                -6.02 %       ±9.08% ±12.09% ±15.74%
fs/bench-unlinkSync.js n=1000 type='non-existing'                                                             3.71 %      ±11.41% ±15.19% ±19.78%
fs/readFileSync.js n=10000 hasFileDescriptor='false' path='existing' encoding='undefined'                    -2.18 %       ±3.97%  ±5.32%  ±6.99%
fs/readFileSync.js n=10000 hasFileDescriptor='false' path='existing' encoding='utf8'                   *      1.34 %       ±1.01%  ±1.34%  ±1.74%
fs/readFileSync.js n=10000 hasFileDescriptor='false' path='non-existing' encoding='undefined'        ***      3.82 %       ±0.95%  ±1.27%  ±1.66%
fs/readFileSync.js n=10000 hasFileDescriptor='false' path='non-existing' encoding='utf8'             ***      2.07 %       ±0.85%  ±1.13%  ±1.48%
fs/readFileSync.js n=10000 hasFileDescriptor='true' path='existing' encoding='undefined'                     -0.20 %       ±1.67%  ±2.23%  ±2.91%
fs/readFileSync.js n=10000 hasFileDescriptor='true' path='existing' encoding='utf8'                          -2.38 %       ±3.46%  ±4.63%  ±6.09%
fs/readFileSync.js n=10000 hasFileDescriptor='true' path='non-existing' encoding='undefined'                  0.29 %       ±0.86%  ±1.14%  ±1.49%
fs/readFileSync.js n=10000 hasFileDescriptor='true' path='non-existing' encoding='utf8'              ***      3.45 %       ±0.89%  ±1.18%  ±1.54%

Member

@anonrig anonrig left a comment

I'm convinced of the benefits of this pull request, but merging it will block 13 pending pull requests from 6 different contributors. We should merge this once we have landed those pull requests, to avoid any more frustration.

Ref: https://github.com/nodejs/node/pulls?q=is%3Aopen+is%3Apr+label%3Aperformance+fs

@CanadaHonk
Member

Fwiw my opinion: I think the approach of this PR is good and better than the existing system. I don't think having open PRs using the "old technique" should block this; PR authors (hopefully) should be able to rewrite easily as they are still relatively small changes (or they can always close and someone else can try).

Being a bit more controversial, it would probably aid reviewers by possibly creating one large PR after this and closing existing error perf PRs to use this new technique for ~all sync functions (with pure binding impls/only JS validation and no JS logic) instead of having many small PRs. However, this would make benchmarking results much more difficult being in one PR (and possibly having to create many many benchmark files). Plus, it would block new contributors doing these relatively small but helpful changes via doing one monolithic PR instead (not sure if that would be a concern or not).

Regardless, cool work :)

@joyeecheung
Member Author

joyeecheung commented Sep 28, 2023

I think we should do it the other way around - once you land those other fs PRs, the effect they have on git history (for git blame and backports) would be impossible to undo, unless we want to land them all at once, edit all of them and force push, while blocking all other unrelated PRs from landing, but then we might as well just land this first and then ask other PRs to migrate, so that we have a clean git history.

@Qard
Member

Qard commented Sep 28, 2023

I agree with Joyee on all points except that I think the splitting of the JS code to a separate sync file is fine, especially if the JS portion of the code is fairly minimal. We already do this with promises, and it makes it easier to only load the parts needed. To me the ideal state would be for the majority of the fs code to all be single C++ functions with helpers to handle sync, callback, and promise forms all in the same function and then have JS files just to expose the different forms and do little else. I do think it's reasonable to hold off on such a split until that unification of the underlying C++ functions is in better shape though.

From a perf perspective it might make sense for each to have its own separate C++ function at some point, but only if we can isolate the common parts to helpers that can be reused between each of the forms. Duplicating a bunch of code across different forms is not great. There's a high risk of things getting out-of-sync and that can be a major source of bugs. Performance is important, but not at the risk of stability.

It's unfortunate that there's a bunch of PRs in conflict with this, but I agree that it's best to not churn the git history further before landing a cleaner upgrade path.

@joyeecheung joyeecheung added the request-ci Add this label to start a Jenkins CI on a PR. label Sep 28, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Sep 28, 2023
@anonrig anonrig added the commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. label Sep 29, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Sep 30, 2023
@nodejs-github-bot nodejs-github-bot merged commit 813713f into nodejs:main Sep 30, 2023
@nodejs-github-bot
Collaborator

Landed in 813713f

GeoffreyBooth pushed a commit to GeoffreyBooth/node that referenced this pull request Oct 1, 2023
PR-URL: nodejs#49913
Reviewed-By: Stephen Belanger
Reviewed-By: Colin Ihrig
Reviewed-By: Darshan Sen
Reviewed-By: Benjamin Gruenbaum
Reviewed-By: Tobias Nießen
Reviewed-By: Yagiz Nizipli
pluris pushed a commit to pluris/node that referenced this pull request Oct 1, 2023
alexfernandez pushed a commit to alexfernandez/node that referenced this pull request Nov 1, 2023
targos pushed a commit that referenced this pull request Nov 11, 2023
codebytere added a commit to electron/electron that referenced this pull request Dec 11, 2023
* chore: bump node in DEPS to v20.10.0

* chore: update feat_initialize_asar_support.patch

no code changes; patch just needed an update due to nearby upstream changes

Xref: nodejs/node#49986

* chore: update pass_all_globals_through_require.patch

no manual changes; patch applied with fuzz

Xref: nodejs/node#49657

* chore: update refactor_allow_embedder_overriding_of_internal_fs_calls

Xref: nodejs/node#49912

no code changes; patch just needed an update due to nearby upstream changes

* chore: update chore_allow_the_node_entrypoint_to_be_a_builtin_module.patch

Xref: nodejs/node#49986

minor manual changes needed to sync with upstream change

* update fix_expose_the_built-in_electron_module_via_the_esm_loader.patch

Xref: nodejs/node#50096
Xref: nodejs/node#50314
in lib/internal/modules/esm/load.js, update the code that checks for
`format === 'electron'`. I'd like 👀 on this

Xref: nodejs/node#49657
add braces in lib/internal/modules/esm/translators.js to sync with upstream

* fix: lazyload fs in esm loaders to apply asar patches

* nodejs/node#50127
* nodejs/node#50096

* esm: jsdoc for modules code

nodejs/node#49523

* test: set test-cli-node-options as flaky

nodejs/node#50296

* deps: update c-ares to 1.20.1

nodejs/node#50082

* esm: bypass CommonJS loader under --default-type=module

nodejs/node#49986

* deps: update uvwasi to 0.0.19

nodejs/node#49908

* lib,test: do not hardcode Buffer.kMaxLength

nodejs/node#49876

* crypto: account for disabled SharedArrayBuffer

nodejs/node#50034

* test: fix edge snapshot stack traces

nodejs/node#49659

* src: generate snapshot with --predictable

nodejs/node#48749

* chore: fixup patch indices

* fs: throw errors from sync branches instead of separate implementations

nodejs/node#49913

* crypto: ensure valid point on elliptic curve in SubtleCrypto.importKey

nodejs/node#50234

* esm: detect ESM syntax in ambiguous JavaScript

nodejs/node#50096

* fixup! test: fix edge snapshot stack traces

* esm: unflag extensionless ES module JavaScript and Wasm in module scope

nodejs/node#49974

* [tagged-ptr] Arrowify objects

https://chromium-review.googlesource.com/c/v8/v8/+/4705331

---------

Co-authored-by: electron-roller[bot] <84116207+electron-roller[bot]@users.noreply.github.com>
Co-authored-by: Charles Kerr
Co-authored-by: Shelley Vohr
debadree25 pushed a commit to debadree25/node that referenced this pull request Apr 15, 2024
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
9 participants