Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: re-enable V8 concurrent marking #41013

Closed
wants to merge 1 commit into from

Conversation

targos
Copy link
Member

@targos targos commented Nov 29, 2021

It was unintentionally disabled during a V8 update.

Fixes: #41012

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory. v8 engine Issues and PRs related to the V8 dependency. labels Nov 29, 2021
@targos targos added lts-watch-v16.x request-ci Add this label to start a Jenkins CI on a PR. labels Nov 29, 2021
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Nov 29, 2021
@nodejs-github-bot
Copy link
Collaborator

@targos
Copy link
Member Author

targos commented Nov 29, 2021

This breaks a test on my machine:

$  out/Release/node /Users/targos/git/nodejs/node/test/parallel/test-heapsnapshot-near-heap-limit-worker.js

Invoked NearHeapLimitCallback, processing=false, current_limit=52428800, initial_limit=52428800
max_young_gen_size=50331648, young_gen_size=0, old_gen_size=50959592, total_size=50959592
Estimated available memory=766459904, estimated overhead=50331648
Start generating Heap.20211129.113347.20003.1.001.heapsnapshot...
Wrote snapshot to /Users/targos/git/nodejs/node/test/.tmp.0/Heap.20211129.113347.20003.1.001.heapsnapshot
node:events:368
      throw er; // Unhandled 'error' event
      ^

Error [ERR_WORKER_OUT_OF_MEMORY]: Worker terminated due to reaching memory limit: JS heap out of memory
    at new NodeError (node:internal/errors:371:5)
    at Worker.[kOnExit] (node:internal/worker:276:26)
    at Worker.<computed>.onexit (node:internal/worker:198:20)
Emitted 'error' event on Worker instance at:
    at Worker.[kOnExit] (node:internal/worker:276:12)
    at Worker.<computed>.onexit (node:internal/worker:198:20) {
  code: 'ERR_WORKER_OUT_OF_MEMORY'
}

Node.js v18.0.0-pre

@joyeecheung

@nodejs-github-bot
Copy link
Collaborator

@joyeecheung
Copy link
Member

joyeecheung commented Nov 29, 2021

This breaks a test on my machine

What was the failure? I think the logs are just what we log in the tests (even if it passes)?

@targos
Copy link
Member Author

targos commented Nov 29, 2021

Oh you're right, the test is probably just flaky. I'll open a separate issue.

@targos
Copy link
Member Author

targos commented Nov 29, 2021

@joyeecheung it's only flaky with this change, so I'll post here:

tools/test.py --repeat 100 parallel/test-heapsnapshot-near-heap-limit-worker
...
=== release test-heapsnapshot-near-heap-limit-worker ===                   
Path: parallel/test-heapsnapshot-near-heap-limit-worker
Invoked NearHeapLimitCallback, processing=false, current_limit=52428800, initial_limit=52428800
max_young_gen_size=50331648, young_gen_size=0, old_gen_size=43712608, total_size=43712608
Estimated available memory=124911616, estimated overhead=50331648
Start generating Heap.20211129.213557.21128.1.001.heapsnapshot...
Wrote snapshot to /Users/targos/git/nodejs/node/test/.tmp.96/Heap.20211129.213557.21128.1.001.heapsnapshot

<--- Last few GCs --->
al[21128:0x130008000]     2829 ms: Mark-sweep 41.6 (60.4) -> 41.4 (60.7) MB, 0.8 / 0.0 ms  (+ 1.3 ms in 3 steps since start of marking, biggest step 1.1 ms, walltime since start of marking 2 ms) (average mu = 0.376, current mu = 0.377) finalize incremental [21128:0x130008000]     2836 ms: Mark-sweep 41.7 (60.7) -> 41.4 (60.7) MB, 2.8 / 0.0 ms  (+ 1.9 ms in 3 steps since start of marking, biggest step 1.9 ms, walltime since start of marking 5 ms) (average mu = 0.346, current mu = 0.329) finalize incremental 

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10049160c node::Abort() [/Users/targos/git/nodejs/node/out/Release/node]
 2: 0x10049178c node::OnFatalError(char const*, char const*) [/Users/targos/git/nodejs/node/out/Release/node]
 3: 0x1005e16c4 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/Users/targos/git/nodejs/node/out/Release/node]
 4: 0x1005e1658 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/targos/git/nodejs/node/out/Release/node]
 5: 0x10076496c v8::internal::Heap::GarbageCollectionReasonToString(v8::internal::GarbageCollectionReason) [/Users/targos/git/nodejs/node/out/Release/node]
 6: 0x100767f04 v8::internal::Heap::CollectSharedGarbage(v8::internal::GarbageCollectionReason) [/Users/targos/git/nodejs/node/out/Release/node]
 7: 0x100765138 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/Users/targos/git/nodejs/node/out/Release/node]
 8: 0x100762a38 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/targos/git/nodejs/node/out/Release/node]
 9: 0x10076aebc v8::internal::Heap::FinalizeIncrementalMarkingIfComplete(v8::internal::GarbageCollectionReason) [/Users/targos/git/nodejs/node/out/Release/node]
10: 0x1007785f0 v8::internal::IncrementalMarkingJob::Task::RunInternal() [/Users/targos/git/nodejs/node/out/Release/node]
11: 0x1004eee00 node::PerIsolatePlatformData::RunForegroundTask(std::__1::unique_ptr<v8::Task, std::__1::default_delete<v8::Task> >) [/Users/targos/git/nodejs/node/out/Release/node]
12: 0x1004edb40 node::PerIsolatePlatformData::FlushForegroundTasksInternal() [/Users/targos/git/nodejs/node/out/Release/node]
13: 0x100d219dc uv__async_io [/Users/targos/git/nodejs/node/out/Release/node]
14: 0x100d33540 uv__io_poll [/Users/targos/git/nodejs/node/out/Release/node]
15: 0x100d21e6c uv_run [/Users/targos/git/nodejs/node/out/Release/node]
16: 0x1003dc470 node::SpinEventLoop(node::Environment*) [/Users/targos/git/nodejs/node/out/Release/node]
17: 0x1005228e0 node::worker::Worker::Run() [/Users/targos/git/nodejs/node/out/Release/node]
18: 0x100525888 node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) [/Users/targos/git/nodejs/node/out/Release/node]
19: 0x1b32d9240 _pthread_start [/usr/lib/system/libsystem_pthread.dylib]
20: 0x1b32d4024 thread_start [/usr/lib/system/libsystem_pthread.dylib]
node:assert:400
    throw err;
    ^

AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:

  assert(stderr.includes('ERR_WORKER_OUT_OF_MEMORY'))

    at Object.<anonymous> (/Users/targos/git/nodejs/node/test/parallel/test-heapsnapshot-near-heap-limit-worker.js:34:5)
    at Module._compile (node:internal/modules/cjs/loader:1097:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1149:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47 {
  generatedMessage: true,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Node.js v18.0.0-pre
Command: out/Release/node /Users/targos/git/nodejs/node/test/parallel/test-heapsnapshot-near-heap-limit-worker.js
[00:42|% 100|+  85|-  15]: Done 

@targos
Copy link
Member Author

targos commented Nov 30, 2021

@targos
Copy link
Member Author

targos commented Nov 30, 2021

The stress test failed on windows but not linux or macos:

10:55:36 1000   OK: 786   NOT OK: 214   TOTAL: 1000

@joyeecheung
Copy link
Member

hmm, so it seems this hits this branch here because we are not increasing the heap limit in the near heap limit callback - since we are performing a heap snapshot which is triggering GC, and we are confident that this would release memory anyway, we simply return the initial limit back after the snapshot is written, but the branch checks that the returned it's strictly bigger than the initial limit or it will just crash. I wonder why we haven't bumped into this failure before until concurrent marking is enabled...

@ulan Is it better that we just return a slightly bigger limit (+1 maybe) in this case? Or can the NearHeapLimit API be loosen up a little to accept the original limit, if the embedder knows that they can do something to free up the memory in the callback?

@ulan
Copy link
Contributor

ulan commented Nov 30, 2021

@joyeecheung: adding 1 should work if you're sure that the GC is going to free a lot of memory. Note that 1) performing GC doesn't necessarily mean that many objects will be freed (it could be that all objects are reachable) 2) the callback is invoked when V8 thinks that promoting the young generation in the current GC may go over the old generation limit. So the freed bytes in the old generation should exceed the promoted bytes from the young generation.

To make the test robust and independent from such technical details, perhaps it is worthwhile to increase the limit by substantial amount? E.g. double the limit?

@joyeecheung
Copy link
Member

@targos Can you rebase on top of #41041 and see if the flake goes away? For now let's try adding 1 to the limit, and revisit for a better scheme to calculate the new limit later.

@targos targos added the blocked PRs that are blocked by other issues or PRs. label Nov 30, 2021
joyeecheung added a commit that referenced this pull request Dec 3, 2021
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: #41041
Refs: #41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@targos targos force-pushed the fix-concurrent-marking branch from 847201a to 19958c7 Compare December 3, 2021 11:26
@targos targos added request-ci Add this label to start a Jenkins CI on a PR. and removed blocked PRs that are blocked by other issues or PRs. labels Dec 3, 2021
@targos targos added the commit-queue Add this label to land a pull request using GitHub Actions. label Dec 5, 2021
@nodejs-github-bot nodejs-github-bot added commit-queue-failed An error occurred while landing this pull request using GitHub Actions. and removed commit-queue Add this label to land a pull request using GitHub Actions. labels Dec 5, 2021
@nodejs-github-bot
Copy link
Collaborator

Commit Queue failed
- Loading data for nodejs/node/pull/41013
✔  Done loading data for nodejs/node/pull/41013
----------------------------------- PR info ------------------------------------
Title      build: re-enable V8 concurrent marking (#41013)
Author     Michaël Zasso  (@targos)
Branch     targos:fix-concurrent-marking -> nodejs:master
Labels     v8 engine, tools, needs-ci, dont-land-on-v12.x, dont-land-on-v14.x, lts-watch-v16.x
Commits    1
 - build: re-enable V8 concurrent marking
Committers 1
 - Michaël Zasso 
PR-URL: https://github.com/nodejs/node/pull/41013
Fixes: https://github.com/nodejs/node/issues/41012
Reviewed-By: Benjamin Gruenbaum 
Reviewed-By: Richard Lau 
Reviewed-By: Jiawen Geng 
Reviewed-By: James M Snell 
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/41013
Fixes: https://github.com/nodejs/node/issues/41012
Reviewed-By: Benjamin Gruenbaum 
Reviewed-By: Richard Lau 
Reviewed-By: Jiawen Geng 
Reviewed-By: James M Snell 
--------------------------------------------------------------------------------
   ⚠  Commits were pushed since the last review:
   ⚠  - build: re-enable V8 concurrent marking
   ℹ  This PR was created on Mon, 29 Nov 2021 10:13:36 GMT
   ✔  Approvals: 4
   ✔  - Benjamin Gruenbaum (@benjamingr): https://github.com/nodejs/node/pull/41013#pullrequestreview-817668927
   ✔  - Richard Lau (@richardlau) (TSC): https://github.com/nodejs/node/pull/41013#pullrequestreview-817688119
   ✔  - Jiawen Geng (@gengjiawen): https://github.com/nodejs/node/pull/41013#pullrequestreview-817733406
   ✔  - James M Snell (@jasnell) (TSC): https://github.com/nodejs/node/pull/41013#pullrequestreview-817918558
   ✔  Last GitHub Actions successful
   ℹ  Last Full PR CI on 2021-12-05T11:53:32Z: https://ci.nodejs.org/job/node-test-pull-request/41356/
- Querying data for job/node-test-pull-request/41356/
   ✔  Last Jenkins CI successful
--------------------------------------------------------------------------------
   ✔  Aborted `git node land` session in /home/runner/work/node/node/.ncu
https://github.com/nodejs/node/actions/runs/1541360821

@targos
Copy link
Member Author

targos commented Dec 6, 2021

Landed in ca353de

@targos targos closed this Dec 6, 2021
targos added a commit that referenced this pull request Dec 6, 2021
It was unintentionally disabled during a V8 update.

Fixes: #41012

PR-URL: #41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@targos targos deleted the fix-concurrent-marking branch December 6, 2021 07:23
danielleadams pushed a commit that referenced this pull request Dec 13, 2021
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: #41041
Refs: #41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Dec 13, 2021
It was unintentionally disabled during a V8 update.

Fixes: #41012

PR-URL: #41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Dec 14, 2021
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: #41041
Refs: #41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Dec 14, 2021
It was unintentionally disabled during a V8 update.

Fixes: #41012

PR-URL: #41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@sergei-startsev
Copy link

Is there a chance to be merged into 16.x?

@vincentcr
Copy link

Is there a chance to be merged into 16.x?

Yes, we have noticed some fairly large regressions in GC performance in production between node 14 and node 16, hopefully this fix will get merged back into node 16.

danielleadams pushed a commit that referenced this pull request Jan 31, 2022
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: #41041
Refs: #41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Jan 31, 2022
It was unintentionally disabled during a V8 update.

Fixes: #41012

PR-URL: #41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Jan 31, 2022
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: #41041
Refs: #41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Jan 31, 2022
It was unintentionally disabled during a V8 update.

Fixes: #41012

PR-URL: #41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Linkgoron pushed a commit to Linkgoron/node that referenced this pull request Jan 31, 2022
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: nodejs#41041
Refs: nodejs#41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Linkgoron pushed a commit to Linkgoron/node that referenced this pull request Jan 31, 2022
It was unintentionally disabled during a V8 update.

Fixes: nodejs#41012

PR-URL: nodejs#41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Feb 1, 2022
V8 requires the NearHeapLimitCallback to return a limit that's higher
than the initial one or otherwise it will crash.

PR-URL: #41041
Refs: #41013
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
danielleadams pushed a commit that referenced this pull request Feb 1, 2022
It was unintentionally disabled during a V8 update.

Fixes: #41012

PR-URL: #41013
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@danielleadams danielleadams mentioned this pull request Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commit-queue-failed An error occurred while landing this pull request using GitHub Actions. needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--concurrent-marking and --parallel-marking don't work and disabled by default
10 participants