assertion failure in src/node_worker.cc with large number of workers #31614

Closed
gireeshpunathil opened this issue Feb 2, 2020 · 9 comments
Labels
worker Issues and PRs related to Worker support.

Comments

@gireeshpunathil
Member

  • Version: master, v14.0.0-pre
  • Platform: Linux 3.10.0-957.5.1.el7.x86_64 1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: workers

I was debugging #23277 and came across this:

$ cat bar.js

const { Worker } = require('worker_threads');

for (let i = 0; i < 10000; ++i) {
  const worker = new Worker(
    'require(\'worker_threads\').parentPort.postMessage(2 + 2)',
    { eval: true });
}

$ node --max-old-space-size=100000 bar

node[101402]: ../src/node_worker.cc:135:node::worker::WorkerThreadData::WorkerThreadData(node::worker::Worker*): Assertion `(uv_loop_init(&loop_)) == (0)' failed.

I guess this has to do with libuv failure due to lack of memory, but can this be better handled?
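The repro creates all 10000 workers synchronously in one loop, so many of them are starting up, each with its own event loop, at the same time. On the user side, one way to avoid that pressure is to cap how many workers are alive at once. This is a minimal sketch under that assumption, not a fix for the assertion itself. runOne, runBounded, and the limit of 4 are illustrative names and values, not Node APIs.

```javascript
// Sketch: run the repro's script many times while keeping at most `limit`
// workers alive, so their event loops (and descriptors) are released before
// new ones start. runOne/runBounded are illustrative helper names.
const { Worker } = require('worker_threads');
const { once } = require('events');

const SCRIPT = "require('worker_threads').parentPort.postMessage(2 + 2)";

async function runOne() {
  const worker = new Worker(SCRIPT, { eval: true });
  // Wait for both the result and the thread's exit; events.once() rejects
  // if the worker emits 'error' first.
  const [[value]] = await Promise.all([
    once(worker, 'message'),
    once(worker, 'exit'),
  ]);
  return value;
}

async function runBounded(total, limit) {
  const results = new Array(total);
  let next = 0;
  // Each lane starts one worker, waits for it to exit, then takes the next index.
  async function lane() {
    while (next < total) {
      const index = next++;
      results[index] = await runOne();
    }
  }
  const lanes = [];
  for (let i = 0; i < Math.min(limit, total); ++i) lanes.push(lane());
  await Promise.all(lanes);
  return results;
}

runBounded(20, 4).then((results) => {
  console.log(`${results.length} workers finished`);
});
```

Waiting for 'exit' rather than only 'message' is the design choice that matters here: a worker's thread and loop are torn down only after it exits, so a lane does not start its next worker until the previous one is really gone.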

/cc @nodejs/workers

@gireeshpunathil
Member Author

I see this one in the assertion failure list too:

node: ../deps/uv/src/unix/core.c:556: uv__close_nocheckstdio: Assertion `fd > -1' failed.

@gireeshpunathil
Member Author

/cc @nodejs/libuv

@bnoordhuis
Member

Does the process run out of file descriptors? The first assertion is node's problem; it assumes that uv_loop_init() never fails, but it can.

How did you trigger the second one? Can you get a backtrace?
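One way to test the file-descriptor hypothesis on Linux is to count the entries in /proc/self/fd while workers are alive. This sketch is Linux-specific and only approximate: other activity in the process can shift the count, and the helper names openFdCount and fdsPerWorker are illustrative, not part of Node.

```javascript
// Linux-only sketch: /proc/self/fd holds one entry per open descriptor of the
// current process, so sampling it before and while workers run gives a rough
// per-worker descriptor cost.
const fs = require('fs');
const { once } = require('events');
const { Worker } = require('worker_threads');

function openFdCount() {
  // Includes the descriptor readdirSync() uses for the directory itself;
  // that extra entry cancels out when two samples are subtracted.
  return fs.readdirSync('/proc/self/fd').length;
}

async function fdsPerWorker(n) {
  const before = openFdCount();
  const workers = [];
  for (let i = 0; i < n; ++i) {
    // Keep each worker alive with a timer until it is terminated below.
    workers.push(new Worker('setInterval(() => {}, 1000)', { eval: true }));
  }
  // 'online' fires once a worker has started running JavaScript,
  // which happens after its event loop has been initialized.
  await Promise.all(workers.map((w) => once(w, 'online')));
  const during = openFdCount();
  await Promise.all(workers.map((w) => w.terminate()));
  return (during - before) / n;
}

fdsPerWorker(4).then((perWorker) => {
  console.log(`approx. descriptors per live worker: ${perWorker}`);
});
```

Multiplying the per-worker figure by 10000 and comparing it with the process's open-file limit gives a rough sense of whether the repro can exhaust descriptors.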

@gireeshpunathil
Member Author

(gdb) where
#0  0x00007ffff6e04377 in raise () from /lib64/libc.so.6
#1  0x00007ffff6e05a68 in abort () from /lib64/libc.so.6
#2  0x0000000000a9dea1 in node::Abort() ()
#3  0x0000000000a9df17 in node::Assert(node::AssertionInfo const&) ()
#4  0x0000000000b3ad12 in node::worker::Worker::Run() ()
#5  0x0000000000b3b880 in node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::{lambda(void*)#1}::_FUN(void*) ()
#6  0x00007ffff71a3ea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff6ecc8cd in clone () from /lib64/libc.so.6

@bnoordhuis - thanks! The first assertion has this backtrace; I was not able to obtain one for the second. It looks like the second one is always triggered by a worker thread while the main thread is already processing the first assertion failure, because the second one occurs only when the first one is present (this is my guess, with no proof).

Running out of file descriptors looks like a possibility; I will debug from that angle.

@bnoordhuis
Member

> not able to obtain for the second one.

If you turn on core dumps, you should be able to get a backtrace. You may need to select the right thread in gdb, or just run "thread apply all backtrace".

@addaleax added the worker label (Issues and PRs related to Worker support.) Feb 4, 2020
@gireeshpunathil
Member Author

(gdb) where
#0  0x00007f92514c7377 in raise () from /lib64/libc.so.6
#1  0x00007f92514c8a68 in abort () from /lib64/libc.so.6
#2  0x00007f92514c0196 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f92514c0242 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000a03319 in uv__close_nocheckstdio (fd=-24) at ../deps/uv/src/unix/core.c:556
#5  0x00000000013cd271 in uv__close_nocheckstdio (fd=fd@entry=-24) at ../deps/uv/src/unix/core.c:563
#6  0x00000000013de2f7 in uv__read_proc_meminfo (what=what@entry=0x20be946 "MemTotal:")
    at ../deps/uv/src/unix/linux-core.c:1016
#7  0x00000000013df7f3 in uv_get_total_memory () at ../deps/uv/src/unix/linux-core.c:1043
#8  0x0000000000a0dc05 in node::SetIsolateCreateParamsForNode(v8::Isolate::CreateParams*) ()
#9  0x0000000000b3a3e7 in node::worker::Worker::Run() ()
#10 0x0000000000b3b930 in node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::{lambda(void*)#1}::_FUN(void*) ()
#11 0x00007f9251866ea5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f925158f8cd in clone () from /lib64/libc.so.6

This is the stack trace for the second assertion. It looks like the loop creation failed, but the worker creation sequence still progressed this far, wrongly?

When we run out of descriptors, uv_loop_init returns -24, and the fd at the assertion site has the same value. Is that by chance, or do we fill the uv_loop_t with error codes?
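libuv reports failures as negated errno values, and on Linux errno 24 is EMFILE ("Too many open files"). Seeing -24 at both sites is therefore consistent with both calls failing for the same reason, the process being out of descriptors, rather than with an error code stored in the uv_loop_t: frame #6 is uv__read_proc_meminfo, reached from uv_get_total_memory(), which takes no loop argument. A quick way to decode such values from JavaScript:

```javascript
// Decode the negative value seen in the trace (fd=-24).
const util = require('util');
const os = require('os');

const value = -24;
// getSystemErrorName() expects a negative errno, as libuv and Node report them.
console.log(util.getSystemErrorName(value)); // EMFILE (on Linux)
console.log(-os.constants.errno.EMFILE === value); // true (on Linux)
```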

@gireeshpunathil
Member Author

On the other hand, will #31621, in its current form, address this one too?

@addaleax
Copy link
Member

addaleax commented Feb 5, 2020

@gireeshpunathil No, #31621 is unrelated to that second failure, but libuv/libuv#2645 should have fixed that assertion (fd should not be negative anymore when being passed to uv__close_nocheckstdio). That was landed in 05d350a – is your local copy of Node up to date?
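The trace hints at how a negative value can reach the close call: a helper that returns a negated errno on failure, checked against the POSIX open() convention of returning -1, lets -24 through to a later close. The sketch below illustrates that pitfall in JavaScript. openLike and the two check functions are hypothetical, and this is an assumption about the general shape of the bug, not a copy of the libuv code or of the libuv/libuv#2645 change.

```javascript
// Hypothetical helper using the "descriptor or negated errno" convention:
// returns a descriptor (>= 0) on success, -errno on failure.
const os = require('os');

function openLike(outOfDescriptors) {
  return outOfDescriptors ? -os.constants.errno.EMFILE : 3;
}

// Check written for the POSIX open() convention (-1 on failure): it misses
// -EMFILE, so the negative value would flow on to a later close(fd).
function failedPosixCheck(fd) {
  return fd === -1;
}

// Check matching the -errno convention: any negative value is an error.
function failedNegErrnoCheck(fd) {
  return fd < 0;
}

const fd = openLike(true);
console.log(fd, failedPosixCheck(fd), failedNegErrnoCheck(fd)); // -24 false true (on Linux)
```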

@gireeshpunathil
Copy link
Member Author

@addaleax - that is really promising! My copy does not have it; I will check and confirm. (Since the reproduction is consistent, validation will be easy.)

HarshithaKP added a commit to HarshithaKP/node that referenced this issue Feb 17, 2020
Instead of hard asserting, throw a runtime error
that is more consumable.
Fixes: nodejs#31614
codebytere pushed a commit that referenced this issue Feb 27, 2020
Instead of hard asserting, throw a runtime error
that is more consumable.

Fixes: #31614

PR-URL: #31621
Reviewed-By: Anna Henningsen <[email protected]>
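With this change, a worker whose event loop cannot be initialized is meant to report a runtime error instead of aborting the process. A minimal sketch of consuming it, assuming the error arrives through the Worker's 'error' event (as uncaught errors from a worker do). The runWorker helper is illustrative, and the specific error code (believed to be ERR_WORKER_INIT_FAILED) should be checked against the Node version in use.

```javascript
// Sketch: run one eval'd worker and turn every failure mode into a rejected
// promise, so callers can log or retry instead of crashing.
const { Worker } = require('worker_threads');

function runWorker(source) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true });
    worker.once('message', resolve);
    // Uncaught exceptions in the worker arrive here; a failed loop
    // initialization is assumed to arrive here too once the fix is in.
    worker.once('error', reject);
    // If the worker exits without posting a result, reject; this is a no-op
    // when the promise has already settled.
    worker.once('exit', (code) => {
      reject(new Error(`worker exited with code ${code} before posting a result`));
    });
  });
}

runWorker("require('worker_threads').parentPort.postMessage(2 + 2)").then(
  (value) => console.log('result:', value),
  (err) => console.error('worker failed:', err.code || err.message),
);
```

Logging err.code when present lets a caller distinguish an initialization failure from an ordinary exception thrown by the worker's own code.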
codebytere pushed a commit that referenced this issue Mar 15, 2020
codebytere pushed a commit that referenced this issue Mar 17, 2020
codebytere pushed a commit that referenced this issue Mar 30, 2020

3 participants