Node/yarn segmentation fault on redhat aarch64 #3202

Closed
liza-mae opened this issue Jan 27, 2021 · 17 comments
Comments

@liza-mae

liza-mae commented Jan 27, 2021

  • Node.js Version: 14.15.4
  • OS: Linux ip-10-0-0-100.us-west-2.compute.internal 4.18.0-221.el8.aarch64 #1 SMP Thu Jun 25 22:08:25 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
  • Yarn: 1.22.10
  • Other info: On Ubuntu 20.04 with the same aarch64 architecture, it works. If I switch the Node version back to 12.20.1, it works. It fails with Node 14 and 15.

I cross-posted this in the yarn repo, but we did not get any answers, so I am asking here as well, since the Node version determines whether the yarn command fails, as noted above.

Some core dump info is below. Can someone help?

coredumpctl debug
           PID: 43042 (node)
           UID: 1000 (centos)
           GID: 1000 (centos)
        Signal: 11 (SEGV)
     Timestamp: Wed 2021-01-27 16:47:53 UTC (17s ago)
  Command Line: node /home/centos/.kibana/node/14.15.4/bin/yarn logs
    Executable: /home/centos/.kibana/node/14.15.4/bin/node
 Control Group: /user.slice/user-1000.slice/session-7.scope
          Unit: session-7.scope
         Slice: user-1000.slice
       Session: 7
     Owner UID: 1000 (centos)
       Boot ID: 7a91b2c67a84483f890f96119ea784f2
    Machine ID: ae650baab2584f2f9754371ced69082f
      Hostname: ip-10-0-0-6.us-west-2.compute.internal
       Storage: /var/lib/systemd/coredump/core.node.1000.7a91b2c67a84483f890f96119ea784f2.43042.1611766073000000.lz4
       Message: Process 43042 (node) of user 1000 dumped core.
                
                Stack trace of thread 43042:
                #0  0x0000000000d74690 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #1  0x0000000000d089bc n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #2  0x0000000000d089bc n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #3  0x0000000000d0fa30 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #4  0x0000000000ce5e54 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #5  0x0000000000ce61d0 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #6  0x000000000127c308 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #7  0x0000000000fb32ec n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #8  0x0000000000fce988 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #9  0x0000000000fceef0 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #10 0x0000000000fcf9ac n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #11 0x0000000000fcfa7c n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #12 0x00000000010206d4 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #13 0x000000000135ac6c n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #14 0x00000000013ab910 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #15 0x000000000134bef8 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #16 0x00000000012f3a14 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #17 0x00000000012f3a14 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #18 0x00000000013232bc n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #19 0x00000000012f3a14 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #20 0x00000000012f3a14 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #21 0x000000000139ba9c n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #22 0x0000000001313c18 n/a (/home/centos/.kibana/node/14.15.4/bin/node)
                #23 0x00000000012f11a8 n/a (/home/centos/.kibana/node/14.15.4/bin/node)

GNU gdb (GDB) Red Hat Enterprise Linux 8.2-12.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/centos/.kibana/node/14.15.4/bin/node...done.
[New LWP 43042]
[New LWP 43043]
[New LWP 43044]
[New LWP 43046]
[New LWP 43048]
[New LWP 43050]
[New LWP 43049]
[New LWP 43051]
[New LWP 43052]
[New LWP 43045]
[New LWP 43047]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `node /home/centos/.kibana/node/14.15.4/bin/yarn logs'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000d74690 in v8::internal::CodeObjectRegistry::RegisterNewlyAllocatedCodeObject(unsigned long) ()
[Current thread is 1 (Thread 0xffffb4bb4f40 (LWP 43042))]
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.aarch64 libgcc-8.3.1-5.1.el8.aarch64 libstdc++-8.3.1-5.1.el8.aarch64
(gdb) bt
#0  0x0000000000d74690 in v8::internal::CodeObjectRegistry::RegisterNewlyAllocatedCodeObject(unsigned long) ()
#1  0x0000000000d089bc in v8::internal::Heap::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#2  0x0000000000d0fa30 in v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#3  0x0000000000ce5e54 in v8::internal::Factory::CodeBuilder::BuildInternal(bool) ()
#4  0x0000000000ce61d0 in v8::internal::Factory::CodeBuilder::Build() ()
#5  0x000000000127c308 in v8::internal::RegExpMacroAssemblerARM64::GetCode(v8::internal::Handle<v8::internal::String>) ()
#6  0x0000000000fb32ec in v8::internal::RegExpCompiler::Assemble(v8::internal::Isolate*, v8::internal::RegExpMacroAssembler*, v8::internal::RegExpNode*, int, v8::internal::Handle<v8::internal::String>) ()
#7  0x0000000000fce988 in v8::internal::RegExpImpl::Compile(v8::internal::Isolate*, v8::internal::Zone*, v8::internal::RegExpCompileData*, v8::base::Flags<v8::internal::JSRegExp::Flag, int>, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::String>, bool, unsigned int) ()
#8  0x0000000000fceef0 in v8::internal::RegExpImpl::CompileIrregexp(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, bool) ()
#9  0x0000000000fcf9ac in v8::internal::RegExp::IrregexpPrepare(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>) ()
#10 0x0000000000fcfa7c in v8::internal::RegExpImpl::IrregexpExec(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, int, v8::internal::Handle<v8::internal::RegExpMatchInfo>) ()
#11 0x00000000010206d4 in v8::internal::Runtime_RegExpExec(int, unsigned long*, v8::internal::Isolate*) ()
#12 0x000000000135ac6c in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
@kasicka

kasicka commented Jan 27, 2021

@mhdawson
Member

It would be good to clarify what versions of Node.js you are using.

Are you installing the versions from the Node.js downloads page on both RHEL and Ubuntu? If not, are you using the versions of Node.js that come bundled with Ubuntu and RHEL?

@danbev had investigated what sounds like a similar failure with yarn on aarch64. If you are recreating this with the binaries you downloaded from https://nodejs.org/en/download/, then possibly it's a different issue, or the fix applied to the 14.x stream did not fix the original issue (@danbev confirmed that a commit intended to fix that issue was in the 14.x stream, and it looks like it was included in the latest community 14.x release).

If instead you are recreating on the binaries that come with RHEL, then I'm confused because as far as I know there is no v14.15.4 available yet as part of the RHEL containers/rpms.

@liza-mae
Author

Hi @mhdawson,

You are correct: the binaries that come with RHEL do work, but I believe that is because they are a Node 12 version.

What I have is a shell script that downloads the node package based on the node version specified for our project, which is 14.15.4, and I use the same package on both Ubuntu and RHEL.

This is the link I am using: https://nodejs.org/dist/v14.15.4/node-v14.15.4-linux-arm64.tar.gz

It does work on Ubuntu, so is this distribution not supported for RHEL yet?

Please let me know. Thanks!
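
For readers following along, the download step described above can be sketched roughly as follows. This is only an illustration, not the project's actual script: the function name `node_dist_url` is hypothetical, and the URL pattern is inferred solely from the single link quoted in this comment.

```python
def node_dist_url(version: str, platform: str = "linux", arch: str = "arm64") -> str:
    """Build a nodejs.org tarball URL following the pattern of the link quoted above.

    Hypothetical helper: the pattern is an assumption based on that one link.
    """
    # Accept both "14.15.4" and "v14.15.4".
    tag = version if version.startswith("v") else f"v{version}"
    return f"https://nodejs.org/dist/{tag}/node-{tag}-{platform}-{arch}.tar.gz"


print(node_dist_url("14.15.4"))
```

Since the same tarball is fetched on both distributions, any difference in behaviour has to come from the host rather than from the Node.js binary.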

@mhdawson
Member

@liza-mae I was trying to understand which binaries you were using. Those binaries should run/work on RHEL even though they are different from the ones packaged into the RHEL distributions.

@danbev can you look to see if this is the same problem that you investigated before? From what was in the PR/issues I think we thought the issue was fixed on 14.x (although with a different fix than on earlier versions) but from what @liza-mae is reporting that may not be the case.

@danbev

danbev commented Jan 30, 2021

Looking at v8.h in 14.15.4 the above fix is included, so if this is the version being used then this could be a different issue. I don't have an environment set up with aarch64 (this is something that the RHEL team at work usually provides us with when there is an issue to be investigated) so I don't have a quick way of verifying this I'm afraid. I can't tell by the output alone, it does look similar and I would have guessed it was the same but can't say for sure.

@mhdawson
Member

mhdawson commented Feb 1, 2021

@kasicka do you have access to an aarch64 environment you can give @danbev access to?

@sxa
Member

sxa commented Feb 1, 2021

I had a quick try in a CentOS 8.3 Docker container and couldn't reproduce this problem. Let me know if I've not properly followed the steps that you used to reproduce:

curl https://nodejs.org/dist/v14.15.4/node-v14.15.4-linux-arm64.tar.gz | tar xpfz -
cd node-v14.15.4-linux-arm64/
export PATH=$PWD:$PATH
npm install -g yarn

And then yarn appears to run ok:

[sxa@5148ffbb568c ~]$ yarn add express
yarn add v1.22.10
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
success Saved lockfile.
success Saved 29 new dependencies.
info Direct dependencies
└─ [email protected]
info All dependencies
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
├─ [email protected]
└─ [email protected]
Done in 5.51s.
[sxa@5148ffbb568c ~]$ 

@danbev Based on the above it may not be trivial to reproduce it in Docker.

@sxa
Member

sxa commented Feb 1, 2021

Interesting - tried with a CentOS8.2.2004 on AWS us-west-2 like yours and was able to reproduce ...

@sxa
Member

sxa commented Feb 1, 2021

Like the OP, I've found that it works on all versions up to the latest v12 (v12.20.1) and segfaults on v13.0.0 and later, as far as I can tell, on those systems. Performing a yum update to bring the system up to 8.3.2011 and kernel 4.18.0-240.10.1.el8_3 makes no difference.

@sxa
Member

sxa commented Feb 1, 2021

Also with a debug build of node v14.15.4 I get a crash just by trying to start npm:

[sxa@ip-172-31-61-154 bin]$ npm


#
# Fatal error in ../deps/v8/src/heap/memory-chunk.cc, line 50
# Debug check failed: kMaxRegularHeapObjectSize <= memory (131072 vs. 65536).
#
#
#
#FailureMessage Object: 0xffffc1118e48
Trace/breakpoint trap (core dumped)

@sxa
Member

sxa commented Feb 1, 2021

I can also reproduce the crash on a (non-docker, non-AWS) CentOS 7.9/aarch64 system. I can also confirm it does NOT crash using code from HEAD (16.0.0)

@danbev

danbev commented Feb 2, 2021

@danbev Based on the above it may not be trivial to reproduce it in Docker.

Yeah, I've not been able to reproduce this in Docker either. Thanks.

I'm going to set up a machine internally and see if I can reproduce this.

@danbev

danbev commented Feb 2, 2021

@sxa Thanks for investigating this. I've been able to reproduce the issue thanks to your above description (I'm using CentOS8.2.2004). I'll take a closer look at this now.

@danbev

danbev commented Feb 3, 2021

@liza-mae Could you provide the output of the following command from the system in question:

$ getconf PAGESIZE

@liza-mae
Author

liza-mae commented Feb 3, 2021

Hi @danbev it is

[centos@ip-10-0-0-6 ~]$ getconf PAGESIZE
65536

@sxa
Member

sxa commented Feb 3, 2021

That's interesting, a quick look around suggests that all my Ubuntu/aarch64 systems seem to have a pagesize of 4096 set, but as you suggest on CentOS it's 65536 (even when the total memory size on the machine is the same)
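
The same check can be scripted when comparing several hosts. The snippet below is a minimal sketch for Unix-like systems; the helper name `matches_failing_hosts` and the 65536 threshold are assumptions taken from the values reported in this thread, not from any Node.js tooling.

```python
import os
import platform

# Page size reported by `getconf PAGESIZE` on the failing CentOS/aarch64 hosts in this thread.
FAILING_PAGE_SIZE = 65536


def matches_failing_hosts(page_size: int, machine: str) -> bool:
    """Return True when a host looks like the failing configuration (aarch64 with 64k pages)."""
    return machine == "aarch64" and page_size == FAILING_PAGE_SIZE


# os.sysconf("SC_PAGESIZE") is the programmatic equivalent of `getconf PAGESIZE` (Unix only).
print(matches_failing_hosts(os.sysconf("SC_PAGESIZE"), platform.machine()))
```

On the Ubuntu/aarch64 systems mentioned above (page size 4096) this would print False, while on the CentOS host from the previous comment it would print True.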

danbev added a commit to danbev/node that referenced this issue Feb 4, 2021
    Original commit message:
      [heap] Make maximum regular code object size a runtime value.

      Executable V8 pages include 3 reserved OS pages: one for the writable
      header and two as guards. On systems with 64k OS pages, the amount of
      allocatable space left for objects can then be quite smaller than the
      page size, only 64k for each 256k page.

      This means regular code objects cannot be larger than 64k, while the
      maximum regular object size is fixed to 128k, half of the page size. As
      a result code object never reach this limit and we can end up filling
      regular pages with few large code objects.

      To fix this, we change the maximum code object size to be runtime value,
      set to half of the allocatable space per page. On systems with 64k OS
      pages, the limit will be 32k.

      Alternatively, we could increase the V8 page size to 512k on Arm64 linux
      so we wouldn't waste code space. However, systems with 4k OS pages are
      more common, and those with 64k pages tend to have more memory available
      so we should be able to live with it.

      Bug: v8:10808
      Change-Id: I5d807e7a3df89f1e9c648899e9ba2f8e2648264c
      Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2460809
      Reviewed-by: Igor Sheludko <[email protected]>
      Reviewed-by: Georg Neis <[email protected]>
      Reviewed-by: Ulan Degenbaev <[email protected]>
      Commit-Queue: Pierre Langlois <[email protected]>
      Cr-Commit-Position: refs/heads/master@{#70569}

Refs: nodejs/help#3202
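
The arithmetic in the commit message can be worked through as a small sketch. It follows the message's simplified accounting (a 256k V8 page minus three reserved OS pages) and is not V8's actual layout code; all names below are hypothetical.

```python
KIB = 1024
V8_PAGE_SIZE = 256 * KIB                      # V8 page size quoted in the commit message
RESERVED_OS_PAGES = 3                         # one writable header page plus two guard pages
FIXED_MAX_REGULAR_OBJECT = V8_PAGE_SIZE // 2  # 128k, the fixed limit before the change


def allocatable_code_space(os_page_size: int) -> int:
    """Space left for code objects in one executable V8 page."""
    return V8_PAGE_SIZE - RESERVED_OS_PAGES * os_page_size


def runtime_max_code_object(os_page_size: int) -> int:
    """Limit after the change: half of the allocatable space per page."""
    return allocatable_code_space(os_page_size) // 2


for os_page in (4 * KIB, 64 * KIB):
    space = allocatable_code_space(os_page)
    # The fixed 128k limit only fits when it does not exceed the allocatable space.
    print(os_page, space, FIXED_MAX_REGULAR_OBJECT <= space, runtime_max_code_object(os_page))
```

With 64k OS pages the allocatable space is 65536 bytes while the fixed limit is 131072, which are the two numbers in the failed debug check reported earlier in this thread (`131072 vs. 65536`). The figures for 4k pages are simply what the same formula yields.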
@sxa
Member

sxa commented Feb 4, 2021

Looks like the upgrade to V8 8.7.220 in the master branch fixed whatever the problem was, which is why it doesn't crash there. That won't be formally released until v16.0.0, so your PR seems like a good candidate for the next v14.x release.

danbev added a commit to danbev/node that referenced this issue Feb 8, 2021
danbev added a commit to danbev/node that referenced this issue Feb 9, 2021
danbev added a commit to nodejs/node that referenced this issue Feb 16, 2021
PR-URL: #37225
Refs: nodejs/help#3202
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Stewart X Addison <[email protected]>
Reviewed-By: Juan José Arboleda <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit to nodejs/node that referenced this issue Apr 6, 2021
aleskandro added a commit to aleskandro/console that referenced this issue Jan 18, 2023
Building the console on top of the tectonic-console-builder (for CI and OKD) is blocked on arm64 by nodejs/help#3202 as we build the image on rhel8, shipping a 64k pagesize kernel. The fix landed in nodejs from v14.17.0.
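
The version information scattered through this thread can be summarised in a small sketch. It encodes only what was reported here (works up to v12.20.1, crashes from v13.0.0, fixed in v14.17.0, no crash on 16.0.0); the function name and status strings are hypothetical, and the status of 15.x releases beyond the original report is not stated in the thread.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse "14.15.4" or "v14.15.4" into a comparable tuple."""
    major, minor, patch = text.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def status_on_64k_pages(version: str) -> str:
    """Classify a Node.js version on aarch64 hosts with 64k pages, per this thread's reports."""
    v = parse_version(version)
    if v < (13, 0, 0):
        return "works"            # all releases up to v12.20.1 were reported working
    if v[0] == 13 or (v[0] == 14 and v < (14, 17, 0)):
        return "crashes"          # segfault reported from v13.0.0 onwards
    if v[0] == 14:
        return "fixed"            # fix reported as landing in v14.17.0
    if v[0] == 15:
        return "reported failing"  # the original report; no fixed 15.x is named here
    return "works"                # HEAD (16.0.0) was reported not to crash


print(status_on_64k_pages("14.15.4"))
```

A build script could use a check like this to refuse the affected releases on 64k-page hosts instead of failing later with a segmentation fault.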