abort in hydra-eval-jobs: Collecting from unknown thread #1186

Open
2xsaiko opened this issue Mar 29, 2022 · 4 comments

@2xsaiko
Contributor

2xsaiko commented Mar 29, 2022

Describe the bug
Evaluating my jobset (the configuration for some of my NixOS systems) causes hydra-eval-jobs to abort in GC_push_all_stacks (pthread_stop_world.c:754 in libgc.so) with the message "Collecting from unknown thread". Edit: it does not happen on every run; the evaluation eventually succeeded after crashing about 30 times. Building the same configuration manually with nix build finishes without problems.

To Reproduce
Steps to reproduce the behavior:

  1. Evaluate the flake git+https://git.dblsaiko.net/systems (it may matter that Hydra is running on aarch64)
  2. hydra-eval-jobs should end up coredumping

Expected behavior
The process doesn't abort and finishes evaluating the jobset correctly.

Hydra Server:

Please fill out this data as well as you can, but don't worry if you can't -- just do your best.

  • OS and version: NixOS 21.11.20220325.d89f18a
  • Version of Hydra: 2021-08-11
  • Version of Nix Hydra is built against: nix-2.5pre20211206_d1aaa7e
  • Version of the Nix daemon: 2.5.0pre20211206_d1aaa7e

Additional context
Here's the core dump log from systemd. The exact stack trace differs between crashes, but it always passes through GC_malloc_kind_global and ends in GC_push_all_stacks, where the process aborts.

Mar 29 14:11:00 spike hydra-evaluator[911]: starting evaluation of jobset ‘systems:master (jobset#5)’ (last checked 60 s ago)
Mar 29 14:11:01 spike nix-daemon[1083]: accepted connection from pid 118191, user hydra
Mar 29 14:11:01 spike nix-daemon[1083]: accepted connection from pid 118215, user hydra
Mar 29 14:11:04 spike systemd[1]: Started Process Core Dump (PID 121035/UID 0).
Mar 29 14:11:07 spike systemd-coredump[121036]: [🡕] Process 118215 (hydra-eval-jobs) of user 122 dumped core.
                                                
                                                Found module linux-vdso.so.1 with build-id: 2df0e272c95568f51a3a1921a822b54330132699
                                                Found module libnss_dns.so.2 with build-id: 62c5a4fcec5da4f113241fda61ea1afc9b2d683d
                                                Found module libattr.so.1 without build-id.
                                                Found module libresolv.so.2 with build-id: 0fe000e8a1dcb24d46153bd487b7c4ef3c0200fd
                                                Found module libkeyutils.so.1 without build-id.
                                                Found module libkrb5support.so.0 without build-id.
                                                Found module libxml2.so.2 without build-id.
                                                Found module libbz2.so.1 without build-id.
                                                Found module libzstd.so.1 without build-id.
                                                Found module liblzma.so.5 without build-id.
                                                Found module libacl.so.1 without build-id.
                                                Found module libbrotlicommon.so.1 without build-id.
                                                Found module libaws-c-common.so.1 without build-id.
                                                Found module libaws-c-sdkutils.so.1.0.0 without build-id.
                                                Found module libaws-c-cal.so.1.0.0 without build-id.
                                                Found module libaws-c-compression.so.1.0.0 without build-id.
                                                Found module libs2n.so without build-id.
                                                Found module libaws-c-io.so.1.0.0 without build-id.
                                                Found module libaws-c-http.so.1.0.0 without build-id.
                                                Found module libaws-c-auth.so.1.0.0 without build-id.
                                                Found module libaws-c-s3.so.0unstable without build-id.
                                                Found module libaws-checksums.so.1.0.0 without build-id.
                                                Found module libaws-c-event-stream.so.1.0.0 without build-id.
                                                Found module libaws-c-mqtt.so.1.0.0 without build-id.
                                                Found module libaws-crt-cpp.so without build-id.
                                                Found module libcom_err.so.3 without build-id.
                                                Found module libk5crypto.so.3 without build-id.
                                                Found module libkrb5.so.3 without build-id.
                                                Found module libgssapi_krb5.so.2 without build-id.
                                                Found module libssl.so.1.1 with build-id: 85e435fd52ba4ac684891bacf3e582cf2e317e3b
                                                Found module libssh2.so.1 without build-id.
                                                Found module libnghttp2.so.14 without build-id.
                                                Found module libz.so.1 without build-id.
                                                Found module librt.so.1 with build-id: f364ce33c59cd7955db929de439acb10e8053792
                                                Found module libarchive.so.13 without build-id.
                                                Found module libbrotlidec.so.1 without build-id.
                                                Found module libbrotlienc.so.1 without build-id.
                                                Found module libseccomp.so.2 without build-id.
                                                Found module libaws-cpp-sdk-core.so without build-id.
                                                Found module libaws-cpp-sdk-s3.so without build-id.
                                                Found module libaws-cpp-sdk-transfer.so without build-id.
                                                Found module libsodium.so.23 with build-id: 37c3dea45982807673d873baeb8b9c37856ac97a
                                                Found module libcurl.so.4 with build-id: 12782cce4baa4c59fb2640b20503b365128990ab
                                                Found module libsqlite3.so.0 with build-id: 9882a19eb1366385271d43056659352c62e678c0
                                                Found module libnixfetchers.so with build-id: 702e2e886dfec67980b900fc0e9ef548a4ee06a9
                                                Found module libboost_context.so.1.69.0 without build-id.
                                                Found module libcrypto.so.1.1 with build-id: 840d07f85182906a77458e2574e413149009446a
                                                Found module libc.so.6 with build-id: 2ec2584e7cf41bf9a28433370c4fbdac47cc8634
                                                Found module libgcc_s.so.1 without build-id.
                                                Found module libm.so.6 with build-id: 729324a0809db558a202d1a0244a1e0263031859
                                                Found module libstdc++.so.6 without build-id.
                                                Found module libnixutil.so with build-id: 988b87dc322421b48747aecf9bbabd84613a03c2
                                                Found module libnixstore.so with build-id: b56319b1246b8750a75ba295d139ec2c0edf8082
                                                Found module libdl.so.2 with build-id: a046e4bc181def5e579fac40210507736492f350
                                                Found module libpthread.so.0 with build-id: 7f4b6b86e1f1dcb6c793869a655c23cf82b1f45c
                                                Found module libgc.so.1 with build-id: 6dd737074c128cc3ef070a7f9a65313b3fe6461d
                                                Found module libnixexpr.so with build-id: da78f552b9b5f7d3ee06edf0da865a6c5e162017
                                                Found module libnixmain.so with build-id: 66f96d7d7a0d5f6de353a8114472c181ac3875a4
                                                Found module hydra-eval-jobs without build-id.
                                                Stack trace of thread 118215:
                                                #0  0x0000ffffb81a6c20 raise (libc.so.6 + 0x36c20)
                                                #1  0x0000ffffb8194678 abort (libc.so.6 + 0x24678)
                                                #2  0x0000ffffb8a71f6c GC_push_all_stacks (libgc.so.1 + 0x1cf6c)
                                                #3  0x0000ffffb8a6d6d4 GC_mark_some (libgc.so.1 + 0x186d4)
                                                #4  0x0000ffffb8a6d878 GC_stopped_mark (libgc.so.1 + 0x18878)
                                                #5  0x0000ffffb8a6ee4c GC_try_to_collect_inner (libgc.so.1 + 0x19e4c)
                                                #6  0x0000ffffb8a6f254 GC_collect_or_expand (libgc.so.1 + 0x1a254)
                                                #7  0x0000ffffb8a6f68c GC_allocobj (libgc.so.1 + 0x1a68c)
                                                #8  0x0000ffffb8a6fa68 GC_generic_malloc_inner (libgc.so.1 + 0x1aa68)
                                                #9  0x0000ffffb8a73650 GC_generic_malloc (libgc.so.1 + 0x1e650)
                                                #10 0x0000ffffb8a73a18 GC_malloc_kind_global (libgc.so.1 + 0x1ea18)
                                                #11 0x0000ffffb8a74ddc GC_strndup (libgc.so.1 + 0x1fddc)
                                                #12 0x0000ffffb8d87228 _ZN3nix8mkStringERNS_5ValueESt17basic_string_viewIcSt11char_traitsIcEERKSt3setINSt7__cxx1112basic_stringIcS4_SaIcEEESt4lessISA_ESaISA_EE (libnixexpr.so + 0xa0228)
                                                #13 0x0000ffffb8d95040 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xae040)
                                                #14 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #15 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #16 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #17 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #18 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #19 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #20 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #21 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #22 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #23 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #24 0x0000ffffb8d94dd8 _ZN3nix17ExprConcatStrings4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaddd8)
                                                #25 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #26 0x0000ffffb8e22998 _ZN3nixL21prim_derivationStrictERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x13b998)
                                                #27 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #28 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #29 0x0000ffffb8e1fd2c _ZN3nix12prim_getAttrERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x138d2c)
                                                #30 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #31 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #32 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #33 0x0000ffffb8d916f8 _ZN3nix10ExprSelect4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaa6f8)
                                                #34 0x0000ffffb8d92220 _ZN3nix10ExprAssert4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xab220)
                                                #35 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #36 0x0000ffffb8d9469c _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad69c)
                                                #37 0x0000ffffb8d94a68 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xada68)
                                                #38 0x0000ffffb8e22a48 _ZN3nixL21prim_derivationStrictERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x13ba48)
                                                #39 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #40 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #41 0x0000ffffb8e1fd2c _ZN3nix12prim_getAttrERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x138d2c)
                                                #42 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #43 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #44 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #45 0x0000ffffb8d916f8 _ZN3nix10ExprSelect4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaa6f8)
                                                #46 0x0000ffffb8d92220 _ZN3nix10ExprAssert4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xab220)
                                                #47 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #48 0x0000ffffb8d9469c _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad69c)
                                                #49 0x0000ffffb8d94a68 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xada68)
                                                #50 0x0000ffffb8d94784 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad784)
                                                #51 0x0000ffffb8e22a48 _ZN3nixL21prim_derivationStrictERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x13ba48)
                                                #52 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #53 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #54 0x0000ffffb8e1fd2c _ZN3nix12prim_getAttrERNS_9EvalStateERKNS_3PosEPPNS_5ValueERS5_ (libnixexpr.so + 0x138d2c)
                                                #55 0x0000ffffb8d8edf8 _ZN3nix9EvalState12callFunctionERNS_5ValueEmPPS1_S2_RKNS_3PosE (libnixexpr.so + 0xa7df8)
                                                #56 0x0000ffffb8d900ec _ZN3nix8ExprCall4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xa90ec)
                                                #57 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #58 0x0000ffffb8d916f8 _ZN3nix10ExprSelect4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xaa6f8)
                                                #59 0x0000ffffb8d92220 _ZN3nix10ExprAssert4evalERNS_9EvalStateERNS_3EnvERNS_5ValueE (libnixexpr.so + 0xab220)
                                                #60 0x000000000043e914 _ZN3nix9EvalState10forceValueERNS_5ValueERKNS_3PosE (hydra-eval-jobs + 0x3e914)
                                                #61 0x0000ffffb8d9469c _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad69c)
                                                #62 0x0000ffffb8d94a68 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xada68)
                                                #63 0x0000ffffb8d94784 _ZN3nix9EvalState14coerceToStringERKNS_3PosERNS_5ValueERSt3setINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4lessISC_ESaISC_EEbbb (libnixexpr.so + 0xad784)
Mar 29 14:11:07 spike systemd[1]: [email protected]: Deactivated successfully.
Mar 29 14:11:07 spike systemd[1]: [email protected]: Consumed 3.082s CPU time, no IP traffic.
Mar 29 14:11:07 spike hydra-evaluator[118188]: hydra-eval-jobs returned exit code 1:
Mar 29 14:11:07 spike hydra-evaluator[118188]: Collecting from unknown thread
Mar 29 14:11:07 spike hydra-evaluator[118188]: error: unexpected EOF reading a line
Mar 29 14:11:07 spike hydra-evaluator[911]: evaluation of jobset ‘systems:master (jobset#5)’ failed with exit code 1
@2xsaiko 2xsaiko added the bug label Mar 29, 2022
@misuzu

misuzu commented Jun 12, 2022

Same issue on NixOS/nixpkgs@90cd545, also aarch64.

blurgyy added a commit to blurgyy/flames that referenced this issue Aug 30, 2022
@blurgyy

blurgyy commented Aug 30, 2022

The problem hit me today, and I managed to work around it by setting the GC_DONT_GC environment variable for hydra-evaluator.service (something like https://gitlab.com/highsunz/flames/-/commit/9cd2a0a3f48abb0c5c57d3ee049f72e31cf1ec2e).

This workaround comes from NixOS/nix#4178 (comment).
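
For anyone who wants to try the same workaround, here is a minimal sketch of what it might look like in a NixOS configuration. It assumes Hydra is enabled through the NixOS module, so that a hydra-evaluator systemd unit exists; the value "1" is an assumption, and the linked commit may differ in detail.

```nix
{ ... }:
{
  # Assumption: Hydra is set up via the NixOS module (services.hydra),
  # which defines the hydra-evaluator systemd unit.
  systemd.services.hydra-evaluator.environment = {
    # Read by the Boehm GC (libgc) at startup; turns garbage collection off.
    # hydra-eval-jobs is started by the evaluator and inherits this variable.
    GC_DONT_GC = "1";
  };
}
```

With collection turned off, memory allocated during evaluation is not reclaimed until the process exits, so large jobsets will likely use more RAM. This avoids the abort rather than fixing the underlying cause.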

Misterio77 added a commit to Misterio77/nix-config that referenced this issue Feb 6, 2023
@chayleaf
Contributor

Can confirm I've encountered this issue right after migrating my server from x86_64 to aarch64 (while keeping the same config). GC_DONT_GC does help.

chayleaf added a commit to chayleaf/dotfiles that referenced this issue Oct 18, 2023
@de11n

de11n commented Jul 9, 2024

We encountered this issue as well. We were able to work around it by refactoring the Nix expressions.
