Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault during eval #54285

Closed
Drvi opened this issue Apr 27, 2024 · 7 comments · Fixed by #54671
Closed

Segfault during eval #54285

Drvi opened this issue Apr 27, 2024 · 7 comments · Fixed by #54671
Labels
needs more info Clarification or a reproducible example is required

Comments

@Drvi
Copy link
Contributor

Drvi commented Apr 27, 2024

Hey folks, we saw this segfault in our CI. Any idea what is causing it?

  5 dependencies successfully precompiled in 18 seconds. 12 already precompiled.
     Testing Running tests...
[ Info: Scanning for test items in project `TypeOrder` at paths: /tmp/nix-build-raicode.drv-0/packages-source/packages/TypeOrder

[13422] signal (11.128): Segmentation fault
in expression starting at /tmp/nix-build-raicode.drv-0/packages-source/packages/TypeOrder/test/jet_test.jl:3
signal (11) thread (4) bindingkey_eq at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/module.c:714
signal (11) thread (4) jl_smallintset_lookup at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/smallintset.c:121
signal (11) thread (4) jl_get_module_binding at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/module.c:723
signal (11) thread (4) jl_resolve_owner at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/module.c:379 [inlined]
signal (11) thread (4) ijl_get_binding at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/module.c:464
signal (11) thread (4) ijl_get_global at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/module.c:775
signal (11) thread (4) jl_eval_global_var at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:148 [inlined]
signal (11) thread (4) eval_value at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:204
signal (11) thread (4) do_call at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:125
signal (11) thread (4) eval_value at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:223
signal (11) thread (4) eval_stmt_value at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:174 [inlined]
signal (11) thread (4) eval_body at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:621
signal (11) thread (4) jl_interpret_toplevel_thunk at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/interpreter.c:775
signal (11) thread (4) jl_toplevel_eval_flex at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/toplevel.c:934
signal (11) thread (4) jl_toplevel_eval_flex at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/toplevel.c:877
signal (11) thread (4) ijl_toplevel_eval_in at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/toplevel.c:985
signal (11) thread (4) eval at ./boot.jl:385 [inlined]
signal (11) thread (4) include_string at ./loading.jl:2084
signal (11) thread (4) _include at ./loading.jl:2144
signal (11) thread (4) include at ./Base.jl:496 [inlined]
signal (11) thread (4) checked_include at /tmp/nix-build-raicode.drv-0/packages-source/julia-depot-path/.julia/packages/ReTestItems/ELM7z/src/ReTestItems.jl:630 [inlined]
signal (11) thread (4) #68 at /tmp/nix-build-raicode.drv-0/packages-source/julia-depot-path/.julia/packages/ReTestItems/ELM7z/src/ReTestItems.jl:714 [inlined]
signal (11) thread (4) task_local_storage at ./task.jl:297
signal (11) thread (4) #67 at /tmp/nix-build-raicode.drv-0/packages-source/julia-depot-path/.julia/packages/ReTestItems/ELM7z/src/ReTestItems.jl:713 [inlined]
signal (11) thread (4) task_local_storage at ./task.jl:297
signal (11) thread (4) #66 at /tmp/nix-build-raicode.drv-0/packages-source/julia-depot-path/.julia/packages/ReTestItems/ELM7z/src/ReTestItems.jl:712 [inlined]
signal (11) thread (4) task_local_storage at ./task.jl:297
signal (11) thread (4) #65 at /tmp/nix-build-raicode.drv-0/packages-source/julia-depot-path/.julia/packages/ReTestItems/ELM7z/src/ReTestItems.jl:711 [inlined]
signal (11) thread (4) task_local_storage at ./task.jl:297
signal (11) thread (4) #64 at /tmp/nix-build-raicode.drv-0/packages-source/julia-depot-path/.julia/packages/ReTestItems/ELM7z/src/ReTestItems.jl:710
signal (11) thread (4) unknown function (ip: 0x7fffec295e32)
signal (11) thread (4) jl_apply at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/julia.h:1982 [inlined]
signal (11) thread (4) start_task at /build/julia-v1.10.2+RAI-01b4a24fdbc27dc7a924ddbca3bbe4cd3073a12b/src/task.c:1238
Allocations: 4380912 (Pool: 4375274; Big: 5638); GC: 7
ERROR: LoadError: Package TypeOrder errored during testing (received signal: 11)
@albinahlback
Copy link
Contributor

I think it would help with some more information.

@fatteneder fatteneder added the needs more info Clarification or a reproducible example is required label Apr 28, 2024
@Drvi
Copy link
Contributor Author

Drvi commented Apr 29, 2024

Sadly, this is a private package, so I can't share the code itself, but the error happened when we were evaluating the code with ReTestItems which validates the files during Base.include here. The test file in question contained a single @testitem behind a @static check:

@static if get(VERSION.prerelease, 1, "") != "DEV"
    @testitem "JET error analysis TypeOrder package" begin
        ...
    end
end

We're using a patched Julia 1.10.2.
Please let me know what other information would be helpful to diagnose this.

@albinahlback
Copy link
Contributor

Yeah, but a minimal example is still sought after. If you cannot provide that, then one would need the actual code. Otherwise I believe it is quite hard to know what the actual issue is, no?

@jakobnissen
Copy link
Contributor

Could it be #51267 ?

@kpamnany
Copy link
Contributor

kpamnany commented May 1, 2024

From the backtrace, it looks to me like a search through module bindings segfaulted, which is different from #51267. I was wondering if #49556 perhaps unintentionally broke something and was hoping the backtrace might be enough to diagnose it.

Sadly, this does not happen deterministically, so we have little hope of getting an rr trace.

@vtjnash
Copy link
Member

vtjnash commented May 1, 2024

It sounds a bit like a call to Module() that the user isn't then properly gc-rooting (the proper way to make a module is with @eval(module Foo end), but the alternative constructor exists for power users in some cases)

@Drvi
Copy link
Contributor Author

Drvi commented May 7, 2024

We should use the @eval(module Foo end) style in ReTestItems, AFAIK.
We managed to get a core dump so we'll try to examine it with @kpamnany.

kpamnany pushed a commit that referenced this issue Jun 5, 2024
…#54671)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes #54285

---------

Co-authored-by: Jameson Nash <[email protected]>
kpamnany pushed a commit to RelationalAI/julia that referenced this issue Jun 5, 2024
…JuliaLang#54671)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes JuliaLang#54285

---------

Co-authored-by: Jameson Nash <[email protected]>
kpamnany added a commit to RelationalAI/julia that referenced this issue Jun 5, 2024
…JuliaLang#54671) (#158)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes JuliaLang#54285

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>
KristofferC pushed a commit that referenced this issue Jun 7, 2024
…#54671)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes #54285

---------

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 20f03dd)
Drvi pushed a commit to RelationalAI/julia that referenced this issue Jun 7, 2024
…JuliaLang#54671) (#158)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes JuliaLang#54285

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>
Drvi pushed a commit to RelationalAI/julia that referenced this issue Jun 7, 2024
…JuliaLang#54671) (#158)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes JuliaLang#54285

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>
KristofferC pushed a commit that referenced this issue Jun 19, 2024
…#54671)

The race here is that svec might be replaced and a new binding
introduced into the keyset while we hold a reference to the old svec,
which led to a OOB access on the svec with the index a binding
introduced at the same time. This now introduces a bounds check which
will force taking the lock if we fail the lookup i.e we had a data race.

Fixes #54285

---------

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 20f03dd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs more info Clarification or a reproducible example is required
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants