Revise segfault on small changes in large package (v1.8.0-beta3 regression from v1.7) #44954

Closed
oschulz opened this issue Apr 12, 2022 · 15 comments

@oschulz
Contributor

oschulz commented Apr 12, 2022

With v1.8.0-beta3, I run into a segfault immediately with Revise on a (large) package (BAT.jl), even with very small changes. I don't really have an MWE, and the behavior is intermittent. I've used Revise very extensively with the package on Julia v1.7 without any trouble.

Steps to reproduce (starting with a completely empty DEPOT_PATH), all steps in a single Julia session (the segfault doesn't always occur):

julia> versioninfo()
Julia Version 1.8.0-beta3
Commit 3e092a2521 (2022-03-29 15:42 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 128 × AMD EPYC 7702P 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 64 on 128 virtual cores

julia> DEPOT_PATH
1-element Vector{String}:
 "/tmp/.julia"

(@v1.8) pkg> status
Status `/tmp/.julia/environments/v1.8/Project.toml` (empty project)

(@v1.8) pkg> add [email protected]

julia> using Revise

(@v1.8) pkg> activate --temp

(jl_xxInzK) pkg> dev BAT

julia> cd(dirname(dirname(Base.find_package("BAT"))))

shell> git checkout revise-test-1
Branch 'revise-test-1' set up to track remote branch 'revise-test-1' from 'origin'.
Switched to a new branch 'revise-test-1'

(jl_TK0fkk) pkg> precompile

julia> using BAT

julia> bat_sample(BAT.example_posterior());

shell> git checkout revise-test-2
Branch 'revise-test-2' set up to track remote branch 'revise-test-2' from 'origin'.
Switched to a new branch 'revise-test-2'

julia> bat_sample(BAT.example_posterior());

signal (11): Segmentation fault
in expression starting at none:1
mtcache_hash_lookup at /buildworker/worker/package_linux64/build/src/typemap.c:292 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:1025
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:1324 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:1027
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:1324 [inlined]
jl_lookup_generic_ at /buildworker/worker/package_linux64/build/src/gf.c:2480 [inlined]
ijl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2536

This runs fine on Julia v1.7.2 (segfault never occurs).

More compact version of steps-to-reproduce, without switching REPL modes:

using Pkg
pkg"add [email protected]"
using Revise

pkg"activate --temp"
pkg"dev BAT"
cd(dirname(dirname(Base.find_package("BAT"))))

run(`git checkout revise-test-1`)

pkg"precompile"
using BAT
bat_sample(BAT.example_posterior());

run(`git checkout revise-test-2`)

bat_sample(BAT.example_posterior());

@gbaraldi
Member

might be similar to #44913

@vtjnash
Member

vtjnash commented Apr 12, 2022

I can't say anything with the truncated backtrace. I attempted to run this on master a couple times, but it worked for me so far. Could you run with --check-bounds=yes --bug-report=rr?

@oschulz
Contributor Author

oschulz commented Apr 13, 2022

Here's a new version of the test script using two different BAT branches; this one seems to reproduce the crash more reliably:

using Pkg
pkg"add [email protected]"
using Revise

pkg"activate --temp"
pkg"dev BAT"
cd(dirname(dirname(Base.find_package("BAT"))))

run(`git checkout revise-test-3`)

pkg"precompile"
using BAT

run(`git checkout revise-test-4`)

1+1

Running with --check-bounds=yes --bug-report=rr ... seems to take a while. :-)

@oschulz
Contributor Author

oschulz commented Apr 13, 2022

@vtjnash I tried running with --check-bounds=yes --bug-report=rr (after setting sysctl -w kernel.perf_event_paranoid=1) but it stalls after filling my /tmp directory to 100% (i.e. writing 100GB of data).

Update: found a way, see below.
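
A minimal pre-flight sketch for the two problems mentioned above, to run before recording. It assumes a POSIX shell and that rr wants `kernel.perf_event_paranoid` at 1 or lower; the function names and the 100 GiB threshold are illustrative, not part of rr or BugReporting.

```shell
#!/bin/sh
# Hypothetical helpers, not part of rr or BugReporting.

# Succeeds if the given perf_event_paranoid value is low enough for rr (<= 1).
paranoid_ok() {
    [ "$1" -le 1 ]
}

# Prints the free space, in KiB, of the filesystem holding directory $1.
# -P forces one output line per filesystem, -k reports 1024-byte blocks.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage on Linux (threshold of 100 GiB is an arbitrary illustration):
#   paranoid_ok "$(cat /proc/sys/kernel/perf_event_paranoid)" \
#       || echo "run: sysctl -w kernel.perf_event_paranoid=1"
#   [ "$(free_kib /tmp)" -gt $((100 * 1024 * 1024)) ] \
#       || echo "less than 100 GiB free in /tmp"
```

Taking the paranoid value as an argument, rather than reading `/proc` inside the function, keeps the check usable on systems where that file does not exist.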

@oschulz
Contributor Author

oschulz commented Apr 13, 2022

Running the test script above with --check-bounds=yes --bug-report=rr in one go makes the disk explode, but running it in two stages works.

Running with julia:

using Pkg
pkg"add [email protected] BugReporting"

pkg"activate @revisedebug"
pkg"dev BAT"
cd(dirname(dirname(Base.find_package("BAT"))))

run(`git checkout revise-test-3`)

pkg"precompile"
using BAT

Then, running with julia --check-bounds=yes --bug-report=rr:

using Pkg
pkg"activate @revisedebug"
using Revise
using BAT

cd(dirname(dirname(Base.find_package("BAT"))))
run(`git checkout revise-test-4`)

1+1

Result:

signal (11): Segmentation fault
in expression starting at none:1
mtcache_hash_lookup at /buildworker/worker/package_linux64/build/src/typemap.c:292 [inlined]
jl_typemap_level_assoc_exact at /buildworker/worker/package_linux64/build/src/typemap.c:1025
jl_typemap_assoc_exact at /buildworker/worker/package_linux64/build/src/julia_internal.h:1324 [inlined]
[...]
To upload a trace, please authenticate, by visiting:
[...]
[ Info: Uploading Trace directory
Uploaded to https://s3.amazonaws.com/julialang-dumps/reports/[...]
┌ Info: Debugged process failed. Unless you see rr errors above, the trace likely completed.
│   exitcode = 0
└   termsignal = 11
Segmentation fault (core dumped)

@vtjnash can you work with that?
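
The two-stage procedure above can be sketched as a small shell driver. This assumes the two scripts have been saved as `stage1.jl` and `stage2.jl` (hypothetical file names) and that `--bug-report=rr` accepts a script argument the same way it accepts an interactive session; only the second stage runs under rr, so the package installation and precompilation are not recorded.

```shell
#!/bin/sh
# Hypothetical driver; stage1.jl and stage2.jl are assumed file names
# holding the setup script and the script to record, respectively.
JULIA="${JULIA:-julia}"

# Print the command line for a stage. Only stage 2 gets the rr flags,
# which keeps the setup work out of the recorded trace.
stage_cmd() {
    case "$1" in
        1) printf '%s %s\n' "$JULIA" "stage1.jl" ;;
        2) printf '%s --check-bounds=yes --bug-report=rr %s\n' "$JULIA" "stage2.jl" ;;
        *) return 1 ;;
    esac
}

# Run both stages in order; stage 2 only starts if setup succeeded.
run_two_stage() {
    sh -c "$(stage_cmd 1)" && sh -c "$(stage_cmd 2)"
}
```

Calling `run_two_stage` executes both steps; `stage_cmd 2` alone prints the recording command for inspection.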

@cafaxo
Contributor

cafaxo commented Apr 13, 2022

This is a shot in the dark, but does this issue still occur if you downgrade JuliaInterpreter to 0.9.11?

With 0.9.12, Revise also seems to be crashing for me. (I am on Julia master)
I will add a more detailed report on the issue I am experiencing later.

Edit: I get the following with JuliaInterpreter 0.9.12:

Internal error: encountered unexpected error in runtime:
UndefVarError(var=:o)
ijl_undefined_var_error at /Users/lukasmayrhofer/julia/src/rtutils.c:132
ijl_get_binding_or_error at /Users/lukasmayrhofer/julia/src/module.c:407
getglobal_nothrow at ./compiler/tfuncs.jl:2090 [inlined]
builtin_effects at ./compiler/tfuncs.jl:1822
abstract_call_known at ./compiler/abstractinterpretation.jl:1573
abstract_call at ./compiler/abstractinterpretation.jl:1727
abstract_call at ./compiler/abstractinterpretation.jl:1704
... thousands of lines more of stuff

So it seems like this might be an unrelated issue. Sorry for the noise.

IanButterworth added this to the 1.8 milestone Apr 13, 2022

@oschulz
Contributor Author

oschulz commented Apr 13, 2022

> This is a shot in the dark, but does this issue still occur if you downgrade JuliaInterpreter to 0.9.11?

Just tried, I get the same segfault as before.

@vtjnash
Member

vtjnash commented Apr 13, 2022

you seem to be on an old version of Julia (missing #44855), but that should be unrelated to the segfault

@oschulz
Contributor Author

oschulz commented Apr 13, 2022

> you seem to be on an old version of Julia

I thought I had run this on v1.8.0-beta1, did I use v1.7.2 by mistake?

@vtjnash
Member

vtjnash commented Apr 13, 2022

That comment is for @cafaxo, who appears to be using julia-master to test this

@oschulz
Contributor Author

oschulz commented Apr 13, 2022

Ah, sorry :-)

vtjnash added a commit to vtjnash/JuliaInterpreter.jl that referenced this issue Apr 13, 2022
aviatesk pushed a commit to JuliaDebug/JuliaInterpreter.jl that referenced this issue Apr 14, 2022
* eval `=` is not functionally equivalent to global assignment

Caused JuliaLang/julia#44954 crash because of the incorrect semantics

* Update interpret.jl

@giordano
Contributor

Is this fixed by JuliaDebug/JuliaInterpreter.jl#534?

@SebastianAment
Contributor

I was running into the same issue with Julia 1.8.0-beta3, but updating to Revise v3.3.3 with JuliaInterpreter 0.9.13 appears to have fixed it.

vtjnash closed this as completed Apr 15, 2022

@vtjnash
Member

vtjnash commented Apr 15, 2022

Jeff also fixed it in #44974 (2 related bugs)

@oschulz
Contributor Author

oschulz commented Apr 16, 2022

Thanks @vtjnash and @JeffBezanson !!
