Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4), possibly alignment ? #135

Open
sjdaines opened this issue Dec 16, 2024 · 7 comments

sjdaines commented Dec 16, 2024

I'm seeing intermittent segfaults with Julia 1.11.2 for code that works fine on Julia 1.10.
This is with SIMD v3.7.0 (I have not tested other versions).

Attempt at a MWE (this only generates intermittent segfaults, the full code always fails with Julia 1.11.2):

julia> import SIMD
julia> rbuf = Ref{SIMD.Vec{8, Float32}}()
Base.RefValue{SIMD.Vec{8, Float32}}(<8 x Float32>[2.2f-44, 0.0, -2.4113148f37, 4.4487f-41, 6.78f-41, 0.0, 0.0, 0.0])
julia> rbuf[]  # segfault!

The behaviour of the MWE above is intermittent: the first few runs in a fresh Julia REPL segfault, but subsequent runs on the same PC work (whereas the full code always fails with Julia 1.11.2).
(rbuf is uninitialized in the MWE above; the full code does of course initialise the equivalent of rbuf[] and still segfaults.)

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 7600 6-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null
@eschnett
Owner

I cannot reproduce this on macOS (arm64-apple-darwin24.0.0). The problem might be architecture-specific.

@sjdaines
Author

I'm seeing other segfaults with SIMD on this PC (AMD Ryzen 5 7600) and Julia 1.11.2 as above, so this doesn't look like it has anything to do with Ref.

A possibly simpler MWE, which again sometimes segfaults and sometimes works:

julia> import SIMD

julia> x = zeros(SIMD.Vec{8, Float64}, 10)
10-element Vector{SIMD.Vec{8, Float64}}:

[538808] signal 11 (128): Segmentation fault
in expression starting at none:0
getindex at ./essentials.jl:917 [inlined]
getindex at ./array.jl:930
unknown function (ip: 0x77108445e7b2)
alignment at ./arrayshow.jl:69
_print_matrix at ./arrayshow.jl:207
print_matrix at ./arrayshow.jl:171
print_matrix at ./arrayshow.jl:171 [inlined]
print_array at ./arrayshow.jl:358 [inlined]
show at ./arrayshow.jl:399
unknown function (ip: 0x77108445e476)
#68 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:367
jfptr_YY.68_10048.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:353
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:372 [inlined]
display at ./multimedia.jl:340
jfptr_display_13663.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:409
#70 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:378
jfptr_YY.70_10086.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:376
do_respond at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1003
jfptr_do_respond_10241.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_interface at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2755
jfptr_run_interface_8710.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
run_frontend at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1474
#75 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:480
jfptr_YY.75_10143.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 2140600 (Pool: 2140459; Big: 141); GC: 3
Segmentation fault (core dumped)

@sjdaines sjdaines changed the title from "Intermittent segfaults with Ref on Julia 1.11.2" to "Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4)" on Dec 21, 2024
@sjdaines
Author

sjdaines commented Dec 21, 2024

And possibly an even simpler MWE. This looks like it may be an alignment problem.
Same PC (AMD Ryzen 5 7600) and Julia version 1.11.2 as above.

Is this just user error? (Should I be using some explicit way to align array allocations?)

julia> a = Array{SIMD.Vec{8, Float64}}(undef, 2);   # ; to suppress REPL output

julia> Int(pointer(a)) % 64   # if this gives 0, no segfault
32

julia> a   # segfault as soon as the REPL tries to display it
2-element Vector{SIMD.Vec{8, Float64}}:

[539567] signal 11 (128): Segmentation fault
in expression starting at none:0
getindex at ./essentials.jl:917 [inlined]
getindex at ./array.jl:930
unknown function (ip: 0x71531a6b7472)
alignment at ./arrayshow.jl:69
_print_matrix at ./arrayshow.jl:207
print_matrix at ./arrayshow.jl:171
print_matrix at ./arrayshow.jl:171 [inlined]
print_array at ./arrayshow.jl:358 [inlined]
show at ./arrayshow.jl:399
unknown function (ip: 0x71531a6b7206)
#68 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:367
jfptr_YY.68_10048.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:353
display at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:372 [inlined]
display at ./multimedia.jl:340
jfptr_display_13663.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:409
#70 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:378
jfptr_YY.70_10086.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
with_repl_linfo at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:661
jfptr_with_repl_linfo_10178.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
print_response at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:376
do_respond at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1003
jfptr_do_respond_10241.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_interface at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/LineEdit.jl:2755
jfptr_run_interface_8710.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
run_frontend at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:1474
#75 at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:480
jfptr_YY.75_10143.1 at /home/sd336/software/julia-1.11.2/share/julia/compiled/v1.11/REPL/u0gqU_4x0TT.so (unknown line)
jl_apply at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/tester-amdci5-12/julialang/julia-release-1-dot-11/src/task.c:1202
Allocations: 5568417 (Pool: 5568190; Big: 227); GC: 12
Segmentation fault (core dumped)
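
The check used above (`Int(pointer(a)) % 64`) can be wrapped in a small helper to log alignment before an array is touched. This is only a diagnostic sketch: `misalignment` and `is_aligned` are hypothetical names, not part of SIMD.jl or Base, and `NTuple{8, Float64}` is used as a 64-byte stand-in element type so the snippet runs without SIMD loaded.

```julia
# Hypothetical helpers (not part of SIMD.jl or Base).

# Number of bytes by which the array's first element lies past the
# previous `alignment`-byte boundary; 0 means the data is aligned.
misalignment(a::Array, alignment::Integer = sizeof(eltype(a))) =
    Int(UInt(pointer(a)) % UInt(alignment))

is_aligned(a::Array, alignment::Integer = sizeof(eltype(a))) =
    misalignment(a, alignment) == 0

# 64-byte stand-in element type; with SIMD loaded one would use
# SIMD.Vec{8, Float64} here instead.
a = Array{NTuple{8, Float64}}(undef, 2)
println("offset from 64-byte boundary: ", misalignment(a, 64))
println("64-byte aligned: ", is_aligned(a, 64))
```

The result can differ from run to run and between Julia versions, which would be consistent with the intermittent behaviour described above.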

@sjdaines
Author

Testing the same MWE on the same PC with Julia 1.10.7 seems to consistently give a 64-byte-aligned array and no segfault:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.7 (2024-11-26)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.10) pkg> activate examples
  Activating project at `~/PALEO/PALEOocean.jl/examples`

julia> import SIMD

julia> a = Array{SIMD.Vec{8, Float64}}(undef, 2);

julia> Int(pointer(a)) % 64
0

julia> versioninfo()
Julia Version 1.10.7
Commit 4976d05258e (2024-11-26 15:57 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 7600 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null

@sjdaines sjdaines changed the title from "Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4)" to "Intermittent segfaults with Julia 1.11.2, possibly architecture-specific (AMD Zen 4), possibly alignment ?" on Dec 21, 2024
@sjdaines
Author

Perhaps another clue pointing to an alignment issue? Same PC (AMD Ryzen 5 7600) and Julia version 1.11.2 as above, with an intermittent AssertionError when using valloc to explicitly align the array:

julia> a = SIMD.valloc(SIMD.Vec{8, Float64}, 1, 2);

julia> Int(pointer(parent(a))) % 64
0

julia> Int(pointer(a)) % 64
0

julia> a = SIMD.valloc(SIMD.Vec{8, Float64}, 1, 2);
ERROR: AssertionError: mod(off, sizeof(T)) == 0
Stacktrace:
 [1] valloc(::Type{SIMD.Vec{8, Float64}}, N::Int64, sz::Int64)
   @ SIMD ~/.julia/packages/SIMD/cST3l/src/arrayops.jl:127
 [2] top-level scope
   @ REPL[10]:1

julia> versioninfo()
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × AMD Ryzen 5 7600 6-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_CONDAPKG_BACKEND = Null

where the assertion is from

@assert mod(off, sizeof(T)) == 0
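
One way this assertion could fail, worked through with plain integers. This is a sketch under an assumption that is not confirmed in this thread: that `off` is the byte distance from the start of the underlying buffer to the next address aligned to the element size. The helper names and the example addresses are hypothetical.

```julia
# Hypothetical helper: bytes from `addr` up to the next multiple of `alignment`.
bytes_to_next_boundary(addr::Integer, alignment::Integer) = mod(-addr, alignment)

# The padding can only be skipped by indexing if it is a whole number of elements.
offset_is_whole_elements(addr::Integer, elsize::Integer) =
    mod(bytes_to_next_boundary(addr, elsize), elsize) == 0

elsize = 64   # 8 lanes of Float64 at 8 bytes each, as for SIMD.Vec{8, Float64}

# Made-up base addresses: one on a 64-byte boundary, one 32 bytes past it
# (the `% 64 == 32` case seen in the earlier MWE).
for addr in (4096, 4096 + 32)
    off = bytes_to_next_boundary(addr, elsize)
    println("addr % 64 = ", addr % 64, ", off = ", off,
            ", whole elements: ", offset_is_whole_elements(addr, elsize))
end
```

Under that assumption, a buffer that starts 32 bytes past a 64-byte boundary would need to be shifted by half an element, which no element index can express, so the assertion would trip whenever the allocator returns such a buffer.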

@KristofferC
Collaborator

There were changes to how arrays were implemented in 1.11. If you can reproduce it consistently, perhaps you could do a bisect to see where it started to fail?

sjdaines added a commit to PALEOtoolkit/PALEOocean.jl that referenced this issue Dec 22, 2024
- use netcdf files for output
- tidy up yaml files and remove old versions
- bugfix for ReactionOceanTransportTMM: workaround SIMD issue, see eschnett/SIMD.jl#135
@sjdaines
Author

sjdaines commented Jan 3, 2025

This looks like JuliaLang/julia#56937 to me?

I'm not sure about the proposed fix, JuliaLang/julia#56938, though. If I understand it correctly, Julia will at least be consistent, in that it no longer overpromises about the alignment it provides, but it will not guarantee more than 16-byte alignment?
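
If Julia itself only guarantees 16-byte alignment, one option might be to request the alignment explicitly from libc. A minimal sketch, assuming a POSIX system (for `posix_memalign`) and an isbits element type; `aligned_vector` is a hypothetical name, not part of SIMD.jl, and it is not confirmed in this thread that this avoids the segfault. `NTuple{8, Float64}` is again a 64-byte stand-in so the snippet runs without SIMD loaded.

```julia
# Hypothetical helper (not part of SIMD.jl): a Vector{T} whose data pointer
# is a multiple of `alignment` bytes. POSIX only.
function aligned_vector(::Type{T}, n::Integer, alignment::Integer = 64) where {T}
    isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
    n > 0 || throw(ArgumentError("n must be positive"))
    pref = Ref{Ptr{Cvoid}}(C_NULL)
    # posix_memalign needs a power-of-two alignment that is also a
    # multiple of the pointer size.
    rc = ccall(:posix_memalign, Cint, (Ptr{Ptr{Cvoid}}, Csize_t, Csize_t),
               pref, alignment, n * sizeof(T))
    rc == 0 || error("posix_memalign failed with code $rc")
    # own = true asks Julia to free() the buffer when the array is collected.
    return unsafe_wrap(Array, convert(Ptr{T}, pref[]), Int(n); own = true)
end

a = aligned_vector(NTuple{8, Float64}, 4)
fill!(a, ntuple(_ -> 0.0, 8))        # the memory starts out uninitialized
println(Int(pointer(a)) % 64)
```

The returned array has a fixed size; resizing it (for example with `push!`) may move the data to a buffer without the requested alignment.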


3 participants