Segmentation fault when running with ALWAYS_COPY_STACKS #32700

Closed
kcajf opened this issue Jul 26, 2019 · 8 comments
Labels
multithreading Base.Threads and related functionality

Comments

@kcajf
Contributor

kcajf commented Jul 26, 2019

In an effort to work around #31104, I have tried to build Julia with ALWAYS_COPY_STACKS. However, I'm running into trouble when running (amongst other things) the simple multithreading example provided here: https://julialang.org/blog/2019/07/multithreading.

I'm on the latest master at time of writing. Before changing ALWAYS_COPY_STACKS, the examples run smoothly:

➜  julia git:(master) JULIA_NUM_THREADS=8 julia --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.0-alpha.18 (2019-07-26)
 _/ |\__'_|_|_|\__'_|  |  Commit be7495644b (0 days old master)
|__/                   |

julia> versioninfo()
Julia Version 1.3.0-alpha.18
Commit be7495644b (2019-07-26 18:16 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8

julia> import Base.Threads.@spawn

julia> # sort the elements of `v` in place, from indices `lo` to `hi` inclusive
       function psort!(v, lo::Int=1, hi::Int=length(v))
           if lo >= hi                       # 1 or 0 elements; nothing to do
               return v
           end
           if hi - lo < 100000               # below some cutoff, run in serial
               sort!(view(v, lo:hi), alg = MergeSort)
               return v
           end
       
           mid = (lo+hi)>>>1                 # find the midpoint
       
           half = @spawn psort!(v, lo, mid)  # task to sort the lower half; will run
           psort!(v, mid+1, hi)              # in parallel with the current call sorting
                                             # the upper half
           wait(half)                        # wait for the lower half to finish
       
           temp = v[lo:mid]                  # workspace for merging
       
           i, k, j = 1, lo, mid+1            # merge the two sorted sub-arrays
           @inbounds while k < j <= hi
               if v[j] < temp[i]
                   v[k] = v[j]
                   j += 1
               else
                   v[k] = temp[i]
                   i += 1
               end
               k += 1
           end
           @inbounds while k < j
               v[k] = temp[i]
               k += 1
               i += 1
           end
       
           return v
       end
psort! (generic function with 3 methods)

julia> a = rand(20000000);

julia> b = copy(a); @time sort!(b, alg = MergeSort);   # single-threaded

  2.029216 seconds (113.27 k allocations: 82.114 MiB)

julia> b = copy(a); @time sort!(b, alg = MergeSort);
  1.975056 seconds (11 allocations: 76.294 MiB)

julia> b = copy(a); @time psort!(b);    # two threads
  0.776889 seconds (299.35 k allocations: 701.137 MiB, 6.08% gc time)

julia> b = copy(a); @time psort!(b);
  0.568276 seconds (3.78 k allocations: 686.935 MiB) 

I then modify src/options.h:

// task options ---------------------------------------------------------------

// select whether to allow the COPY_STACKS stack switching implementation
#define COPY_STACKS
// select whether to use COPY_STACKS for new Tasks by default
#define ALWAYS_COPY_STACKS  // changed this line

then run make again.
Same code as above:

julia> b = copy(a); @time psort!(b);    # two threads

signal (11): Segmentation fault
in expression starting at REPL[6]:1

signal (11): Segmentation fault
in expression starting at REPL[6]:1
jl_gc_pool_alloc at /data/xxx/julia/src/gc.c:1117
jl_gc_pool_alloc at /data/xxx/julia/src/gc.c:1117
Atomic at ./atomics.jl:67 [inlined]
SpinLock at ./locks-mt.jl:33 [inlined]
GenericCondition at ./condition.jl:68 [inlined]
Task at ./task.jl:5 [inlined]
Task at ./task.jl:5 [inlined]
macro expansion at ./threadingconstructs.jl:123 [inlined]
psort! at ./REPL[2]:13
#3 at ./threadingconstructs.jl:120
_jl_invoke at /data/xxx/julia/src/gf.c:2043 [inlined]
jl_apply_generic at /data/xxx/julia/src/gf.c:2213
jl_apply at /data/xxx/src/julia.h:1630 [inlined]
start_task at /data/xxx/julia/src/task.c:604
jl_init_root_task at /data/xxx/julia/src/task.c:993

signal (11): Segmentation fault
in expression starting at REPL[6]:1

signal (11): Segmentation fault
in expression starting at REPL[6]:1
RefValue at ./refvalue.jl:8 [inlined]
RefValue at ./refvalue.jl:10 [inlined]
Ref at ./refpointer.jl:82 [inlined]
poptaskref at ./task.jl:603
jl_threadfun at /data/xxx/julia/src/partr.c:287
jl_gc_pool_alloc at /data/xxx/julia/src/gc.c:1117
jl_gc_pool_alloc at /data/xxx/julia/src/gc.c:1117
Atomic at ./atomics.jl:67 [inlined]
SpinLock at ./locks-mt.jl:33 [inlined]
GenericCondition at ./condition.jl:68 [inlined]
Task at ./task.jl:5 [inlined]
Task at ./task.jl:5 [inlined]
macro expansion at ./threadingconstructs.jl:123 [inlined]
psort! at ./REPL[2]:13
#3 at ./threadingconstructs.jl:120
_jl_invoke at /data/xxx/julia/src/gf.c:2043 [inlined]
jl_apply_generic at /data/xxx/julia/src/gf.c:2213
start_thread at /lib64/libpthread.so.0 (unknown line)
clone at /lib64/libc.so.6 (unknown line)
Allocations: 2370203 (Pool: 2369545; Big: 658); GC: 7
jl_apply at /data/xxx/julia/src/julia.h:1630 [inlined]
start_task at /data/xxx/julia/src/task.c:604
[1]    23758 segmentation fault (core dumped)  JULIA_NUM_THREADS=8 julia --startup-file=no
JeffBezanson added the multithreading label Jul 30, 2019
@kcajf
Contributor Author

kcajf commented Aug 23, 2019

Has anyone been able to reproduce this? Without this, the whole Julia <-> Java ecosystem is broken.

@PallHaraldsson
Contributor

FYI @kcajf, regarding "Without this, the whole Julia <-> Java ecosystem is broken": that is not the case on the Julia 1.0.4 LTS version. Still, I thought the workaround (for 1.3) worked. Yes, it needs fixing, and 1.4 is the milestone for working without the workaround.

@StefanKarpinski
Member

StefanKarpinski commented Aug 23, 2019

1.3 has the JULIA_COPY_STACKS environment variable: #32885. I'm not sure if that change also fixes this segfault, but a segfault is a bug so feature freeze does not apply; this can be fixed at any point.

@kcajf
Contributor Author

kcajf commented Aug 23, 2019

Thanks Stefan, I hadn't seen #32885 and hadn't tested since it was merged. I tested again on 1.4.0-DEV.31: the above code runs well with JULIA_COPY_STACKS=1, and JavaCall works. Very happy!

kcajf closed this as completed Aug 23, 2019

@ExpandingMan
Contributor

ExpandingMan commented Sep 18, 2019

Can we re-open this issue? I'm still experiencing segfaults even with JULIA_COPY_STACKS=true on 1.3-rc2.

JavaCall segfaults immediately for me:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.0-rc2.0 (2019-09-12)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using JavaCall

julia> JavaCall.init()
[1]    5331 segmentation fault (core dumped)  julia
 ✘  expandingman@thinkpad-x1c  ~  echo $JULIA_COPY_STACKS
true

@SimonDanisch
Contributor

SimonDanisch commented Sep 18, 2019

Works for me:

sd@simondanisch:~$ JULIA_COPY_STACKS=1 juliadev
(v1.4) pkg> add JavaCall
...
  Updating `~/.julia/environments/v1.4/Project.toml`
  [494afd89] + JavaCall v0.7.1
...
julia> using JavaCall
julia> JavaCall.init()
(v1.4) pkg> test JavaCall
   Testing JavaCall
...
Test Summary:    | Pass  Total
unsafe_strings_1 |    3      3
In Java, recd: 10
In Java, recd: 10
In Java, recd: 10
In Java, recd: 2147483647
In Java, recd: 9223372036854775807
In Java, recd: Hello Java
In Java, recd: 10.02
In Java, recd: 10.02
In Java, recd: 1.7976931348623157E308
In Java, recd: 3.4028235E38
In Java, recd: null
Test Summary:       | Pass  Total
parameter_passing_1 |   12     12
Test Summary:        | Pass  Total
static_method_call_1 |    3      3
Test Summary:      | Pass  Total
instance_methods_1 |    3      3
Test Summary: | Pass  Total
null_1        |    1      1
Test Summary: | Pass  Total
arrays_1      |   14     14
Test Summary: | Pass  Total
dates_1       |    6      6
Test Summary:    | Pass  Total
map_conversion_1 |    1      1
Test Summary:           | Pass  Total
array_list_conversion_1 |    1      1
Test Summary:   | Pass  Total
inner_classes_1 |    8      8
Test Summary: | Pass  Total
sinx_1        |    2      2
method_lists_1: Test Failed at /home/sd/.julia/packages/JavaCall/toamy/test/runtests.jl:172
  Expression: [getname(typ) for typ = getparametertypes(m)] == ["java.lang.String", "int"]
   Evaluated: ["java.lang.String"] == ["java.lang.String", "int"]
Stacktrace:
 [1] top-level scope at /home/sd/.julia/packages/JavaCall/toamy/test/runtests.jl:172
 [2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1107
 [3] top-level scope at /home/sd/.julia/packages/JavaCall/toamy/test/runtests.jl:155
Test Summary:  | Pass  Fail  Total
method_lists_1 |    7     1      8
ERROR: LoadError: Some tests did not pass: 7 passed, 1 failed, 0 errored, 0 broken.
in expression starting at /home/sd/.julia/packages/JavaCall/toamy/test/runtests.jl:154
ERROR: Package JavaCall errored during testing

Btw, I'm using GraalVM, in case that changes anything...
I'm guessing the method_lists_1 failure is also related to a different Java version, or something else not too serious ;)

@ExpandingMan
Contributor

Wow, I feel incredibly stupid. I believe this happened because I set JULIA_COPY_STACKS=true instead of 1 or yes (as I should have realized by looking at Jeff's code).

We should document this. Any idea which section of the docs it should go in?
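
A minimal sketch, in C, of how a value such as `true` can end up ignored without any warning, assuming the runtime compares the variable against a fixed set of strings. The function names and the exact comparison are assumptions made for illustration and are not taken from the Julia source; the accepted spellings `1` and `yes` are the ones mentioned in the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Julia's actual code): returns 1 only for the two
 * spellings mentioned in this thread, "1" and "yes". Any other string,
 * including "true", and an unset variable (NULL) leave the option off. */
int copy_stacks_requested(const char *value)
{
    if (value == NULL)
        return 0;
    return strcmp(value, "1") == 0 || strcmp(value, "yes") == 0;
}

/* Small self-check showing the behavior described above. */
void check_copy_stacks_requested(void)
{
    assert(copy_stacks_requested("1"));
    assert(copy_stacks_requested("yes"));
    assert(!copy_stacks_requested("true"));  /* silently treated as "off" */
    assert(!copy_stacks_requested(NULL));    /* variable not set */
}
```

Under these assumptions, a caller would pass in `getenv("JULIA_COPY_STACKS")`. Because every unrecognized string maps to "off", a typo looks exactly like leaving the variable unset.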

@StefanKarpinski
Member

I've opened #33318 to check JULIA_COPY_STACKS for valid values and exit on startup if it is set to something invalid.
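
The kind of check described here could look roughly like the C sketch below. It is not the code from #33318: the enum, the function names, and the error message are hypothetical, and treating `0` and `no` as the "off" spellings is an assumption (the thread only confirms `1` and `yes`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum copy_stacks_setting {
    COPY_STACKS_UNSET = 0,  /* variable absent: keep the built-in default */
    COPY_STACKS_ON,
    COPY_STACKS_OFF,
    COPY_STACKS_INVALID     /* unrecognized value: caller should abort */
};

/* Classify the raw environment value instead of silently ignoring
 * anything that is not recognized. */
enum copy_stacks_setting parse_copy_stacks(const char *value)
{
    if (value == NULL)
        return COPY_STACKS_UNSET;
    if (strcmp(value, "1") == 0 || strcmp(value, "yes") == 0)
        return COPY_STACKS_ON;
    if (strcmp(value, "0") == 0 || strcmp(value, "no") == 0)  /* assumption */
        return COPY_STACKS_OFF;
    return COPY_STACKS_INVALID;
}

/* Returns the exit status a launcher could use: 0 to continue, 1 to stop.
 * The message text is illustrative only. */
int check_copy_stacks(const char *value, FILE *err)
{
    assert(err != NULL);
    if (parse_copy_stacks(value) == COPY_STACKS_INVALID) {
        fprintf(err, "invalid JULIA_COPY_STACKS value: %s\n", value);
        return 1;
    }
    return 0;
}
```

A startup routine could call `check_copy_stacks(getenv("JULIA_COPY_STACKS"), stderr)` and exit when it returns non-zero, so that a value like `true` fails loudly rather than leading to a segfault later.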
