@cfunctions precompiled into a system image are not executable from a foreign thread due to TLS accesses #43748

Closed
vchuravy opened this issue Jan 10, 2022 · 5 comments
Labels
compiler:codegen Generation of LLVM IR and native code

Comments

@vchuravy
Member

@ericphanson is observing segmentation faults when using a system image with CUDA.jl (x-ref: JuliaGPU/CUDA.jl#1314). The TL;DR is that CUDA.jl uses :uv_async_send to trigger AsyncConditions from a foreign thread.

async_send(handle::Ptr{Cvoid}) = ccall(:uv_async_send, Cvoid, (Ptr{Cvoid},), handle)

If we disassemble the code we normally run, i.e. when executing without a custom system image, we get:

Dump of assembler code for function jlcapi_async_send_15:
   0x00007fff9c9bc9c0 <+0>:     push   %r14
   0x00007fff9c9bc9c2 <+2>:     push   %rbx
   0x00007fff9c9bc9c3 <+3>:     sub    $0x8,%rsp
   0x00007fff9c9bc9c7 <+7>:     movabs $0x7ffff761dfc0,%rdx
   0x00007fff9c9bc9d1 <+17>:    movabs $0x7fffee5cecc8,%rsi
   0x00007fff9c9bc9db <+27>:    mov    %fs:0x0,%rax
   0x00007fff9c9bc9e4 <+36>:    mov    -0x8(%rax),%rax
   0x00007fff9c9bc9e8 <+40>:    mov    %rsp,%rbx
   0x00007fff9c9bc9eb <+43>:    mov    (%rsi),%rcx
   0x00007fff9c9bc9ee <+46>:    mov    (%rdx),%rdx
   0x00007fff9c9bc9f1 <+49>:    lea    0x8(%rax),%r8
   0x00007fff9c9bc9f5 <+53>:    cmp    %rdx,%rcx
   0x00007fff9c9bc9f8 <+56>:    mov    %rcx,%rsi
   0x00007fff9c9bc9fb <+59>:    cmovae %rdx,%rsi
   0x00007fff9c9bc9ff <+63>:    test   %rax,%rax
   0x00007fff9c9bca02 <+66>:    movabs $0x7fff9c9bca40,%rax
   0x00007fff9c9bca0c <+76>:    cmovne %r8,%rbx
   0x00007fff9c9bca10 <+80>:    movabs $0x7fff9c9bc900,%r8
   0x00007fff9c9bca1a <+90>:    cmovne %rdx,%rsi
   0x00007fff9c9bca1e <+94>:    mov    (%rbx),%r14
   0x00007fff9c9bca21 <+97>:    cmove  %r8,%rax
   0x00007fff9c9bca25 <+101>:   cmp    %rdx,%rcx
   0x00007fff9c9bca28 <+104>:   mov    %rsi,(%rbx)
   0x00007fff9c9bca2b <+107>:   cmovae %r8,%rax
   0x00007fff9c9bca2f <+111>:   call   *%rax
   0x00007fff9c9bca31 <+113>:   mov    %r14,(%rbx)
   0x00007fff9c9bca34 <+116>:   add    $0x8,%rsp
   0x00007fff9c9bca38 <+120>:   pop    %rbx
   0x00007fff9c9bca39 <+121>:   pop    %r14
   0x00007fff9c9bca3b <+123>:   ret    
End of assembler dump.

With a custom sysimage, on the other hand, we get:

Dump of assembler code for function jlcapi_async_send_48533:
   0x00007fffe4209300 <+0>:     push   %r15
   0x00007fffe4209302 <+2>:     push   %r14
   0x00007fffe4209304 <+4>:     push   %r13
   0x00007fffe4209306 <+6>:     push   %r12
   0x00007fffe4209308 <+8>:     push   %rbx
   0x00007fffe4209309 <+9>:     sub    $0x20,%rsp
   0x00007fffe420930d <+13>:    mov    0x85b9bfc(%rip),%rax        # 0x7fffec7c2f10 <jl_tls_offset.real>
   0x00007fffe4209314 <+20>:    vxorps %xmm0,%xmm0,%xmm0
   0x00007fffe4209318 <+24>:    mov    %rdi,%r14
   0x00007fffe420931b <+27>:    movq   $0x0,0x10(%rsp)
   0x00007fffe4209324 <+36>:    vmovaps %xmm0,(%rsp)
   0x00007fffe4209329 <+41>:    test   %rax,%rax
   0x00007fffe420932c <+44>:    je     0x7fffe42093db <jlcapi_async_send_48533+219>
   0x00007fffe4209332 <+50>:    mov    %fs:0x0,%rcx
   0x00007fffe420933b <+59>:    mov    (%rcx,%rax,1),%rbx
   0x00007fffe420933f <+63>:    movq   $0x4,(%rsp)
   0x00007fffe4209347 <+71>:    mov    0x136c62(%rip),%rcx        # 0x7fffe433ffb0
   0x00007fffe420934e <+78>:    mov    %rsp,%rdi
   0x00007fffe4209351 <+81>:    mov    $0x570,%esi
   0x00007fffe4209356 <+86>:    mov    $0x10,%edx
   0x00007fffe420935b <+91>:    mov    (%rbx),%rax
   0x00007fffe420935e <+94>:    mov    %rax,0x8(%rsp)
   0x00007fffe4209363 <+99>:    mov    %rdi,(%rbx)
   0x00007fffe4209366 <+102>:   mov    (%rcx),%rax
   0x00007fffe4209369 <+105>:   mov    0x8(%rbx),%r12
   0x00007fffe420936d <+109>:   mov    0x10(%rbx),%rdi
   0x00007fffe4209371 <+113>:   mov    %rax,0x8(%rbx)
   0x00007fffe4209375 <+117>:   mov    0x85b8b54(%rip),%r15        # 0x7fffec7c1ed0 <jl_globalYY.10208>
   0x00007fffe420937c <+124>:   mov    0x859ffc5(%rip),%r13        # 0x7fffec7a9348 <SUM.CoreDOT.Ptr1114>
   0x00007fffe4209383 <+131>:   call   0x7fffe3ac1190 <jl_gc_pool_alloc@plt>
   0x00007fffe4209388 <+136>:   lea    0x18(%rsp),%rsi
   0x00007fffe420938d <+141>:   mov    %r15,%rdi
   0x00007fffe4209390 <+144>:   mov    $0x1,%edx
   0x00007fffe4209395 <+149>:   mov    %r13,-0x8(%rax)
   0x00007fffe4209399 <+153>:   mov    %r14,(%rax)
   0x00007fffe420939c <+156>:   mov    %rax,0x10(%rsp)
   0x00007fffe42093a1 <+161>:   mov    %rax,0x18(%rsp)
   0x00007fffe42093a6 <+166>:   call   0x7fffe3ac1200 <jl_apply_generic@plt>
   0x00007fffe42093ab <+171>:   mov    -0x8(%rax),%rcx
   0x00007fffe42093af <+175>:   mov    0x85a1d12(%rip),%rsi        # 0x7fffec7ab0c8 <SUM.CoreDOT.Int32677>
   0x00007fffe42093b6 <+182>:   and    $0xfffffffffffffff0,%rcx
   0x00007fffe42093ba <+186>:   cmp    %rsi,%rcx
   0x00007fffe42093bd <+189>:   jne    0x7fffe42093e9 <jlcapi_async_send_48533+233>
   0x00007fffe42093bf <+191>:   mov    (%rax),%eax
   0x00007fffe42093c1 <+193>:   mov    %r12,0x8(%rbx)
   0x00007fffe42093c5 <+197>:   mov    0x8(%rsp),%rcx
   0x00007fffe42093ca <+202>:   mov    %rcx,(%rbx)
   0x00007fffe42093cd <+205>:   add    $0x20,%rsp
   0x00007fffe42093d1 <+209>:   pop    %rbx
   0x00007fffe42093d2 <+210>:   pop    %r12
   0x00007fffe42093d4 <+212>:   pop    %r13
   0x00007fffe42093d6 <+214>:   pop    %r14
   0x00007fffe42093d8 <+216>:   pop    %r15
   0x00007fffe42093da <+218>:   ret    
   0x00007fffe42093db <+219>:   call   *0x85b9b1f(%rip)        # 0x7fffec7c2f00 <jl_pgcstack_func_slot.real>
   0x00007fffe42093e1 <+225>:   mov    %rax,%rbx
   0x00007fffe42093e4 <+228>:   jmp    0x7fffe420933f <jlcapi_async_send_48533+63>
   0x00007fffe42093e9 <+233>:   lea    0x32924(%rip),%rdi        # 0x7fffe423bd14 <_j_str175>
   0x00007fffe42093f0 <+240>:   mov    %rax,%rdx
   0x00007fffe42093f3 <+243>:   call   0x7fffe3ac1070 <jl_type_error@plt>

The pointer to disassemble was obtained by using Reproducer.launch()

With:

module Reproducer

async_send(data::Ptr{Cvoid}) = ccall(:uv_async_send, Cint, (Ptr{Cvoid},), data) 

function launch()
    callback = @cfunction(async_send, Cint, (Ptr{Cvoid},))
    return callback
end

end # module

and a precompile.jl like:

using Reproducer

Reproducer.launch()

This is funnily related to #43747, since I wanted to have errors instead of mysterious segmentation faults.

cc: @KristofferC (although I don't think there is anything we can do in PackageCompiler.jl), @maleadt

@vchuravy vchuravy added the compiler:codegen Generation of LLVM IR and native code label Jan 10, 2022
@JeffBezanson
Member

I think in this case cfunction is generating the wrong (very inefficient) code. But in general calling julia code from a foreign thread will not work. Luckily in this case one can call uv_async_send directly.
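
For illustration, a minimal sketch of what calling uv_async_send directly could look like: the foreign library is handed the address of the C symbol itself (looked up with cglobal) instead of an @cfunction wrapper, so no Julia code has to run on the foreign thread. The names cond, callback and ret are hypothetical, and the foreign caller is only simulated here by a ccall on the current thread. This is a sketch under those assumptions, not necessarily how CUDA.jl ended up handling it.

```julia
# Assumption: uv_async_send is resolvable in the running Julia process,
# the same way the ccall(:uv_async_send, ...) in the issue resolves it.
cond = Base.AsyncCondition()

# Address of the C function itself; there is no Julia wrapper, so no
# pgcstack/TLS lookup happens in the callback.
callback = cglobal(:uv_async_send)

# A foreign library would store `callback` and `cond.handle` and later call
# callback(handle) from its own thread. Simulated here on the current thread:
ret = ccall(callback, Cint, (Ptr{Cvoid},), cond.handle)

wait(cond)   # returns once the event loop has processed the async handle
close(cond)
```

Since the pointer refers to plain C code in libuv, it does not depend on whether the surrounding package was compiled into a system image.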

@JeffBezanson
Member

Related: #35252 #36977 #17573

@vchuravy
Member Author

> But in general calling julia code from a foreign thread will not work.

Right. One has to craft very careful code. I have some more complicated examples if you want xD.

The goal of #43747 was to be able to have some guarantees over such code. What cropped up in #41616 was that Base, in the presence of @threadcall, relies on this.

@vtjnash
Member

vtjnash commented Oct 25, 2022

now works
