Don't look up the method age during deferred compilation. #405

Merged (1 commit) on Mar 14, 2023

maleadt (Member) commented Mar 14, 2023

The following MWE:

using CUDA

function hello()
    @cuda dynamic=true world()
    return
end

function world()
    return
end

@cuda hello()

... resulted in an abort:

julia: /home/tim/Julia/src/julia/src/jitlayers.cpp:191: jl_value_t* (* _jl_compile_codeinst(jl_code_instance_t*, jl_code_info_t*, size_t, llvm::orc::ThreadSafeContext))(jl_value_t*, jl_value_t**, uint32_t, _jl_code_instance_t*): Assertion `codeinst->min_world <= world && (codeinst->max_world >= world || codeinst->max_world == 0) && "invalid world for method-instance"' failed.

[211930] signal (6.-6): Aborted
in expression starting at /home/tim/Julia/pkg/CUDA/wip.jl:20
unknown function (ip: 0x7f70892968ec)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7f708923145b)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
_jl_compile_codeinst at /home/tim/Julia/src/julia/src/jitlayers.cpp:191
jl_generate_fptr_impl at /home/tim/Julia/src/julia/src/jitlayers.cpp:460
jl_compile_method_internal at /home/tim/Julia/src/julia/src/gf.c:2321 [inlined]
jl_compile_method_internal at /home/tim/Julia/src/julia/src/gf.c:2210
_jl_invoke at /home/tim/Julia/src/julia/src/gf.c:2723 [inlined]
ijl_apply_generic at /home/tim/Julia/src/julia/src/gf.c:2913
FunctionSpec at /home/tim/Julia/pkg/GPUCompiler/src/interface.jl:69
#s129#116 at /home/tim/Julia/pkg/GPUCompiler/src/driver.jl:204 [inlined]
#s129#116 at ./none:0
_jl_invoke at /home/tim/Julia/src/julia/src/gf.c:2712 [inlined]
ijl_apply_generic at /home/tim/Julia/src/julia/src/gf.c:2913
GeneratedFunctionStub at ./boot.jl:602
_jl_invoke at /home/tim/Julia/src/julia/src/gf.c:2712 [inlined]
ijl_apply_generic at /home/tim/Julia/src/julia/src/gf.c:2913
jl_apply at /home/tim/Julia/src/julia/src/julia.h:1878 [inlined]
jl_call_staged at /home/tim/Julia/src/julia/src/method.c:530
ijl_code_for_staged at /home/tim/Julia/src/julia/src/method.c:581
get_staged at ./compiler/utilities.jl:115
retrieve_code_info at ./compiler/utilities.jl:127 [inlined]
InferenceState at ./compiler/inferencestate.jl:354
typeinf_edge at ./compiler/typeinfer.jl:922
abstract_call_method at ./compiler/abstractinterpretation.jl:611
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:152
abstract_call_known at ./compiler/abstractinterpretation.jl:1932
unknown function (ip: 0x7f703c17b3be)

The reason is that we look up the method's age using a generated function that gets invalidated when a function is redefined. However, lacking JuliaLang/julia#48611, we don't actually know the world we're generating code for, so we use the current world age. That can turn out to be too new for the world we're emitting code for, which triggers the assertion when our generator returns a ci (CodeInfo) with those bounds set.

Since this only happens with deferred codegen, work around it by not looking up the method's age in that case; we override it during codegen with the parent's age anyway.
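To make the failing condition concrete, here is a minimal sketch in plain Julia (no CUDA or GPUCompiler involved). The helper `world_in_bounds` and the function `sketch_kernel` are hypothetical names introduced only for illustration; the helper simply restates the condition from the assertion message above, and is not how GPUCompiler itself checks anything.

```julia
# Hypothetical helper restating the condition from the failing assertion:
#   min_world <= world && (max_world >= world || max_world == 0)
world_in_bounds(min_world, max_world, world) =
    min_world <= world && (max_world >= world || max_world == 0)

# World age we pretend to be emitting code for.
w_codegen = Base.get_world_counter()

# Defining a method advances the global world counter.
@eval sketch_kernel() = nothing

# The "current" world, which is now newer than w_codegen.
w_current = Base.get_world_counter()

# Bounds stamped with the current world are too new for the older world...
too_new = world_in_bounds(w_current, typemax(UInt), w_codegen)

# ...whereas bounds starting at the codegen world are accepted.
valid = world_in_bounds(w_codegen, typemax(UInt), w_codegen)

println("current world newer: ", w_current > w_codegen)
println("too-new bounds accepted: ", too_new)
println("matching bounds accepted: ", valid)
```

The sketch only shows why stamping bounds from the current world can violate the check when code is being generated for an earlier world.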

maleadt commented Mar 14, 2023

I'm not sure all this is correct, but the rework at least looks like an improvement (now setting apparently crucial bits in the ci). However, this is fishy:

using CUDA, GPUCompiler

function hello()
    @cuda dynamic=true world()
    return
end

@show GPUCompiler.get_world(typeof(hello), Tuple{}) |> Int

function world()
    return
end

@cuda hello()

GPUCompiler.get_world(typeof(hello), Tuple{}) |> Int = 33441
ERROR: LoadError: MethodError: no method matching world()
The applicable method may be too new: running in world age 33441, while current world is 33442.

Closest candidates are:
  world() (method too new to be called from this world context.)
   @ Main ~/Julia/pkg/CUDA/wip.jl:10

It makes sense that kernel definitions are now order dependent, because we codegen in the method's world, and world is defined after hello is. However, I'm not sure why this only triggers when adding the get_world call, and not with the MWE at the top...
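The order dependence can be reproduced on the CPU with `Base.invoke_in_world`, which runs a call in an explicitly chosen world. This is only an analogy, under the assumption that compiling a kernel in the method's world behaves like invoking it in that world; `sketch_hello` and `sketch_world` are hypothetical stand-ins for the kernels above.

```julia
# Declare the callee without any methods, so only the method is "too new" later.
function sketch_world end

# Caller, defined before the callee has a method (like hello above).
function sketch_hello()
    return sketch_world()
end

# World in which sketch_hello exists but sketch_world has no method yet.
old_world = Base.get_world_counter()

# The callee's method arrives afterwards, in a newer world.
sketch_world() = :ok

# Running the caller in its own, older world cannot see the new method.
err = try
    Base.invoke_in_world(old_world, sketch_hello)
catch e
    e
end

# Running in the latest world works.
result = Base.invoke_in_world(Base.get_world_counter(), sketch_hello)

println("old world raised MethodError: ", err isa MethodError)
println("latest world result: ", result)
```

Defining the callee's method before capturing `old_world` makes the first call succeed as well, which mirrors defining world before hello in the example above.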
