Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the new jl_method_lookup_by_tt API on 1.11. #550

Merged
merged 2 commits into from
Feb 27, 2024

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Feb 27, 2024

GPUCompiler's methodinstance function is pretty slow and allocates:

julia> @benchmark GPUCompiler.methodinstance(typeof(identity), Tuple{Nothing}, GPUCompiler.tls_world_age())
BenchmarkTools.Trial: 10000 samples with 204 evaluations.
 Range (min  max):  378.623 ns   11.078 μs  ┊ GC (min  max): 0.00%  95.51%
 Time  (median):     389.853 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   401.785 ns ± 312.086 ns  ┊ GC (mean ± σ):  2.66% ±  3.28%

               ▂▆▇█▆▄▂▁▁▂▁▂▁
  ▂▂▂▂▁▂▂▂▂▂▃▄▆██████████████▇▅▄▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▄
  379 ns           Histogram: frequency by time          414 ns <

 Memory estimate: 288 bytes, allocs estimate: 6.

That's why on 1.10, we improved this by using a generated function to cache the dynamic case, automatically invalidating the result when the world age changes:

julia> @benchmark GPUCompiler.methodinstance(typeof(identity), Tuple{Nothing})
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min  max):  1.889 ns  7.470 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.920 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.918 ns ± 0.112 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                               ▃             █
  ▂▃▁▁▁▁▁▁▁▁▁▁▁▂▁▅▁▁▁▁▁▁▁▁▁▁▁▂▁█▁▁▁▁▁▁▁▁▁▁▁▂▁█▁▁▁▁▁▁▁▁▁▁▁▂▄ ▂
  1.89 ns        Histogram: frequency by time       1.93 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

This however is not always valid, see #530. That's why @vchuravy exposed Julia's methodinstance cache in JuliaLang/julia#52572, so that starting on 1.11 we can now always perform a runtime call that's still reasonably fast:

julia> @benchmark GPUCompiler.methodinstance(typeof(identity), Tuple{Nothing}, GPUCompiler.tls_world_age())
BenchmarkTools.Trial: 10000 samples with 983 evaluations.
 Range (min  max):  57.720 ns  93.692 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     61.383 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   61.848 ns ±  3.046 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▁▃▃▃▄▆█▇▅▃▂
  ▂▂▄▇████████████▆▅▅▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▁▂▂▂▂▂▂▂▂▂ ▃
  57.7 ns         Histogram: frequency by time        78.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark GPUCompiler.methodinstance(typeof(identity), Tuple{Nothing})
BenchmarkTools.Trial: 10000 samples with 982 evaluations.
 Range (min  max):  58.982 ns  98.370 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     64.938 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   65.184 ns ±  3.429 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▄█▁ ▂▅
  ▁▁▂▆▇▄▂▄▆▅▄▄▅▄███▆███▅▆▆▃▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  59 ns           Histogram: frequency by time        80.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Although slower than our previous cache, it fixes #530:

julia> using GPUCompiler

julia> function foo end
foo (generic function with 0 methods)

julia> methodinstance(Base._stable_typeof(foo), Base.to_tuple_type(()))
ERROR: MethodError: no method matching invoke foo()
The function `foo` exists, but no method is defined for this combination of argument types.
Stacktrace:
 [1] methodinstance(ft::Type, tt::Type, world::UInt64)
   @ GPUCompiler ~/Julia/pkg/GPUCompiler/src/jlgen.jl:71
 [2] methodinstance(ft::Type, tt::Type)
   @ GPUCompiler ~/Julia/pkg/GPUCompiler/src/jlgen.jl:66
 [3] top-level scope
   @ REPL[6]:1

julia> foo() = nothing
foo (generic function with 1 method)

julia> methodinstance(Base._stable_typeof(foo), Base.to_tuple_type(()))
MethodInstance for foo()

I would propose to keep using the old version on 1.10 though, as nobody except @topolarity had run into its issues so far.

@maleadt
Copy link
Member Author

maleadt commented Feb 27, 2024

Hmm, looks like we're still leaving some performance on the table:

julia> function f() end
f (generic function with 1 method)

julia> @benchmark @ccall jl_method_lookup_by_tt(Tuple{typeof(f)}::Any, $(Base.get_world_counter())::Csize_t, nothing::Any)::Any
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min  max):  8.387 ns  29.098 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     8.589 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.873 ns ±  1.121 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃ ▅█▇▃                         ▂▅▄▁                        ▂
  ██████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▇▅█████▆▅▄▄▃▄▆▄▃▃▁▁▁▃▁▁▁▁▃▁▃▄█ █
  8.39 ns      Histogram: log(frequency) by time     11.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

EDIT: not sure what's happening here

julia> @btime @ccall jl_method_lookup_by_tt(Tuple{typeof(f)}::Any, $(Base.get_world_counter())::Csize_t, nothing::Any)::Any;
  8.397 ns (0 allocations: 0 bytes)

julia> @btime @ccall jl_method_lookup_by_tt($(Tuple{typeof(f)})::Any, $(Base.get_world_counter())::Csize_t, nothing::Any)::Any;
  86.284 ns (1 allocation: 16 bytes)

EDIT2: looks like performance is OK; I needed to compare identical functions.

julia> @btime @ccall jl_method_lookup_by_tt(Tuple{typeof(f)}::Any, $(Base.get_world_counter())::Csize_t, nothing::Any)::Any;
  8.528 ns (0 allocations: 0 bytes)

julia> @btime @ccall jl_method_lookup_by_tt(Tuple{typeof(identity), Nothing}::Any, $(Base.get_world_counter())::Csize_t, nothing::Any)::Any;
  56.381 ns (0 allocations: 0 bytes)

@maleadt maleadt merged commit 8c5d6ac into master Feb 27, 2024
11 of 13 checks passed
@maleadt maleadt deleted the tb/jl_method_lookup_by_tt branch February 27, 2024 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

methodinstance(foo, types) returns MethodError permanently if method is not defined
1 participant