Enzyme + KernelAbstractions: KA syntax changes and differentiating multiple kernels #896

Closed
p-hss opened this issue Jun 6, 2023 · 9 comments


p-hss commented Jun 6, 2023

Hi, I have two questions regarding the use of Enzyme with KernelAbstractions.

(1) How can we differentiate a kernel with the new KernelAbstractions version (>=0.9.4)? An example in the old syntax would be:

using Enzyme, Test,  KernelAbstractions, KernelGradients

@kernel function f_kernel!(x::AbstractArray) 
    i = @index(Global)
    x[i] = x[i]*x[i]
end

x = Array([2.0, 2.0])
dx = Array([0.0, 1.0])

dev = get_device(x)
n = dev isa GPU ? 256 : 4

∇! = autodiff(f_kernel!(dev, n))
wait(∇!(Duplicated(x, dx); ndrange=size(x)))

@test dx ≈ Array([0.0, 4.0])  # works!

It is not clear to me how this can be done with the new syntax; the following does not work:

using Test, Enzyme, KernelAbstractions #Enzyme v0.11.1, KernelAbstractions v0.9.4

@kernel function f_kernel!(x::AbstractArray) 
    i = @index(Global)
    x[i] = x[i]*x[i]
end

x = Array([2.0, 2.0])
dx = Array([0.0, 1.0])

backend = get_backend(x)
∇! = autodiff(f_kernel!(backend), Duplicated(x, dx))

Gives the error:

MethodError: no method matching autodiff(::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_f_kernel!)}, ::Duplicated{Vector{Float64}})

Closest candidates are:
  autodiff(::EnzymeCore.ForwardMode, ::FA, ::Type{A}, ::Any...) where {FA<:EnzymeCore.Annotation, A<:EnzymeCore.Annotation}
   @ Enzyme ~/.julia/packages/Enzyme/YBQJk/src/Enzyme.jl:285
  autodiff(::EnzymeCore.ReverseMode{ReturnPrimal}, ::FA, ::Type{A}, ::Any...) where {FA<:EnzymeCore.Annotation, A<:EnzymeCore.Annotation, ReturnPrimal}
   @ Enzyme ~/.julia/packages/Enzyme/YBQJk/src/Enzyme.jl:170
  autodiff(::CMode, ::FA, ::Any...) where {FA<:EnzymeCore.Annotation, CMode<:EnzymeCore.Mode}
   @ Enzyme ~/.julia/packages/Enzyme/YBQJk/src/Enzyme.jl:222
  ...

Stacktrace:
 [1] top-level scope
   @ In[6]:9

(2) How can a wrapper function for a kernel be differentiated? Such a wrapper might launch multiple kernels, and how can the derivatives of (multiple) kernels be composed with Enzyme, so that derivatives of larger models on GPU can be written? E.g. a wrapper function might look something like this (old KA syntax):

function f_wrap!(x::AbstractArray)
    
    dev = get_device(x)
    n = dev isa GPU ? 256 : 4
    
    kernel! = f_kernel!(dev, n)
    kernel!(x, ndrange=size(x))
    kernel!(x, ndrange=size(x)) # this could be a different kernel as well
  
end

x = Array([2.0, 2.0])
dx = Array([0.0, 1.0])

∇! = autodiff(f_wrap!, Duplicated(x, dx))

which throws the error:

AssertionError: waitfn !== nothing

Stacktrace:
  [1] enq_work_rev(B::Ptr{LLVM.API.LLVMOpaqueBuilder}, OrigCI::Ptr{LLVM.API.LLVMOpaqueValue}, gutils::Ptr{Nothing}, tape::Ptr{LLVM.API.LLVMOpaqueValue})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:2276
  [2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{Nothing}, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
    @ Enzyme.API ~/.julia/packages/Enzyme/2M9tI/src/api.jl:123
  [3] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(f_wrap!), Tuple{Vector{Float64}}}}, mod::LLVM.Module, primalf::LLVM.Function, adjoint::GPUCompiler.FunctionSpec{typeof(f_wrap!), Tuple{Duplicated{Vector{Float64}}}}, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, dupClosure::Bool, wrap::Bool, modifiedBetween::Bool, returnPrimal::Bool, jlrules::Vector{String})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:5049
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(f_wrap!), Tuple{Vector{Float64}}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, ctx::LLVM.Context, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6242
  [5] _thunk
    @ ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6729 [inlined]
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(f_wrap!), Tuple{Vector{Float64}}}})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6723
  [7] cached_compilation(job::GPUCompiler.CompilerJob, key::UInt64, specid::UInt64)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6767
  [8] #s879#169
    @ ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6827 [inlined]
  [9] var"#s879#169"(F::Any, Fn::Any, DF::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, specid::Any, ReturnPrimal::Any, ShadowInit::Any, ::Any, #unused#::Type, f::Any, df::Any, #unused#::Type, tt::Any, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Type, #unused#::Any)
    @ Enzyme.Compiler ./none:0
 [10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
    @ Core ./boot.jl:582
 [11] thunk
    @ ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6860 [inlined]
 [12] thunk (repeats 3 times)
    @ ~/.julia/packages/Enzyme/2M9tI/src/compiler.jl:6853 [inlined]
 [13] autodiff
    @ ~/.julia/packages/Enzyme/2M9tI/src/Enzyme.jl:211 [inlined]
 [14] autodiff
    @ ~/.julia/packages/Enzyme/2M9tI/src/Enzyme.jl:248 [inlined]
 [15] autodiff(f::typeof(f_wrap!), args::Duplicated{Vector{Float64}})
    @ Enzyme ~/.julia/packages/Enzyme/2M9tI/src/Enzyme.jl:354
 [16] top-level scope
    @ In[20]:4

Any help would be very much appreciated!


vchuravy commented Jun 6, 2023

For KA 0.9 we deprecated KernelGradients (although it shouldn't be impossible to add it back).

Long-term, our intent is to finish JuliaGPU/KernelAbstractions.jl#382 and allow you to differentiate kernels directly using the normal Enzyme API.
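For concreteness, differentiating through a kernel launch would then look something like the following untested sketch. The wrapper name `square!` and the exact call shape are illustrative assumptions about where that PR is headed, not a working API on the Enzyme/KA versions in this thread:

```julia
using Enzyme, KernelAbstractions

@kernel function square_kernel!(x)
    i = @index(Global)
    x[i] = x[i] * x[i]
end

# A plain Julia wrapper around the kernel launch; once the EnzymeRules land,
# Enzyme should be able to differentiate straight through it.
function square!(x)
    backend = get_backend(x)
    square_kernel!(backend)(x; ndrange = size(x))
    KernelAbstractions.synchronize(backend)
    return nothing
end

x  = [2.0, 2.0]
dx = [0.0, 1.0]   # reverse-mode seed

# Reverse mode over the wrapper; dx is accumulated into in place.
autodiff(Reverse, square!, Const, Duplicated(x, dx))
```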


vchuravy commented Jun 6, 2023

CC: @michel2323 @sriharikrishna


p-hss commented Jun 9, 2023

Thanks a lot for the quick clarifications! Are there options besides KernelAbstractions to make Enzyme work on GPUs with vectorized code? And can you already give a very rough time estimate for the long-term plan to support KA? Thanks!


wsmoses commented Jun 10, 2023

Honestly it's just a matter of the linked PR getting merged into KA. That itself shouldn't be hard -- probably just a matter of @vchuravy and I sitting down together. The one potential blocker is that we've been waiting to cut a new release of Enzyme (which may be necessary) until we figure out the GC issue currently in CI.

Sadly we've failed to reproduce that on any local systems, but some folks from Pernosco are looking to add their amazing omniscient debugger to our repo CI, which we've historically used locally to debug such issues.

My guess is ~2 months, with the above reasons being the actual blockers that I'm just guessing re timeline on.

@vchuravy

I will also point out that KernelGradients was never more than https://github.com/JuliaGPU/KernelAbstractions.jl/blob/release-0.8/lib/KernelGradients/src/KernelGradients.jl and I think that definition should mostly still work.
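From memory, that shim was essentially a wrapper kernel whose body runs Enzyme on the original per-work-item function. The rough reconstruction below is only a sketch: the helper name `gradient_kernel`, the `Kernel` type parameters/fields, and the `autodiff_deferred` call shape are reconstructed from memory, and the linked file is the authoritative version.

```julia
import Enzyme
import KernelAbstractions as KA

# Rough reconstruction of the KernelGradients idea: given a KA kernel, build
# a new kernel of the same backend/size whose body differentiates the
# original kernel function for each work-item.
function gradient_kernel(kernel::KA.Kernel{Backend, W, N, Fun}) where {Backend, W, N, Fun}
    f = kernel.f
    function df(ctx, args...)
        # ctx is the KA context argument threaded through generated kernels.
        Enzyme.autodiff_deferred(Enzyme.Reverse, Enzyme.Const(f), Enzyme.Const,
                                 Enzyme.Const(ctx), args...)
        return nothing
    end
    return KA.Kernel{Backend, W, N, typeof(df)}(kernel.backend, df)
end
```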


p-hss commented Jun 12, 2023

Okay I see. Thanks a lot again, that's very helpful to know.

@michel2323

CC: @jlk9


wsmoses commented Jul 12, 2023

FYI, the forward-mode EnzymeRules for KA (for any backend) have landed, as well as CPU reverse mode.
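With those rules, forward mode through a kernel launch should work via the standard Enzyme API. An untested sketch (kernel and wrapper names are illustrative, and exact behavior depends on the newly landed rules):

```julia
using Enzyme, KernelAbstractions

@kernel function square_kernel!(x)
    i = @index(Global)
    x[i] = x[i] * x[i]
end

# Wrapper launching the kernel; the new EnzymeRules handle the launch itself.
function square!(x)
    backend = get_backend(x)
    square_kernel!(backend)(x; ndrange = size(x))
    KernelAbstractions.synchronize(backend)
    return nothing
end

x  = [2.0, 2.0]
dx = [1.0, 0.0]   # tangent (forward-mode) seed

# Forward mode: dx is updated in place with the pushforward of the seed.
autodiff(Forward, square!, Const, Duplicated(x, dx))
```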


wsmoses commented Jul 12, 2023

As the remaining reverse-mode backends are blocked on a KA issue (JuliaGPU/KernelAbstractions.jl#408), closing here.

@wsmoses wsmoses closed this as completed Jul 12, 2023