Minimal CUDA version for Lux Dense Layer #1442

Closed
avik-pal opened this issue May 13, 2024 · 11 comments

avik-pal commented May 13, 2024

using CUDA, Enzyme

w = cu(rand(Float32, 10, 10))  # weight matrix on the GPU
b = cu(rand(Float32, 10))      # bias vector
x = cu(rand(Float32, 10, 10))  # input batch
act = x -> max(0f0, x)         # ReLU activation

function loss_function(act, w, x, b)
    return sum(abs2, act.(muladd(w, x, b)))  # dense layer followed by a scalar reduction
    # return sum(abs2, fused_dense_bias_activation(act, w, x, b))
end

begin
    dw = zero(w)  # zero-initialized gradient buffers
    dx = zero(x)
    db = zero(b)

    Enzyme.autodiff(Reverse, loss_function, Active, Const(act), Duplicated(w, dw), Duplicated(x, dx), Duplicated(b, db))
end

Fails with

ERROR: LLVM error: function failed verification (4)
Stacktrace:
  [1] handle_error(reason::Cstring)
    @ LLVM ~/.julia/packages/LLVM/bzSzE/src/core/context.jl:168
  [2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{…}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{…}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{…}, augmented::Ptr{…}, atomicAdd::Bool)
    @ Enzyme.API ~/.julia/packages/Enzyme/2FwRI/src/api.jl:154
  [3] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::NTuple{…}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:3177
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:5070
  [5] codegen
    @ ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:4477 [inlined]
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:5755
  [7] _thunk
    @ ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:5755 [inlined]
  [8] cached_compilation
    @ ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:5793 [inlined]
  [9] (::Enzyme.Compiler.var"#554#555"{})(ctx::LLVM.Context)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:5859
 [10] JuliaContext(f::Enzyme.Compiler.var"#554#555"{}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
 [11] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
 [12] #s2027#553
    @ ~/.julia/packages/Enzyme/2FwRI/src/compiler.jl:5811 [inlined]
 [13] 
    @ Enzyme.Compiler ./none:0
 [14] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [15] autodiff
    @ ~/.julia/packages/Enzyme/2FwRI/src/Enzyme.jl:286 [inlined]
 [16] autodiff(::ReverseMode{…}, ::typeof(loss_function), ::Type{…}, ::Const{…}, ::Duplicated{…}, ::Duplicated{…}, ::Duplicated{…})
    @ Enzyme ~/.julia/packages/Enzyme/2FwRI/src/Enzyme.jl:303
 [17] top-level scope
    @ /mnt/research/ongoing/lux/enzyme_cuda.jl:18
Some type information was truncated. Use `show(err)` to see complete types

and a very long LLVM dump.

Details

JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
┌ Warning: The Pkg REPL mode is intended for interactive use only, and should not be used from scripts. It is recommended to use the functional API instead.
└ @ Pkg.REPLMode ~/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/share/julia/stdlib/v1.10/Pkg/src/REPLMode/REPLMode.jl:382
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] loss_function
│ @ /mnt/research/ongoing/lux/enzyme_cuda.jl:9
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138527901362528 to {} addrspace(10)**) unordered, align 8, !dbg !439, !tbaa !440, !alias.scope !431, !noalias !434
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
ERROR: LoadError: LLVM error: Failed to materialize symbols: { (JuliaExternal, { libname_cuCtxSetCurrent_40759, libname_cuMemHostAlloc_38650, libname_cuDeviceCanAccessPeer_41778, libname_cuCtxSetCurrent_40231, libname_cuStreamSynchronize_40575, libname_cuCtxSetCurrent_41988, libname_cuStreamGetCaptureInfo_41984, libname_cuCtxGetId_39658, libname_cublasGetProperty_39361, libname_cuMemPoolCreate_40279, libname_cuCtxPopCurrent_v2_40732, libname_cuCtxSetCurrent_41463, libname_cuMemAllocFromPoolAsync_38536, libname_cuMemPoolCreate_41596, libname_cuStreamQuery_41973, libname_cuCtxGetId_37318, libname_cuCtxPushCurrent_v2_40729, libname_cuMemPoolTrimTo_39311, libname_cuMemPoolSetAttribute_40353, libname_cublasSsymm_v2_64_39307, libname_cuDeviceGetAttribute_40201, libname_cuMemPoolCreate_40364, libname_cudaRuntimeGetVersion_39775, libname_cuCtxGetCurrent_40830, libname_cuMemAlloc_v2_38538, libname_cuMemsetD32Async_39068, libname_cuMemsetD32Async_39556, libname_cuCtxSetCurrent_40756, libname_cuMemPoolTrimTo_38488, libname_cuDeviceGet_41449, libname_cuStreamQuery_40740, libname_cuStreamGetCaptureInfo_40752, libname_cuStreamSynchronize_37225, libname_cuDeviceGetName_41015, libname_cuCtxSynchronize_38500, libname_cuCtxGetDevice_41967, libname_cuStreamCreate_41741, libname_cuMemGetInfo_v2_38102, libname_cuMemPoolGetAttribute_39590, libname_cuCtxGetCurrent_37272, libname_cuCtxGetApiVersion_41478, libname_cuCtxGetId_41476, libname_cuDeviceSetMemPool_38018, libname_cuLaunchKernel_38608, libname_cuOccupancyMaxPotentialBlockSize_38792, libname_cublasSgemm_v2_64_39435, libname_cuStreamSynchronize_37220, libname_cuMemGetInfo_v2_40442, libname_cuStreamSynchronize_41807, libname_cuDeviceGet_37877, libname_cuMemAlloc_v2_38530, libname_cuMemcpyDtoHAsync_v2_38437, libname_cuDevicePrimaryCtxRetain_37310, libname_cuDeviceCanAccessPeer_38375, libname_cuCtxSynchronize_38494, libname_cublasGetProperty_39352, libname_cuCtxGetDevice_38330, libname_cuPointerGetAttribute_38405, 
libname_cuStreamCreate_38169, libname_cuStreamCreate_40509, libname_cuStreamSynchronize_40580, libname_cuCtxSetCurrent_41991, libname_cuMemPoolTrimTo_39526, libname_cuMemPoolCreate_38024, libname_cuDeviceGet_40217, libname_cuDeviceGetAttribute_41433, libname_cuCtxSetCurrent_38409, libname_cuCtxGetCurrent_40285, libname_cuMemPoolTrimTo_39511, libname_cuMemPoolTrimTo_39395, libname_cuStreamQuery_41972, libname_cuCtxGetCurrent_37945, libname_cublasSscal_v2_39494, libname_cuMemPoolTrimTo_39462, libname_cuDeviceGetName_37443, libname_cuDeviceGetCount_39613, libname_cublasSgemm_v2_39442, libname_cuCtxSetCurrent_37891, libname_cuMemAllocAsync_38528, libname_cuMemPoolSetAttribute_38013, libname_cuCtxPushCurrent_v2_41961, libname_cuMemcpyDtoDAsync_v2_38454, libname_cuMemcpyHtoDAsync_v2_38348, libname_cuDeviceGetCount_37287, libname_cublasSetStream_v2_39406, libname_cuMemsetD32Async_40788, libname_cuStreamQuery_38336, libname_cuMemGetInfo_v2_41674, libname_cuMemPoolSetAccess_41767, libname_cuStreamSynchronize_41812, libname_cuMemPoolTrimTo_39365, libname_cuCtxGetDevice_40735, libname_cuCtxSetCurrent_38412, libname_cuStreamGetCaptureInfo_38400, libname_cuDeviceGetAttribute_37861, libname_cuStreamQuery_38335, libname_cuDeviceSetMemPool_40358, libname_cuMemPoolTrimTo_39480, libname_cuMemPoolSetAccess_38364, libname_cuCtxGetId_40890, libname_cublasSscal_v2_64_39523, libname_cudaRuntimeGetVersion_41007, libname_cuCtxGetId_37904, libname_cuMemPoolGetAttribute_37264, libname_cuDeviceGetCount_40845, libname_cublasSetMathMode_39329, libname_cuCtxGetCurrent_41517, libname_cuDeviceGetName_39783, libname_cuDevicePrimaryCtxRetain_40882, libname_cublasSsymm_v2_39283, libname_cublasSgemm_v2_39457, libname_cublasSscal_v2_39507, diffejulia_loss_function_37136wrap, libname_cublasSsymm_v2_64_39261, libname_cuMemPoolCreate_37939, libname_cuCtxPopCurrent_v2_38327, libname_cuMemPoolSetAccess_40535, libname_cuMemPoolCreate_41511, libname_cudaRuntimeGetVersion_37435, 
libname_cublasSscal_v2_64_39490, libname_cuMemPoolGetAttribute_40822, libname_cuMemPoolSetAttribute_41585, libname_cuMemPoolTrimTo_39288, libname_cuStreamQuery_40741, libname_cublasSetStream_v2_39392, libname_cuCtxGetApiVersion_40246, libname_cublasSetStream_v2_39382, libname_cuCtxSetCurrent_40832, libname_cuCtxGetId_40244, libname_cuCtxSetCurrent_39600, libname_cuDeviceCanAccessPeer_40546, libname_cuMemcpyPeerAsync_38458, libname_cuCtxPopCurrent_v2_41964, libname_cuCtxGetApiVersion_37906, libname_cublasSetMathMode_39340, libname_cuCtxGetCurrent_39598, libname_cublasSsymm_v2_39268, libname_cublasSgemm_v2_64_39476, libname_cuCtxSetCurrent_37274, libname_cuLaunchCooperativeKernel_38605, libname_cuDevicePrimaryCtxRetain_39650, libname_cuDeviceSetMemPool_41590, libname_cuMemPoolTrimTo_39343, libname_cuCtxPushCurrent_v2_38324 }) }
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/LLVM/bzSzE/src/executionengine/utils.jl:32 [inlined]
[2] lookup
@ ~/.julia/packages/LLVM/bzSzE/src/orc.jl:434 [inlined]
[3] lookup
@ ~/.julia/packages/LLVM/bzSzE/src/orc.jl:433 [inlined]
[4] lookup
@ ~/.julia/packages/Enzyme/NVk8T/src/compiler/orcv2.jl:255 [inlined]
[5] _link(job::GPUCompiler.CompilerJob{…}, ::Tuple{…})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5751
[6] cached_compilation
@ ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5811 [inlined]
[7] (::Enzyme.Compiler.var"#562#563"{…})(ctx::LLVM.Context)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5876
[8] JuliaContext(f::Enzyme.Compiler.var"#562#563"{…}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[9] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[10] #s2025#561
@ ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5828 [inlined]
[11]
@ Enzyme.Compiler ./none:0
[12] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[13] autodiff
@ ~/.julia/packages/Enzyme/NVk8T/src/Enzyme.jl:286 [inlined]
[14] autodiff(::ReverseMode{…}, ::typeof(loss_function), ::Type{…}, ::Const{…}, ::Duplicated{…}, ::Duplicated{…}, ::Duplicated{…})
@ Enzyme ~/.julia/packages/Enzyme/NVk8T/src/Enzyme.jl:303
[15] top-level scope
@ /mnt/research/ongoing/lux/enzyme_cuda.jl:17
Some type information was truncated. Use show(err) to see complete types.
in expression starting at /mnt/research/ongoing/lux/enzyme_cuda.jl:12

avik-pal commented May 13, 2024

Seems like it could stem from

function sum_loss(x)
    return sum(abs2, x)
end

begin
    x = cu(rand(Float32, 10, 10))
    dx = zero(x)
    Enzyme.autodiff(Reverse, sum_loss, Active, Duplicated(x, dx))
end

This is documented as non-functional, but how can we bypass it? We could write the loss in pullback form, which avoids the reduction; that might work.
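
As a rough illustration of what "pullback form" could mean here, the sketch below differentiates the same loss by hand on plain CPU arrays. The gradient of `sum(abs2, y)` with respect to `y` is `2y`, so that value can serve as the seed and be pulled back through the activation and the `muladd`, without differentiating through the `sum` itself. This is only a sketch under assumptions: `relu` and `dense_loss_pullback` are hypothetical names, not part of Lux, Enzyme, or CUDA.jl, and nothing here has been checked against the GPU failure above.

```julia
using LinearAlgebra

# Hypothetical helpers for illustration only (not from Lux, Enzyme, or CUDA.jl).
relu(v) = max(zero(v), v)

# Hand-written reverse pass for loss = sum(abs2, relu.(w * x .+ b)) on CPU arrays.
function dense_loss_pullback(w, x, b)
    z = muladd(w, x, b)          # pre-activation, w * x .+ b
    y = relu.(z)                 # activation output
    loss = sum(abs2, y)          # primal value, computed only for reference

    dy = 2 .* y                  # d(sum(abs2, y))/dy, used as the seed
    dz = dy .* (z .> 0)          # pull back through relu
    dw = dz * x'                 # pull back through the matrix product
    dx = w' * dz
    db = vec(sum(dz; dims=2))    # the bias is broadcast along columns
    return loss, dw, dx, db
end

w = rand(Float32, 10, 10)
b = rand(Float32, 10)
x = rand(Float32, 10, 10)
loss, dw, dx, db = dense_loss_pullback(w, x, b)
```

If the same idea were applied to the CUDA case, only the elementwise and matrix-product parts would need to go through Enzyme, seeded with `2 .* y`; whether that avoids the failure is untested here.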

Details

JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
JIT session error: Symbols not found: [ cuMemsetD8 ]
┌ Warning: The Pkg REPL mode is intended for interactive use only, and should not be used from scripts. It is recommended to use the functional API instead.
└ @ Pkg.REPLMode ~/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/share/julia/stdlib/v1.10/Pkg/src/REPLMode/REPLMode.jl:382
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
┌ Error: Found null pointer at

│ Stacktrace:
│ [1] get_repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:63
│ [2] repl_frontend_task
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:56
│ [3] default_scalar_indexing
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:75
│ [4] assertscalar
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:105
│ [5] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:48
│ [6] scalar_getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:34
│ [7] _getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:17
│ [8] getindex
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/indexing.jl:15
│ [9] macro expansion
│ @ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210
│ [10] #_mapreduce#42
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:71
│ [11] _mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:33
│ [12] mapreduce
│ @ ~/.julia/packages/GPUArrays/OKkAu/src/host/mapreduce.jl:28
│ [13] _sum
│ @ ./reducedim.jl:1015
│ [14] sum
│ @ ./reducedim.jl:1011
│ [15] sum_loss
│ @ /mnt/research/ongoing/lux/enzyme_cuda1.jl:4
│ arg = %active_repl.checked = load atomic {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 134824559520096 to {} addrspace(10)**) unordered, align 8, !dbg !367, !tbaa !368, !alias.scope !359, !noalias !362
└ @ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/absint.jl:121
ERROR: LoadError: LLVM error: Failed to materialize symbols: { (JuliaExternal, { libname_cuCtxSetCurrent_34777, libname_cuMemAlloc_v2_32950, libname_cuCtxGetCurrent_33543, libname_cuMemPoolCreate_35456, libname_cuStreamSynchronize_34520, libname_cuMemPoolSetAccess_33207, libname_cuMemPoolCreate_34309, libname_cuMemPoolGetAttribute_31973, libname_cuCtxGetCurrent_34230, libname_cuDeviceGetName_34960, libname_cuCtxSetCurrent_35408, libname_cuStreamQuery_34686, libname_cuCtxSetCurrent_34704, libname_cuMemGetInfo_v2_35619, libname_cuMemsetD32Async_34733, libname_cuStreamSynchronize_32980, libname_cuDeviceGet_32725, libname_cuCtxGetId_33603, libname_cuStreamQuery_35918, libname_cuStreamCreate_34454, diffejulia_sum_loss_31707wrap, libname_cuStreamGetCaptureInfo_35929, libname_cuCtxPushCurrent_v2_35906, libname_cuMemPoolSetAccess_35712, libname_cuCtxSynchronize_32789, libname_cuCtxGetId_35421, libname_cuMemPoolSetAttribute_34298, libname_cudaRuntimeGetVersion_32449, libname_cuCtxGetId_34189, libname_cuCtxGetDevice_32921, libname_cuCtxPushCurrent_v2_34674, libname_cuStreamQuery_32927, libname_cuCtxSetCurrent_34176, libname_cuCtxGetCurrent_31981, libname_cuMemPoolCreate_35541, libname_cuCtxPopCurrent_v2_32918, libname_cuCtxSetCurrent_33401, libname_cuCtxPopCurrent_v2_35909, libname_cuDeviceSetMemPool_34303, libname_cuDeviceGetAttribute_34146, libname_cuMemAllocFromPoolAsync_32956, libname_cuCtxPopCurrent_v2_34677, libname_cuCtxGetDevice_35912, libname_cuStreamSynchronize_35752, libname_cuDevicePrimaryCtxRetain_32339, libname_cuCtxSetCurrent_31983, libname_cudaRuntimeGetVersion_33720, libname_cuMemsetD32Async_33456, libname_cuStreamSynchronize_34525, libname_cuPointerGetAttribute_33397, libname_cuCtxGetId_34835, libname_cuDeviceCanAccessPeer_35723, libname_cuDeviceGet_34162, libname_cuMemGetInfo_v2_32215, libname_cuCtxSetCurrent_35936, libname_cuDeviceGetAttribute_35378, libname_cuDeviceSetMemPool_32108, libname_cuCtxSetCurrent_32739, libname_cuMemAllocAsync_32948, 
libname_cuMemPoolCreate_32016, libname_cuDeviceGetCount_34790, libname_cuCtxPushCurrent_v2_32915, libname_cuCtxGetApiVersion_34191, libname_cuDeviceGet_35394, libname_cuDeviceCanAccessPeer_34491, libname_cuMemPoolCreate_32114, libname_cuMemPoolSetAttribute_32103, libname_cuStreamCreate_35686, libname_cuMemcpyDtoHAsync_v2_33391, libname_cuCtxSetCurrent_34701, libname_cuMemsetD32Async_33501, libname_cuMemAlloc_v2_32958, libname_cuStreamGetCaptureInfo_33243, libname_cuMemGetInfo_v2_34387, libname_cuCtxGetId_32030, libname_cuMemHostAlloc_33053, libname_cuDeviceGetName_33728, libname_cuCtxSetCurrent_33545, libname_cuDeviceGetCount_33558, libname_cuCtxSynchronize_32783, libname_cuStreamQuery_32926, libname_cuCtxGetId_32752, libname_cuCtxGetCurrent_35462, libname_cuCtxGetCurrent_32022, libname_cuCtxGetDevice_34680, libname_cuCtxGetApiVersion_32754, libname_cudaRuntimeGetVersion_34952, libname_cuMemPoolGetAttribute_33535, libname_cuStreamQuery_35917, libname_cuStreamGetCaptureInfo_34697, libname_cuDeviceGetCount_31939, libname_cuDevicePrimaryCtxRetain_34827, libname_cuMemPoolSetAttribute_35530, libname_cuOccupancyMaxPotentialBlockSize_33265, libname_cuStreamSynchronize_35757, libname_cuStreamSynchronize_32975, libname_cuDeviceSetMemPool_35535, libname_cuMemPoolSetAccess_34480, libname_cuCtxGetCurrent_34775, libname_cuDeviceGetName_32455, libname_cuDeviceCanAccessPeer_33218, libname_cuCtxSetCurrent_35933, libname_cuCtxGetApiVersion_35423, libname_cuCtxSetCurrent_33404, libname_cuMemPoolTrimTo_32777, libname_cuMemPoolCreate_34224, libname_cuDeviceGetAttribute_32277, libname_cuStreamCreate_32305, libname_cuMemPoolGetAttribute_34767, libname_cuDevicePrimaryCtxRetain_33595, libname_cuStreamQuery_34685 }) }
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/LLVM/bzSzE/src/executionengine/utils.jl:32 [inlined]
[2] lookup
@ ~/.julia/packages/LLVM/bzSzE/src/orc.jl:434 [inlined]
[3] lookup
@ ~/.julia/packages/LLVM/bzSzE/src/orc.jl:433 [inlined]
[4] lookup
@ ~/.julia/packages/Enzyme/NVk8T/src/compiler/orcv2.jl:255 [inlined]
[5] _link(job::GPUCompiler.CompilerJob{…}, ::Tuple{…})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5751
[6] cached_compilation
@ ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5811 [inlined]
[7] (::Enzyme.Compiler.var"#562#563"{…})(ctx::LLVM.Context)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5876
[8] JuliaContext(f::Enzyme.Compiler.var"#562#563"{…}; kwargs::@kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
[9] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
[10] #s2025#561
@ ~/.julia/packages/Enzyme/NVk8T/src/compiler.jl:5828 [inlined]
[11]
@ Enzyme.Compiler ./none:0
[12] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[13] autodiff
@ ~/.julia/packages/Enzyme/NVk8T/src/Enzyme.jl:286 [inlined]
[14] autodiff(mode::ReverseMode{false, FFIABI, false}, f::typeof(sum_loss), ::Type{Active}, args::Duplicated{CuArray{…}})
@ Enzyme ~/.julia/packages/Enzyme/NVk8T/src/Enzyme.jl:303
[15] top-level scope
@ /mnt/research/ongoing/lux/enzyme_cuda1.jl:10
Some type information was truncated. Use show(err) to see complete types.
in expression starting at /mnt/research/ongoing/lux/enzyme_cuda1.jl:7

@wsmoses
Member

wsmoses commented May 13, 2024

Apparently the differentiation of CUmemPoolProps_st

@wsmoses
Member

wsmoses commented May 13, 2024

@avik-pal with all of these issues can you upload the full error log?

@avik-pal
Contributor Author

Added both the stacktraces with CUDA, cuDNN and Enzyme master

@wsmoses
Member

wsmoses commented May 17, 2024

Oh cool, so with master your last one (#1442 (comment)) now differentiates successfully and hits a GPUCompiler / LLVM.jl / LLJIT-related question that I may need some help or explanation on from @vchuravy / @maleadt.

So, quick tl;dr: the original code we AD has, say, a cublasgemm inside it. We take that code via GPUCompiler and also inject a cudaMemset call into it, then send it to our custom JIT infra to compile/run (in 1.10+ that's now the LLJIT stuff @gbaraldi did).

Ignore the nullptr stuff for now.

The JIT now complains that it can't find cuMemsetD8 (we specifically don't call that; we call cudaMemset, but presumably something somewhere says that should be called instead, or maybe CUDA.jl itself calls it). Either way, somehow we're not loading the symbols CUDA needs (whether the ones we added or the ones from the original CUDA program).

Any thoughts?

@wsmoses
Member

wsmoses commented May 17, 2024

Basically, as soon as we fix this, it looks like we successfully differentiate through cublasgemm (since at least generating the LLVM module of the derivative has now succeeded!)

@wsmoses
Member

wsmoses commented May 17, 2024

In these cases Enzyme clearly is not the originator of all these symbols:

ERROR: LoadError: LLVM error: Failed to materialize symbols: { (JuliaExternal, { libname_cuCtxSetCurrent_40759, libname_cuMemHostAlloc_38650, libname_cuDeviceCanAccessPeer_41778, libname_cuCtxSetCurrent_40231, libname_cuStreamSynchronize_40575, libname_cuCtxSetCurrent_41988, libname_cuStreamGetCaptureInfo_41984, libname_cuCtxGetId_39658, libname_cublasGetProperty_39361, libname_cuMemPoolCreate_40279, libname_cuCtxPopCurrent_v2_40732, libname_cuCtxSetCurrent_41463, libname_cuMemAllocFromPoolAsync_38536, libname_cuMemPoolCreate_41596, libname_cuStreamQuery_41973, libname_cuCtxGetId_37318, libname_cuCtxPushCurrent_v2_40729, libname_cuMemPoolTrimTo_39311, libname_cuMemPoolSetAttribute_40353, libname_cublasSsymm_v2_64_39307, libname_cuDeviceGetAttribute_40201, libname_cuMemPoolCreate_40364, libname_cudaRuntimeGetVersion_39775, libname_cuCtxGetCurrent_40830, libname_cuMemAlloc_v2_38538, libname_cuMemsetD32Async_39068, libname_cuMemsetD32Async_39556, libname_cuCtxSetCurrent_40756, libname_cuMemPoolTrimTo_38488, libname_cuDeviceGet_41449, libname_cuStreamQuery_40740, libname_cuStreamGetCaptureInfo_40752, libname_cuStreamSynchronize_37225, libname_cuDeviceGetName_41015, libname_cuCtxSynchronize_38500, libname_cuCtxGetDevice_41967, libname_cuStreamCreate_41741, libname_cuMemGetInfo_v2_38102, libname_cuMemPoolGetAttribute_39590, libname_cuCtxGetCurrent_37272, libname_cuCtxGetApiVersion_41478, libname_cuCtxGetId_41476, libname_cuDeviceSetMemPool_38018, libname_cuLaunchKernel_38608, libname_cuOccupancyMaxPotentialBlockSize_38792, libname_cublasSgemm_v2_64_39435, libname_cuStreamSynchronize_37220, libname_cuMemGetInfo_v2_40442, libname_cuStreamSynchronize_41807, libname_cuDeviceGet_37877, libname_cuMemAlloc_v2_38530, libname_cuMemcpyDtoHAsync_v2_38437, libname_cuDevicePrimaryCtxRetain_37310, libname_cuDeviceCanAccessPeer_38375, libname_cuCtxSynchronize_38494, libname_cublasGetProperty_39352, libname_cuCtxGetDevice_38330, libname_cuPointerGetAttribute_38405, 
libname_cuStreamCreate_38169, libname_cuStreamCreate_40509, libname_cuStreamSynchronize_40580, libname_cuCtxSetCurrent_41991, libname_cuMemPoolTrimTo_39526, libname_cuMemPoolCreate_38024, libname_cuDeviceGet_40217, libname_cuDeviceGetAttribute_41433, libname_cuCtxSetCurrent_38409, libname_cuCtxGetCurrent_40285, libname_cuMemPoolTrimTo_39511, libname_cuMemPoolTrimTo_39395, libname_cuStreamQuery_41972, libname_cuCtxGetCurrent_37945, libname_cublasSscal_v2_39494, libname_cuMemPoolTrimTo_39462, libname_cuDeviceGetName_37443, libname_cuDeviceGetCount_39613, libname_cublasSgemm_v2_39442, libname_cuCtxSetCurrent_37891, libname_cuMemAllocAsync_38528, libname_cuMemPoolSetAttribute_38013, libname_cuCtxPushCurrent_v2_41961, libname_cuMemcpyDtoDAsync_v2_38454, libname_cuMemcpyHtoDAsync_v2_38348, libname_cuDeviceGetCount_37287, libname_cublasSetStream_v2_39406, libname_cuMemsetD32Async_40788, libname_cuStreamQuery_38336, libname_cuMemGetInfo_v2_41674, libname_cuMemPoolSetAccess_41767, libname_cuStreamSynchronize_41812, libname_cuMemPoolTrimTo_39365, libname_cuCtxGetDevice_40735, libname_cuCtxSetCurrent_38412, libname_cuStreamGetCaptureInfo_38400, libname_cuDeviceGetAttribute_37861, libname_cuStreamQuery_38335, libname_cuDeviceSetMemPool_40358, libname_cuMemPoolTrimTo_39480, libname_cuMemPoolSetAccess_38364, libname_cuCtxGetId_40890, libname_cublasSscal_v2_64_39523, libname_cudaRuntimeGetVersion_41007, libname_cuCtxGetId_37904, libname_cuMemPoolGetAttribute_37264, libname_cuDeviceGetCount_40845, libname_cublasSetMathMode_39329, libname_cuCtxGetCurrent_41517, libname_cuDeviceGetName_39783, libname_cuDevicePrimaryCtxRetain_40882, libname_cublasSsymm_v2_39283, libname_cublasSgemm_v2_39457, libname_cublasSscal_v2_39507, diffejulia_loss_function_37136wrap, libname_cublasSsymm_v2_64_39261, libname_cuMemPoolCreate_37939, libname_cuCtxPopCurrent_v2_38327, libname_cuMemPoolSetAccess_40535, libname_cuMemPoolCreate_41511, libname_cudaRuntimeGetVersion_37435, 
libname_cublasSscal_v2_64_39490, libname_cuMemPoolGetAttribute_40822, libname_cuMemPoolSetAttribute_41585, libname_cuMemPoolTrimTo_39288, libname_cuStreamQuery_40741, libname_cublasSetStream_v2_39392, libname_cuCtxGetApiVersion_40246, libname_cublasSetStream_v2_39382, libname_cuCtxSetCurrent_40832, libname_cuCtxGetId_40244, libname_cuCtxSetCurrent_39600, libname_cuDeviceCanAccessPeer_40546, libname_cuMemcpyPeerAsync_38458, libname_cuCtxPopCurrent_v2_41964, libname_cuCtxGetApiVersion_37906, libname_cublasSetMathMode_39340, libname_cuCtxGetCurrent_39598, libname_cublasSsymm_v2_39268, libname_cublasSgemm_v2_64_39476, libname_cuCtxSetCurrent_37274, libname_cuLaunchCooperativeKernel_38605, libname_cuDevicePrimaryCtxRetain_39650, libname_cuDeviceSetMemPool_41590, libname_cuMemPoolTrimTo_39343, libname_cuCtxPushCurrent_v2_38324 }) }

@wsmoses
Member

wsmoses commented Jul 31, 2024

Now hitting a type tree issue not deducing types for:

julia> obj(129876066796880)
Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}}
ERROR: Enzyme execution failed.
Enzyme cannot deduce type
Current scope: 
; Function Attrs: mustprogress willreturn
define "enzyme_type"="{[0]:Pointer, [8]:Pointer, [8,0]:Pointer, [8,8]:Pointer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [16]:Pointer, [16,0]:Integer, [16,1]:Integer, [16,2]:Integer, [16,3]:Integer, [16,4]:Integer, [16,5]:Integer, [16,6]:Integer, [16,7]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Pointer, [16,32,-1]:Integer, [16,40]:Pointer, [16,40,-1]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer}" "enzymejl_parmtype"="129868876166928" "enzymejl_parmtype_ref"="1" { { i64, {} addrspace(10)* }, { i64, i32 } } @preprocess_julia__gpu_call_1_36685_inner.1({} addrspace(10)* nocapture nofree noundef nonnull readnone align 8 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Pointer, [-1,0,0,0,0]:Integer, [-1,0,0,0,1]:Integer, [-1,0,0,0,2]:Integer, [-1,0,0,0,3]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,16]:Integer, [-1,0,0,0,17]:Integer, [-1,0,0,0,18]:Integer, [-1,0,0,0,19]:Integer, [-1,0,0,0,20]:Integer, [-1,0,0,0,21]:Integer, [-1,0,0,0,22]:Integer, [-1,0,0,0,23]:Integer, [-1,0,0,0,32]:Integer, [-1,0,0,0,33]:Integer, [-1,0,0,0,34]:Integer, [-1,0,0,0,35]:Integer, [-1,0,0,0,36]:Integer, [-1,0,0,0,37]:Integer, [-1,0,0,0,38]:Integer, [-1,0,0,0,39]:Integer, [-1,0,0,0,40]:Integer, [-1,0,0,0,48]:Pointer, [-1,0,0,0,48,0]:Pointer, [-1,0,0,0,48,8]:Integer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,16,-1]:Integer, [-1,0,8]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, 
[-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="129876071124560" "enzymejl_parmtype_ref"="2" %0, i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="129875927902912" "enzymejl_parmtype_ref"="0" %1, i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="129875927902912" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* noundef nonnull align 8 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Pointer, [-1,0,0,0,0]:Integer, [-1,0,0,0,1]:Integer, [-1,0,0,0,2]:Integer, [-1,0,0,0,3]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,16]:Integer, [-1,0,0,0,17]:Integer, [-1,0,0,0,18]:Integer, [-1,0,0,0,19]:Integer, [-1,0,0,0,20]:Integer, [-1,0,0,0,21]:Integer, [-1,0,0,0,22]:Integer, [-1,0,0,0,23]:Integer, [-1,0,0,0,32]:Integer, [-1,0,0,0,33]:Integer, [-1,0,0,0,34]:Integer, [-1,0,0,0,35]:Integer, [-1,0,0,0,36]:Integer, [-1,0,0,0,37]:Integer, [-1,0,0,0,38]:Integer, [-1,0,0,0,39]:Integer, [-1,0,0,0,40]:Integer, [-1,0,0,0,48]:Pointer, [-1,0,0,0,48,0]:Pointer, [-1,0,0,0,48,8]:Integer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,16,-1]:Integer, [-1,0,8]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, 
[-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="129876071124560" "enzymejl_parmtype_ref"="2" %3, [2 x [1 x i64]] "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="129875904414848" "enzymejl_parmtype_ref"="0" %4, i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="129875927902912" "enzymejl_parmtype_ref"="0" %5, { {} addrspace(10)*, [1 x i8], [1 x i64] } "enzyme_type"="{[0]:Pointer, [0,0]:Pointer, [0,0,0]:Pointer, [0,0,0,-1]:Pointer, [0,0,0,0,0]:Integer, [0,0,0,0,1]:Integer, [0,0,0,0,2]:Integer, [0,0,0,0,3]:Integer, [0,0,0,0,8]:Pointer, [0,0,0,0,16]:Integer, [0,0,0,0,17]:Integer, [0,0,0,0,18]:Integer, [0,0,0,0,19]:Integer, [0,0,0,0,20]:Integer, [0,0,0,0,21]:Integer, [0,0,0,0,22]:Integer, [0,0,0,0,23]:Integer, [0,0,0,0,32]:Integer, [0,0,0,0,33]:Integer, [0,0,0,0,34]:Integer, [0,0,0,0,35]:Integer, [0,0,0,0,36]:Integer, [0,0,0,0,37]:Integer, [0,0,0,0,38]:Integer, [0,0,0,0,39]:Integer, [0,0,0,0,40]:Integer, [0,0,0,0,48]:Pointer, [0,0,0,0,48,0]:Pointer, [0,0,0,0,48,8]:Integer, [0,0,0,0,56]:Integer, [0,0,0,0,57]:Integer, [0,0,0,16,-1]:Integer, [0,0,8]:Integer, [0,8]:Integer, [0,9]:Integer, [0,10]:Integer, [0,11]:Integer, [0,12]:Integer, [0,13]:Integer, [0,14]:Integer, [0,15]:Integer, [0,16]:Integer, [0,17]:Integer, [0,18]:Integer, [0,19]:Integer, [0,20]:Integer, [0,21]:Integer, [0,22]:Integer, [0,23]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [8]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer}" "enzymejl_parmtype"="129876066796880" "enzymejl_parmtype_ref"="0" %6) local_unnamed_addr #317 !dbg !17759 {
entry:
  %7 = call {}*** @julia.get_pgcstack()
  %8 = call {}*** @julia.get_pgcstack()
  %9 = bitcast {}*** %8 to {}**
  %10 = getelementptr inbounds {}*, {}** %9, i64 -14
  %11 = getelementptr inbounds {}*, {}** %10, i64 16
  %12 = bitcast {}** %11 to i8**
  %13 = load i8*, i8** %12, align 8
  %14 = call noalias nonnull dereferenceable(32) dereferenceable_or_null(32) {} addrspace(10)* @julia.gc_alloc_obj({}** %10, i64 32, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129867219910160 to {}*) to {} addrspace(10)*)), !enzyme_fromstack !618
  call void @zeroType({} addrspace(10)* %14, i8 0, i64 32), !enzyme_zerostack !0
  %15 = bitcast {} addrspace(10)* %14 to { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)*, !enzyme_caststack !0
  %16 = call noalias nonnull dereferenceable(7) dereferenceable_or_null(7) i8* @malloc(i64 7), !enzyme_fromstack !310
  %args.i.sroa.26 = bitcast i8* %16 to [7 x i8]*, !enzyme_caststack !0
  %17 = bitcast {}*** %7 to {}**
  %18 = getelementptr inbounds {}*, {}** %17, i64 -14
  %19 = getelementptr inbounds {}*, {}** %18, i64 16
  %20 = bitcast {}** %19 to i8**
  %21 = load i8*, i8** %20, align 8
  %22 = call noalias nonnull dereferenceable(8) dereferenceable_or_null(8) {} addrspace(10)* @julia.gc_alloc_obj({}** %18, i64 8, {} addrspace(10)* @ejl_enz_any_array_1), !enzyme_fromstack !618
  call void @zeroType.135({} addrspace(10)* %22, i8 0, i64 8), !enzyme_zerostack !0
  %23 = bitcast {} addrspace(10)* %22 to [1 x {} addrspace(10)*] addrspace(10)*, !enzyme_caststack !0
  %args.i.sroa.26.0.sroa_idx36 = getelementptr inbounds [7 x i8], [7 x i8]* %args.i.sroa.26, i64 0, i64 0
  %args.i.sroa.26.0.sroa_idx2 = getelementptr inbounds [7 x i8], [7 x i8]* %args.i.sroa.26, i64 0, i64 0
  call void @llvm.lifetime.start.p0i8(i64 7, i8* %args.i.sroa.26.0.sroa_idx2)
  %24 = bitcast { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15 to i8 addrspace(10)*
  %25 = addrspacecast i8 addrspace(10)* %24 to i8*
  call void @llvm.lifetime.start.p0i8(i64 noundef 32, i8* noundef nonnull align 8 dereferenceable(32) %25) #320
  %26 = bitcast [1 x {} addrspace(10)*] addrspace(10)* %23 to i8 addrspace(10)*
  %27 = addrspacecast i8 addrspace(10)* %26 to i8*
  call void @llvm.lifetime.start.p0i8(i64 noundef 8, i8* noundef nonnull align 8 dereferenceable(8) %27) #320
  %28 = call {}*** @julia.get_pgcstack() #320, !noalias !17760
  %current_task2.i27 = getelementptr inbounds {}**, {}*** %28, i64 -14
  %current_task2.i = bitcast {}*** %current_task2.i27 to {}**
  %ptls_field.i28 = getelementptr inbounds {}**, {}*** %28, i64 2
  %29 = bitcast {}*** %ptls_field.i28 to i64***
  %ptls_load.i2930 = load i64**, i64*** %29, align 8, !tbaa !295, !noalias !17760
  %30 = getelementptr inbounds i64*, i64** %ptls_load.i2930, i64 2
  %safepoint.i = load i64*, i64** %30, align 8, !tbaa !299, !noalias !17760
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint.i) #320, !dbg !17764, !noalias !17760
  fence syncscope("singlethread") seq_cst
  %31 = icmp slt i64 %1, 1, !dbg !17766
  br i1 %31, label %L34.i, label %L9.i, !dbg !17769

L9.i:                                             ; preds = %entry
  %32 = icmp slt i64 %2, 1, !dbg !17770
  br i1 %32, label %L31.i, label %julia__gpu_call_1_36685_inner.exit, !dbg !17773

L31.i:                                            ; preds = %L9.i
  %33 = call fastcc [1 x {} addrspace(10)*] @julia_AssertionError_36691({} addrspace(10)* nofree noundef nonnull readonly align 16 addrspacecast ({}* inttoptr (i64 129871334054032 to {}*) to {} addrspace(10)*)) #321, !dbg !17773
  %box17.i = call noalias nonnull dereferenceable(8) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Pointer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875856083216 to {}*) to {} addrspace(10)*)) #322, !dbg !17773
  %34 = bitcast {} addrspace(10)* %box17.i to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !17773
  %35 = extractvalue [1 x {} addrspace(10)*] %33, 0, !dbg !17773
  %36 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %34, i64 0, i64 0, !dbg !17773
  store {} addrspace(10)* %35, {} addrspace(10)* addrspace(10)* %36, align 8, !dbg !17773, !tbaa !1024, !alias.scope !320, !noalias !17774
  %37 = addrspacecast {} addrspace(10)* %box17.i to {} addrspace(12)*, !dbg !17773
  call void @ijl_throw({} addrspace(12)* %37) #323, !dbg !17773, !noalias !17760
  unreachable, !dbg !17773

L34.i:                                            ; preds = %entry
  %38 = call fastcc [1 x {} addrspace(10)*] @julia_AssertionError_36691({} addrspace(10)* nofree noundef nonnull readonly align 16 addrspacecast ({}* inttoptr (i64 129871334054064 to {}*) to {} addrspace(10)*)) #320, !dbg !17769
  %box19.i = call noalias nonnull dereferenceable(8) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Pointer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875856083216 to {}*) to {} addrspace(10)*)) #322, !dbg !17769
  %39 = bitcast {} addrspace(10)* %box19.i to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !17769
  %40 = extractvalue [1 x {} addrspace(10)*] %38, 0, !dbg !17769
  %41 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %39, i64 0, i64 0, !dbg !17769
  store {} addrspace(10)* %40, {} addrspace(10)* addrspace(10)* %41, align 8, !dbg !17769, !tbaa !1024, !alias.scope !320, !noalias !17774
  %42 = addrspacecast {} addrspace(10)* %box19.i to {} addrspace(12)*, !dbg !17769
  call void @ijl_throw({} addrspace(12)* %42) #323, !dbg !17769, !noalias !17760
  unreachable, !dbg !17769

julia__gpu_call_1_36685_inner.exit:               ; preds = %L9.i
  %.fca.3.2.0.extract = extractvalue { {} addrspace(10)*, [1 x i8], [1 x i64] } %6, 2, 0, !enzyme_inactive !0
  %.fca.3.1.0.extract = extractvalue { {} addrspace(10)*, [1 x i8], [1 x i64] } %6, 1, 0, !enzyme_inactive !0
  %.fca.3.0.extract = extractvalue { {} addrspace(10)*, [1 x i8], [1 x i64] } %6, 0
  %.fca.1.1.0.extract = extractvalue [2 x [1 x i64]] %4, 1, 0, !enzyme_inactive !0
  %.fca.1.0.0.extract = extractvalue [2 x [1 x i64]] %4, 0, 0, !enzyme_inactive !0
  %43 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nofree nonnull align 8 dereferenceable(40) %3, {} addrspace(10)* %.fca.3.0.extract) #320, !dbg !17777
  call fastcc void @julia_unsafe_convert_38074({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(40) %3) #320, !dbg !17781, !noalias !17760
  %44 = icmp ne {} addrspace(10)* %.fca.3.0.extract, null
  call void @llvm.assume(i1 noundef %44) #320
  call fastcc void @julia_unsafe_convert_37835({} addrspace(10)* nocapture noundef nonnull readonly align 8 dereferenceable(32) %.fca.3.0.extract) #320, !dbg !17788, !noalias !17760
  %45 = addrspacecast { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15 to { { i64, {} addrspace(10)* }, { i64, i32 } }*, !dbg !17797
  %46 = addrspacecast [1 x {} addrspace(10)*] addrspace(10)* %23 to [1 x {} addrspace(10)*]*, !dbg !17797
  call void @julia_cufunction_36693({ { i64, {} addrspace(10)* }, { i64, i32 } }* noalias nocapture nofree noundef nonnull writeonly sret({ { i64, {} addrspace(10)* }, { i64, i32 } }) align 8 dereferenceable(32) %45, [1 x {} addrspace(10)*]* noalias nocapture nofree noundef nonnull writeonly align 8 dereferenceable(8) "enzymejl_returnRoots" %46) #320, !dbg !17797, !noalias !17760
  %box.i = call noalias nonnull dereferenceable(16) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129876069019600 to {}*) to {} addrspace(10)*)) #322, !dbg !17798
  %47 = bitcast {} addrspace(10)* %box.i to i8 addrspace(10)*, !dbg !17798
  %newstruct.i.sroa.0.0..sroa_cast = bitcast {} addrspace(10)* %box.i to i64 addrspace(10)*, !dbg !17798
  store i64 %1, i64 addrspace(10)* %newstruct.i.sroa.0.0..sroa_cast, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %newstruct.i.sroa.4.0..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %47, i64 8, !dbg !17798
  %newstruct.i.sroa.4.0..sroa_cast = bitcast i8 addrspace(10)* %newstruct.i.sroa.4.0..sroa_idx to i64 addrspace(10)*, !dbg !17798
  store i64 %2, i64 addrspace(10)* %newstruct.i.sroa.4.0..sroa_cast, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %box10.i = call noalias nonnull dereferenceable(32) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129868876166928 to {}*) to {} addrspace(10)*)) #322, !dbg !17798
  %48 = bitcast {} addrspace(10)* %box10.i to i8 addrspace(10)*, !dbg !17798
  call void @llvm.memcpy.p10i8.p10i8.i64(i8 addrspace(10)* nocapture nofree noundef nonnull align 8 dereferenceable(32) %48, i8 addrspace(10)* noundef nonnull align 8 dereferenceable(32) %24, i64 noundef 32, i1 noundef false) #320, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %box12.i = call noalias nonnull dereferenceable(16) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875904414848 to {}*) to {} addrspace(10)*)) #322, !dbg !17798
  %49 = bitcast {} addrspace(10)* %box12.i to i8 addrspace(10)*, !dbg !17798
  %args.i.sroa.7.8..sroa_cast = bitcast {} addrspace(10)* %box12.i to i64 addrspace(10)*, !dbg !17798
  store i64 %.fca.1.0.0.extract, i64 addrspace(10)* %args.i.sroa.7.8..sroa_cast, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %args.i.sroa.11.8..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %49, i64 8, !dbg !17798
  %args.i.sroa.11.8..sroa_cast = bitcast i8 addrspace(10)* %args.i.sroa.11.8..sroa_idx to i64 addrspace(10)*, !dbg !17798
  store i64 %.fca.1.1.0.extract, i64 addrspace(10)* %args.i.sroa.11.8..sroa_cast, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %50 = call noalias nonnull "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Integer}" {} addrspace(10)* @ijl_box_int64(i64 signext %5) #324, !dbg !17798, !noalias !17760
  %box15.i = call noalias nonnull dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,-1]:Pointer, [-1,0,0,0,0,0]:Integer, [-1,0,0,0,0,1]:Integer, [-1,0,0,0,0,2]:Integer, [-1,0,0,0,0,3]:Integer, [-1,0,0,0,0,8]:Pointer, [-1,0,0,0,0,16]:Integer, [-1,0,0,0,0,17]:Integer, [-1,0,0,0,0,18]:Integer, [-1,0,0,0,0,19]:Integer, [-1,0,0,0,0,20]:Integer, [-1,0,0,0,0,21]:Integer, [-1,0,0,0,0,22]:Integer, [-1,0,0,0,0,23]:Integer, [-1,0,0,0,0,32]:Integer, [-1,0,0,0,0,33]:Integer, [-1,0,0,0,0,34]:Integer, [-1,0,0,0,0,35]:Integer, [-1,0,0,0,0,36]:Integer, [-1,0,0,0,0,37]:Integer, [-1,0,0,0,0,38]:Integer, [-1,0,0,0,0,39]:Integer, [-1,0,0,0,0,40]:Integer, [-1,0,0,0,0,48]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,16,-1]:Integer, [-1,0,0,8]:Integer, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,8]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 24, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129876066796880 to {}*) to {} addrspace(10)*)) #322, !dbg !17798
  %51 = bitcast {} addrspace(10)* %box15.i to i8 addrspace(10)*, !dbg !17798
  %args.i.sroa.18.32..sroa_cast = bitcast {} addrspace(10)* %box15.i to {} addrspace(10)* addrspace(10)*, !dbg !17798
  store {} addrspace(10)* %.fca.3.0.extract, {} addrspace(10)* addrspace(10)* %args.i.sroa.18.32..sroa_cast, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %args.i.sroa.23.32..sroa_raw_idx = getelementptr inbounds i8, i8 addrspace(10)* %51, i64 8, !dbg !17798
  store i8 %.fca.3.1.0.extract, i8 addrspace(10)* %args.i.sroa.23.32..sroa_raw_idx, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %args.i.sroa.26.32..sroa_raw_idx = getelementptr inbounds i8, i8 addrspace(10)* %51, i64 9, !dbg !17798
  call void @llvm.memcpy.p10i8.p0i8.i64(i8 addrspace(10)* nocapture nofree noundef align 1 dereferenceable(7) %args.i.sroa.26.32..sroa_raw_idx, i8* noundef nonnull align 1 dereferenceable(7) %args.i.sroa.26.0.sroa_idx36, i64 noundef 7, i1 noundef false) #320, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %args.i.sroa.2623.32..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %51, i64 16, !dbg !17798
  %args.i.sroa.2623.32..sroa_cast = bitcast i8 addrspace(10)* %args.i.sroa.2623.32..sroa_idx to i64 addrspace(10)*, !dbg !17798
  store i64 %.fca.3.2.0.extract, i64 addrspace(10)* %args.i.sroa.2623.32..sroa_cast, align 8, !dbg !17798, !tbaa !788, !alias.scope !1306, !noalias !17799
  %52 = call nonnull "enzyme_type"="{[-1]:Pointer}" {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129868875496304 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875916852480 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %box.i, {} addrspace(10)* nofree nonnull %box10.i, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129868927664704 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %3, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129871010901672 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129875856197024 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %box12.i, {} addrspace(10)* nonnull %50, {} addrspace(10)* nofree nonnull %box15.i) #321, !dbg !17798
  call void @llvm.julia.gc_preserve_end(token %43) #320, !dbg !17800, !noalias !17760
  %innersret.sroa.0.0..sroa_idx = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 0, i32 0, !dbg !17801
  %innersret.sroa.0.0.copyload = load i64, i64 addrspace(10)* %innersret.sroa.0.0..sroa_idx, align 8, !dbg !17801, !noalias !17802
  %innersret.sroa.2.0..sroa_idx3 = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 0, i32 1, !dbg !17801
  %innersret.sroa.2.0.copyload = load {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %innersret.sroa.2.0..sroa_idx3, align 8, !dbg !17801, !noalias !17802
  %innersret.sroa.3.0..sroa_idx4 = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 1, i32 0, !dbg !17801
  %innersret.sroa.3.0.copyload = load i64, i64 addrspace(10)* %innersret.sroa.3.0..sroa_idx4, align 8, !dbg !17801, !noalias !17802
  %innersret.sroa.4.0..sroa_idx5 = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 1, i32 1, !dbg !17801
  %innersret.sroa.4.0.copyload = load i32, i32 addrspace(10)* %innersret.sroa.4.0..sroa_idx5, align 8, !dbg !17801, !noalias !17802
  %args.i.sroa.26.0.sroa_idx3 = getelementptr inbounds [7 x i8], [7 x i8]* %args.i.sroa.26, i64 0, i64 0, !dbg !17801
  call void @llvm.lifetime.end.p0i8(i64 7, i8* %args.i.sroa.26.0.sroa_idx3), !dbg !17801
  %53 = addrspacecast i8 addrspace(10)* %24 to i8*, !dbg !17801
  call void @llvm.lifetime.end.p0i8(i64 noundef 32, i8* noundef nonnull %53) #320, !dbg !17801
  %54 = addrspacecast i8 addrspace(10)* %26 to i8*, !dbg !17801
  call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %54) #320, !dbg !17801
  %.fca.0.0.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } poison, i64 %innersret.sroa.0.0.copyload, 0, 0, !dbg !17803
  %.fca.0.1.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.0.0.insert, {} addrspace(10)* %innersret.sroa.2.0.copyload, 0, 1, !dbg !17803
  %.fca.1.0.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.0.1.insert, i64 %innersret.sroa.3.0.copyload, 1, 0, !dbg !17803
  %.fca.1.1.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.1.0.insert, i32 %innersret.sroa.4.0.copyload, 1, 1, !dbg !17803
  ret { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.1.1.insert, !dbg !17803
}

 Type analysis state: 
<analysis>
  %32 = icmp slt i64 %2, 1, !dbg !317: {[-1]:Integer}, intvals: {}
  %innersret.sroa.4.0..sroa_idx5 = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 1, i32 1, !dbg !375: {[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer}, intvals: {}
  %box10.i = call noalias nonnull dereferenceable(32) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129868876166928 to {}*) to {} addrspace(10)*)) #322, !dbg !370: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, 
[-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %43 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nofree nonnull align 8 dereferenceable(40) %3, {} addrspace(10)* %.fca.3.0.extract) #320, !dbg !335: {}, intvals: {}
  %.fca.1.0.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.0.1.insert, i64 %innersret.sroa.3.0.copyload, 1, 0, !dbg !377: {[0]:Pointer, [8]:Pointer, [8,0]:Pointer, [8,8]:Pointer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [16]:Pointer, [16,0]:Integer, [16,1]:Integer, [16,2]:Integer, [16,3]:Integer, [16,4]:Integer, [16,5]:Integer, [16,6]:Integer, [16,7]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Pointer, [16,32,-1]:Integer, [16,40]:Pointer, [16,40,-1]:Integer, [24]:Anything, [25]:Anything, [26]:Anything, [27]:Anything, [28]:Anything, [29]:Anything, [30]:Anything, [31]:Anything}, intvals: {}
  %args.i.sroa.23.32..sroa_raw_idx = getelementptr inbounds i8, i8 addrspace(10)* %51, i64 8, !dbg !370: {[-1]:Pointer, [-1,0]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer}, intvals: {}
  %args.i.sroa.2623.32..sroa_cast = bitcast i8 addrspace(10)* %args.i.sroa.2623.32..sroa_idx to i64 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer}, intvals: {}
  %innersret.sroa.3.0.copyload = load i64, i64 addrspace(10)* %innersret.sroa.3.0..sroa_idx4, align 8, !dbg !375, !noalias !376: {[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Pointer, [-1,32,-1]:Integer, [-1,40]:Pointer, [-1,40,-1]:Integer}, intvals: {}
{}* inttoptr (i64 129868876166928 to {}*): {[-1]:Anything}, intvals: {}
  %.fca.3.0.extract = extractvalue { {} addrspace(10)*, [1 x i8], [1 x i64] } %6, 0: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Pointer, [-1,0,0,0,0]:Integer, [-1,0,0,0,1]:Integer, [-1,0,0,0,2]:Integer, [-1,0,0,0,3]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,16]:Integer, [-1,0,0,0,17]:Integer, [-1,0,0,0,18]:Integer, [-1,0,0,0,19]:Integer, [-1,0,0,0,20]:Integer, [-1,0,0,0,21]:Integer, [-1,0,0,0,22]:Integer, [-1,0,0,0,23]:Integer, [-1,0,0,0,32]:Integer, [-1,0,0,0,33]:Integer, [-1,0,0,0,34]:Integer, [-1,0,0,0,35]:Integer, [-1,0,0,0,36]:Integer, [-1,0,0,0,37]:Integer, [-1,0,0,0,38]:Integer, [-1,0,0,0,39]:Integer, [-1,0,0,0,40]:Integer, [-1,0,0,0,48]:Pointer, [-1,0,0,0,48,0]:Pointer, [-1,0,0,0,48,8]:Integer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,16,-1]:Integer, [-1,0,8]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer}, intvals: {}
  %.fca.1.1.0.extract = extractvalue [2 x [1 x i64]] %4, 1, 0, !enzyme_inactive !0: {[-1]:Integer}, intvals: {}
  %newstruct.i.sroa.4.0..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %47, i64 8, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %48 = bitcast {} addrspace(10)* %box10.i to i8 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %newstruct.i.sroa.0.0..sroa_cast = bitcast {} addrspace(10)* %box.i to i64 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
i64 129871010901672: {[-1]:Anything}, intvals: {129871010901672,}
  %42 = addrspacecast {} addrspace(10)* %box19.i to {} addrspace(12)*, !dbg !316: {}, intvals: {}
  %current_task2.i27 = getelementptr inbounds {}**, {}*** %28, i64 -14: {[-1]:Pointer}, intvals: {}
  %args.i.sroa.7.8..sroa_cast = bitcast {} addrspace(10)* %box12.i to i64 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
i32 0: {[-1]:Anything}, intvals: {0,}
  %current_task2.i = bitcast {}*** %current_task2.i27 to {}**: {[-1]:Pointer}, intvals: {}
{}* inttoptr (i64 129875856083216 to {}*): {[-1]:Anything}, intvals: {}
  %33 = call fastcc [1 x {} addrspace(10)*] @julia_AssertionError_36691({} addrspace(10)* nofree noundef nonnull readonly align 16 addrspacecast ({}* inttoptr (i64 129871334054032 to {}*) to {} addrspace(10)*)) #321, !dbg !320: {[-1]:Pointer}, intvals: {}
  %53 = addrspacecast i8 addrspace(10)* %24 to i8*, !dbg !375: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %45 = addrspacecast { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15 to { { i64, {} addrspace(10)* }, { i64, i32 } }*, !dbg !369: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %22 = call noalias nonnull dereferenceable(8) dereferenceable_or_null(8) {} addrspace(10)* @julia.gc_alloc_obj({}** %18, i64 8, {} addrspace(10)* @ejl_enz_any_array_1), !enzyme_fromstack !295: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
  %args.i.sroa.26.0.sroa_idx3 = getelementptr inbounds [7 x i8], [7 x i8]* %args.i.sroa.26, i64 0, i64 0, !dbg !375: {[-1]:Pointer}, intvals: {}
{}* inttoptr (i64 129871334054032 to {}*): {[-1]:Anything}, intvals: {}
i64 1: {[-1]:Integer}, intvals: {1,}
{}* inttoptr (i64 129871334054064 to {}*): {[-1]:Anything}, intvals: {}
  %box17.i = call noalias nonnull dereferenceable(8) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Pointer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875856083216 to {}*) to {} addrspace(10)*)) #322, !dbg !320: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
{} addrspace(10)* null: {[-1]:Pointer, [-1,-1]:Anything}, intvals: {0,}
  call void @zeroType({} addrspace(10)* %14, i8 0, i64 32), !enzyme_zerostack !0: {}, intvals: {}
  %36 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %34, i64 0, i64 0, !dbg !320: {}, intvals: {}
  %39 = bitcast {} addrspace(10)* %box19.i to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !316: {}, intvals: {}
  %40 = extractvalue [1 x {} addrspace(10)*] %38, 0, !dbg !316: {}, intvals: {}
  %34 = bitcast {} addrspace(10)* %box17.i to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !320: {}, intvals: {}
  %35 = extractvalue [1 x {} addrspace(10)*] %33, 0, !dbg !320: {}, intvals: {}
  %44 = icmp ne {} addrspace(10)* %.fca.3.0.extract, null: {[-1]:Integer}, intvals: {}
  %.fca.1.1.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.1.0.insert, i32 %innersret.sroa.4.0.copyload, 1, 1, !dbg !377: {[0]:Pointer, [8]:Pointer, [8,0]:Pointer, [8,8]:Pointer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [16]:Pointer, [16,0]:Integer, [16,1]:Integer, [16,2]:Integer, [16,3]:Integer, [16,4]:Integer, [16,5]:Integer, [16,6]:Integer, [16,7]:Integer, [16,8]:Integer, [16,9]:Integer, [16,10]:Integer, [16,11]:Integer, [16,12]:Integer, [16,13]:Integer, [16,14]:Integer, [16,15]:Integer, [16,16]:Integer, [16,17]:Integer, [16,18]:Integer, [16,19]:Integer, [16,20]:Integer, [16,21]:Integer, [16,22]:Integer, [16,23]:Integer, [16,24]:Integer, [16,25]:Integer, [16,26]:Integer, [16,27]:Integer, [16,28]:Integer, [16,29]:Integer, [16,30]:Integer, [16,31]:Integer, [16,32]:Pointer, [16,32,-1]:Integer, [16,40]:Pointer, [16,40,-1]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer}, intvals: {}
  %8 = call {}*** @julia.get_pgcstack(): {[-1]:Pointer}, intvals: {}
  %37 = addrspacecast {} addrspace(10)* %box17.i to {} addrspace(12)*, !dbg !320: {}, intvals: {}
  call void @zeroType.135({} addrspace(10)* %22, i8 0, i64 8), !enzyme_zerostack !0: {}, intvals: {}
  %54 = addrspacecast i8 addrspace(10)* %26 to i8*, !dbg !375: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
  %46 = addrspacecast [1 x {} addrspace(10)*] addrspace(10)* %23 to [1 x {} addrspace(10)*]*, !dbg !369: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
  %30 = getelementptr inbounds i64*, i64** %ptls_load.i2930, i64 2: {[-1]:Pointer}, intvals: {}
  %41 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %39, i64 0, i64 0, !dbg !316: {}, intvals: {}
  %29 = bitcast {}*** %ptls_field.i28 to i64***: {[-1]:Pointer, [-1,0]:Pointer}, intvals: {}
  %innersret.sroa.4.0.copyload = load i32, i32 addrspace(10)* %innersret.sroa.4.0..sroa_idx5, align 8, !dbg !375, !noalias !376: {[-1]:Integer}, intvals: {}
  %15 = bitcast {} addrspace(10)* %14 to { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)*, !enzyme_caststack !0: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %box15.i = call noalias nonnull dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,-1]:Pointer, [-1,0,0,0,0,0]:Integer, [-1,0,0,0,0,1]:Integer, [-1,0,0,0,0,2]:Integer, [-1,0,0,0,0,3]:Integer, [-1,0,0,0,0,8]:Pointer, [-1,0,0,0,0,16]:Integer, [-1,0,0,0,0,17]:Integer, [-1,0,0,0,0,18]:Integer, [-1,0,0,0,0,19]:Integer, [-1,0,0,0,0,20]:Integer, [-1,0,0,0,0,21]:Integer, [-1,0,0,0,0,22]:Integer, [-1,0,0,0,0,23]:Integer, [-1,0,0,0,0,32]:Integer, [-1,0,0,0,0,33]:Integer, [-1,0,0,0,0,34]:Integer, [-1,0,0,0,0,35]:Integer, [-1,0,0,0,0,36]:Integer, [-1,0,0,0,0,37]:Integer, [-1,0,0,0,0,38]:Integer, [-1,0,0,0,0,39]:Integer, [-1,0,0,0,0,40]:Integer, [-1,0,0,0,0,48]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,16,-1]:Integer, [-1,0,0,8]:Integer, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,8]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 24, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129876066796880 to {}*) to {} addrspace(10)*)) #322, !dbg !370: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,-1]:Pointer, [-1,0,0,0,0,0]:Integer, [-1,0,0,0,0,1]:Integer, [-1,0,0,0,0,2]:Integer, [-1,0,0,0,0,3]:Integer, [-1,0,0,0,0,8]:Pointer, [-1,0,0,0,0,16]:Integer, [-1,0,0,0,0,17]:Integer, [-1,0,0,0,0,18]:Integer, [-1,0,0,0,0,19]:Integer, [-1,0,0,0,0,20]:Integer, 
[-1,0,0,0,0,21]:Integer, [-1,0,0,0,0,22]:Integer, [-1,0,0,0,0,23]:Integer, [-1,0,0,0,0,32]:Integer, [-1,0,0,0,0,33]:Integer, [-1,0,0,0,0,34]:Integer, [-1,0,0,0,0,35]:Integer, [-1,0,0,0,0,36]:Integer, [-1,0,0,0,0,37]:Integer, [-1,0,0,0,0,38]:Integer, [-1,0,0,0,0,39]:Integer, [-1,0,0,0,0,40]:Integer, [-1,0,0,0,0,48]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,16,-1]:Integer, [-1,0,0,8]:Integer, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,8]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}, intvals: {}
  %52 = call nonnull "enzyme_type"="{[-1]:Pointer}" {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129868875496304 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875916852480 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %box.i, {} addrspace(10)* nofree nonnull %box10.i, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129868927664704 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %3, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129871010901672 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129875856197024 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %box12.i, {} addrspace(10)* nonnull %50, {} addrspace(10)* nofree nonnull %box15.i) #321, !dbg !370: {[-1]:Pointer}, intvals: {}
  %.fca.3.2.0.extract = extractvalue { {} addrspace(10)*, [1 x i8], [1 x i64] } %6, 2, 0, !enzyme_inactive !0: {[-1]:Integer}, intvals: {}
  %.fca.3.1.0.extract = extractvalue { {} addrspace(10)*, [1 x i8], [1 x i64] } %6, 1, 0, !enzyme_inactive !0: {[-1]:Integer}, intvals: {}
  %box19.i = call noalias nonnull dereferenceable(8) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Pointer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875856083216 to {}*) to {} addrspace(10)*)) #322, !dbg !316: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
  %23 = bitcast {} addrspace(10)* %22 to [1 x {} addrspace(10)*] addrspace(10)*, !enzyme_caststack !0: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
  %11 = getelementptr inbounds {}*, {}** %10, i64 16: {[-1]:Pointer}, intvals: {}
  %13 = load i8*, i8** %12, align 8: {}, intvals: {}
  %newstruct.i.sroa.4.0..sroa_cast = bitcast i8 addrspace(10)* %newstruct.i.sroa.4.0..sroa_idx to i64 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %innersret.sroa.2.0..sroa_idx3 = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 0, i32 1, !dbg !375: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,8]:Pointer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer}, intvals: {}
  %24 = bitcast { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15 to i8 addrspace(10)*: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %args.i.sroa.26 = bitcast i8* %16 to [7 x i8]*, !enzyme_caststack !0: {[-1]:Pointer}, intvals: {}
i64 32: {[-1]:Integer}, intvals: {32,}
  %38 = call fastcc [1 x {} addrspace(10)*] @julia_AssertionError_36691({} addrspace(10)* nofree noundef nonnull readonly align 16 addrspacecast ({}* inttoptr (i64 129871334054064 to {}*) to {} addrspace(10)*)) #320, !dbg !316: {[-1]:Pointer}, intvals: {}
  %args.i.sroa.26.32..sroa_raw_idx = getelementptr inbounds i8, i8 addrspace(10)* %51, i64 9, !dbg !370: {[-1]:Pointer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer}, intvals: {}
  %innersret.sroa.0.0.copyload = load i64, i64 addrspace(10)* %innersret.sroa.0.0..sroa_idx, align 8, !dbg !375, !noalias !376: {[-1]:Pointer}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129868875496304 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
  %ptls_field.i28 = getelementptr inbounds {}**, {}*** %28, i64 2: {[-1]:Pointer, [-1,0]:Pointer}, intvals: {}
{}* inttoptr (i64 129867219910160 to {}*): {[-1]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129867219910160 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
  %.fca.1.0.0.extract = extractvalue [2 x [1 x i64]] %4, 0, 0, !enzyme_inactive !0: {[-1]:Integer}, intvals: {}
  %.fca.0.0.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } poison, i64 %innersret.sroa.0.0.copyload, 0, 0, !dbg !377: {[0]:Pointer, [8]:Anything, [9]:Anything, [10]:Anything, [11]:Anything, [12]:Anything, [13]:Anything, [14]:Anything, [15]:Anything, [16]:Anything, [17]:Anything, [18]:Anything, [19]:Anything, [20]:Anything, [21]:Anything, [22]:Anything, [23]:Anything, [24]:Anything, [25]:Anything, [26]:Anything, [27]:Anything, [28]:Anything, [29]:Anything, [30]:Anything, [31]:Anything}, intvals: {}
{}* inttoptr (i64 129875904414848 to {}*): {[-1]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129876066796880 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
{}* inttoptr (i64 129876066796880 to {}*): {[-1]:Anything}, intvals: {}
{}* inttoptr (i64 129868927664704 to {}*): {[-1]:Anything}, intvals: {}
  %10 = getelementptr inbounds {}*, {}** %9, i64 -14: {[-1]:Pointer}, intvals: {}
  %12 = bitcast {}** %11 to i8**: {[-1]:Pointer}, intvals: {}
  %box12.i = call noalias nonnull dereferenceable(16) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129875904414848 to {}*) to {} addrspace(10)*)) #322, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
{}* inttoptr (i64 129875916852480 to {}*): {[-1]:Anything}, intvals: {}
  %16 = call noalias nonnull dereferenceable(7) dereferenceable_or_null(7) i8* @malloc(i64 7), !enzyme_fromstack !296: {[-1]:Pointer}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129871334054064 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129875904414848 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
  %14 = call noalias nonnull dereferenceable(32) dereferenceable_or_null(32) {} addrspace(10)* @julia.gc_alloc_obj({}** %10, i64 32, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 129867219910160 to {}*) to {} addrspace(10)*)), !enzyme_fromstack !295: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
  %21 = load i8*, i8** %20, align 8: {}, intvals: {}
i64 129875856197024: {[-1]:Anything}, intvals: {129875856197024,}
{} addrspace(10)* %0: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Pointer, [-1,0,0,0,0]:Integer, [-1,0,0,0,1]:Integer, [-1,0,0,0,2]:Integer, [-1,0,0,0,3]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,16]:Integer, [-1,0,0,0,17]:Integer, [-1,0,0,0,18]:Integer, [-1,0,0,0,19]:Integer, [-1,0,0,0,20]:Integer, [-1,0,0,0,21]:Integer, [-1,0,0,0,22]:Integer, [-1,0,0,0,23]:Integer, [-1,0,0,0,32]:Integer, [-1,0,0,0,33]:Integer, [-1,0,0,0,34]:Integer, [-1,0,0,0,35]:Integer, [-1,0,0,0,36]:Integer, [-1,0,0,0,37]:Integer, [-1,0,0,0,38]:Integer, [-1,0,0,0,39]:Integer, [-1,0,0,0,40]:Integer, [-1,0,0,0,48]:Pointer, [-1,0,0,0,48,0]:Pointer, [-1,0,0,0,48,8]:Integer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,16,-1]:Integer, [-1,0,8]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}, intvals: {}
i64 %1: {[-1]:Integer}, intvals: {}
i64 %2: {[-1]:Integer}, intvals: {}
{} addrspace(10)* %3: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Pointer, [-1,0,0,0,0]:Integer, [-1,0,0,0,1]:Integer, [-1,0,0,0,2]:Integer, [-1,0,0,0,3]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,16]:Integer, [-1,0,0,0,17]:Integer, [-1,0,0,0,18]:Integer, [-1,0,0,0,19]:Integer, [-1,0,0,0,20]:Integer, [-1,0,0,0,21]:Integer, [-1,0,0,0,22]:Integer, [-1,0,0,0,23]:Integer, [-1,0,0,0,32]:Integer, [-1,0,0,0,33]:Integer, [-1,0,0,0,34]:Integer, [-1,0,0,0,35]:Integer, [-1,0,0,0,36]:Integer, [-1,0,0,0,37]:Integer, [-1,0,0,0,38]:Integer, [-1,0,0,0,39]:Integer, [-1,0,0,0,40]:Integer, [-1,0,0,0,48]:Pointer, [-1,0,0,0,48,0]:Pointer, [-1,0,0,0,48,8]:Integer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,16,-1]:Integer, [-1,0,8]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}, intvals: {}
[2 x [1 x i64]] %4: {[-1]:Integer}, intvals: {}
i64 %5: {[-1]:Integer}, intvals: {}
{ {} addrspace(10)*, [1 x i8], [1 x i64] } %6: {[0]:Pointer, [0,0]:Pointer, [0,0,0]:Pointer, [0,0,0,-1]:Pointer, [0,0,0,0,0]:Integer, [0,0,0,0,1]:Integer, [0,0,0,0,2]:Integer, [0,0,0,0,3]:Integer, [0,0,0,0,8]:Pointer, [0,0,0,0,16]:Integer, [0,0,0,0,17]:Integer, [0,0,0,0,18]:Integer, [0,0,0,0,19]:Integer, [0,0,0,0,20]:Integer, [0,0,0,0,21]:Integer, [0,0,0,0,22]:Integer, [0,0,0,0,23]:Integer, [0,0,0,0,32]:Integer, [0,0,0,0,33]:Integer, [0,0,0,0,34]:Integer, [0,0,0,0,35]:Integer, [0,0,0,0,36]:Integer, [0,0,0,0,37]:Integer, [0,0,0,0,38]:Integer, [0,0,0,0,39]:Integer, [0,0,0,0,40]:Integer, [0,0,0,0,48]:Pointer, [0,0,0,0,48,0]:Pointer, [0,0,0,0,48,8]:Integer, [0,0,0,0,56]:Integer, [0,0,0,0,57]:Integer, [0,0,0,16,-1]:Integer, [0,0,8]:Integer, [0,8]:Integer, [0,9]:Integer, [0,10]:Integer, [0,11]:Integer, [0,12]:Integer, [0,13]:Integer, [0,14]:Integer, [0,15]:Integer, [0,16]:Integer, [0,17]:Integer, [0,18]:Integer, [0,19]:Integer, [0,20]:Integer, [0,21]:Integer, [0,22]:Integer, [0,23]:Integer, [0,24]:Integer, [0,25]:Integer, [0,26]:Integer, [0,27]:Integer, [0,28]:Integer, [0,29]:Integer, [0,30]:Integer, [0,31]:Integer, [8]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer}, intvals: {}
  %28 = call {}*** @julia.get_pgcstack() #320, !noalias !297: {[-1]:Pointer, [-1,16]:Pointer}, intvals: {}
  %20 = bitcast {}** %19 to i8**: {[-1]:Pointer}, intvals: {}
  %ptls_load.i2930 = load i64**, i64*** %29, align 8, !tbaa !301, !noalias !297: {[-1]:Pointer}, intvals: {}
  %safepoint.i = load i64*, i64** %30, align 8, !tbaa !305, !noalias !297: {}, intvals: {}
  %args.i.sroa.11.8..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %49, i64 8, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %args.i.sroa.11.8..sroa_cast = bitcast i8 addrspace(10)* %args.i.sroa.11.8..sroa_idx to i64 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %25 = addrspacecast i8 addrspace(10)* %24 to i8*: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Pointer, [-1,8,16]:Integer, [-1,8,17]:Integer, [-1,8,18]:Integer, [-1,8,19]:Integer, [-1,8,20]:Integer, [-1,8,21]:Integer, [-1,8,22]:Integer, [-1,8,23]:Integer, [-1,16]:Pointer, [-1,16,0]:Integer, [-1,16,1]:Integer, [-1,16,2]:Integer, [-1,16,3]:Integer, [-1,16,4]:Integer, [-1,16,5]:Integer, [-1,16,6]:Integer, [-1,16,7]:Integer, [-1,16,8]:Integer, [-1,16,9]:Integer, [-1,16,10]:Integer, [-1,16,11]:Integer, [-1,16,12]:Integer, [-1,16,13]:Integer, [-1,16,14]:Integer, [-1,16,15]:Integer, [-1,16,16]:Integer, [-1,16,17]:Integer, [-1,16,18]:Integer, [-1,16,19]:Integer, [-1,16,20]:Integer, [-1,16,21]:Integer, [-1,16,22]:Integer, [-1,16,23]:Integer, [-1,16,24]:Integer, [-1,16,25]:Integer, [-1,16,26]:Integer, [-1,16,27]:Integer, [-1,16,28]:Integer, [-1,16,29]:Integer, [-1,16,30]:Integer, [-1,16,31]:Integer, [-1,16,32]:Pointer, [-1,16,32,-1]:Integer, [-1,16,40]:Pointer, [-1,16,40,-1]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer}, intvals: {}
{}* inttoptr (i64 129871010901672 to {}*): {[-1]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129875856197024 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129871010901672 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
{ { i64, {} addrspace(10)* }, { i64, i32 } } poison: {[-1]:Anything}, intvals: {}
i64 129875916852480: {[-1]:Anything}, intvals: {129875916852480,}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129875916852480 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129876069019600 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
  %18 = getelementptr inbounds {}*, {}** %17, i64 -14: {[-1]:Pointer}, intvals: {}
  %17 = bitcast {}*** %7 to {}**: {[-1]:Pointer}, intvals: {}
  %args.i.sroa.2623.32..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %51, i64 16, !dbg !370: {[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer}, intvals: {}
  %innersret.sroa.2.0.copyload = load {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %innersret.sroa.2.0..sroa_idx3, align 8, !dbg !375, !noalias !376: {[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}, intvals: {}
  %50 = call noalias nonnull "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Integer}" {} addrspace(10)* @ijl_box_int64(i64 signext %5) #324, !dbg !370, !noalias !297: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %51 = bitcast {} addrspace(10)* %box15.i to i8 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,-1]:Pointer, [-1,0,0,0,0,0]:Integer, [-1,0,0,0,0,1]:Integer, [-1,0,0,0,0,2]:Integer, [-1,0,0,0,0,3]:Integer, [-1,0,0,0,0,8]:Pointer, [-1,0,0,0,0,16]:Integer, [-1,0,0,0,0,17]:Integer, [-1,0,0,0,0,18]:Integer, [-1,0,0,0,0,19]:Integer, [-1,0,0,0,0,20]:Integer, [-1,0,0,0,0,21]:Integer, [-1,0,0,0,0,22]:Integer, [-1,0,0,0,0,23]:Integer, [-1,0,0,0,0,32]:Integer, [-1,0,0,0,0,33]:Integer, [-1,0,0,0,0,34]:Integer, [-1,0,0,0,0,35]:Integer, [-1,0,0,0,0,36]:Integer, [-1,0,0,0,0,37]:Integer, [-1,0,0,0,0,38]:Integer, [-1,0,0,0,0,39]:Integer, [-1,0,0,0,0,40]:Integer, [-1,0,0,0,0,48]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,16,-1]:Integer, [-1,0,0,8]:Integer, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,8]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129868927664704 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
{}* inttoptr (i64 129876069019600 to {}*): {[-1]:Anything}, intvals: {}
  %innersret.sroa.3.0..sroa_idx4 = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 1, i32 0, !dbg !375: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Integer, [-1,0,1]:Integer, [-1,0,2]:Integer, [-1,0,3]:Integer, [-1,0,4]:Integer, [-1,0,5]:Integer, [-1,0,6]:Integer, [-1,0,7]:Integer, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,0,32]:Pointer, [-1,0,32,-1]:Integer, [-1,0,40]:Pointer, [-1,0,40,-1]:Integer}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129868876166928 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
  %innersret.sroa.0.0..sroa_idx = getelementptr { { i64, {} addrspace(10)* }, { i64, i32 } }, { { i64, {} addrspace(10)* }, { i64, i32 } } addrspace(10)* %15, i64 0, i32 0, i32 0, !dbg !375: {[-1]:Pointer, [-1,0]:Pointer}, intvals: {}
  %47 = bitcast {} addrspace(10)* %box.i to i8 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %.fca.0.1.insert = insertvalue { { i64, {} addrspace(10)* }, { i64, i32 } } %.fca.0.0.insert, {} addrspace(10)* %innersret.sroa.2.0.copyload, 0, 1, !dbg !377: {[0]:Pointer, [8]:Pointer, [8,0]:Pointer, [8,8]:Pointer, [8,16]:Integer, [8,17]:Integer, [8,18]:Integer, [8,19]:Integer, [8,20]:Integer, [8,21]:Integer, [8,22]:Integer, [8,23]:Integer, [16]:Anything, [17]:Anything, [18]:Anything, [19]:Anything, [20]:Anything, [21]:Anything, [22]:Anything, [23]:Anything, [24]:Anything, [25]:Anything, [26]:Anything, [27]:Anything, [28]:Anything, [29]:Anything, [30]:Anything, [31]:Anything}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129871334054032 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
  %box.i = call noalias nonnull dereferenceable(16) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,-1]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task2.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 129876069019600 to {}*) to {} addrspace(10)*)) #322, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %args.i.sroa.18.32..sroa_cast = bitcast {} addrspace(10)* %box15.i to {} addrspace(10)* addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,-1]:Pointer, [-1,0,0,0,0,0]:Integer, [-1,0,0,0,0,1]:Integer, [-1,0,0,0,0,2]:Integer, [-1,0,0,0,0,3]:Integer, [-1,0,0,0,0,8]:Pointer, [-1,0,0,0,0,16]:Integer, [-1,0,0,0,0,17]:Integer, [-1,0,0,0,0,18]:Integer, [-1,0,0,0,0,19]:Integer, [-1,0,0,0,0,20]:Integer, [-1,0,0,0,0,21]:Integer, [-1,0,0,0,0,22]:Integer, [-1,0,0,0,0,23]:Integer, [-1,0,0,0,0,32]:Integer, [-1,0,0,0,0,33]:Integer, [-1,0,0,0,0,34]:Integer, [-1,0,0,0,0,35]:Integer, [-1,0,0,0,0,36]:Integer, [-1,0,0,0,0,37]:Integer, [-1,0,0,0,0,38]:Integer, [-1,0,0,0,0,39]:Integer, [-1,0,0,0,0,40]:Integer, [-1,0,0,0,0,48]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,16,-1]:Integer, [-1,0,0,8]:Integer, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,8]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}, intvals: {}
  %49 = bitcast {} addrspace(10)* %box12.i to i8 addrspace(10)*, !dbg !370: {[-1]:Pointer, [-1,-1]:Integer}, intvals: {}
  %26 = bitcast [1 x {} addrspace(10)*] addrspace(10)* %23 to i8 addrspace(10)*: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
  %27 = addrspacecast i8 addrspace(10)* %26 to i8*: {[-1]:Pointer, [-1,-1]:Pointer}, intvals: {}
i64 129868927664704: {[-1]:Anything}, intvals: {129868927664704,}
  %9 = bitcast {}*** %8 to {}**: {[-1]:Pointer}, intvals: {}
  %7 = call {}*** @julia.get_pgcstack(): {[-1]:Pointer}, intvals: {}
  %19 = getelementptr inbounds {}*, {}** %18, i64 16: {[-1]:Pointer}, intvals: {}
i64 8: {[-1]:Integer}, intvals: {8,}
  %args.i.sroa.26.0.sroa_idx2 = getelementptr inbounds [7 x i8], [7 x i8]* %args.i.sroa.26, i64 0, i64 0: {[-1]:Pointer}, intvals: {}
i64 129868875496304: {[-1]:Anything}, intvals: {129868875496304,}
{}* inttoptr (i64 129868875496304 to {}*): {[-1]:Anything}, intvals: {}
  %args.i.sroa.26.0.sroa_idx36 = getelementptr inbounds [7 x i8], [7 x i8]* %args.i.sroa.26, i64 0, i64 0: {[-1]:Pointer}, intvals: {}
@ijl_invoke: {[-1]:Pointer}, intvals: {}
{}* inttoptr (i64 129875856197024 to {}*): {[-1]:Anything}, intvals: {}
  %31 = icmp slt i64 %1, 1, !dbg !309: {[-1]:Integer}, intvals: {}
{} addrspace(10)* addrspacecast ({}* inttoptr (i64 129875856083216 to {}*) to {} addrspace(10)*): {[-1]:Anything}, intvals: {}
i32 1: {[-1]:Integer}, intvals: {1,}
i64 0: {[-1]:Anything}, intvals: {0,}
</analysis>

Cannot deduce type of copy   call void @llvm.memcpy.p10i8.p0i8.i64(i8 addrspace(10)* nocapture nofree noundef align 1 dereferenceable(7) %args.i.sroa.26.32..sroa_raw_idx, i8* noundef nonnull align 1 dereferenceable(7) %args.i.sroa.26.0.sroa_idx36, i64 noundef 7, i1 noundef false) #320, !dbg !370, !tbaa !371, !alias.scope !372, !noalias !373

Caused by:
Stacktrace:
 [1] macro expansion
   @ ~/git/CUDA.jl/src/compiler/execution.jl:114
 [2] #gpu_call#1205
   @ ~/git/CUDA.jl/src/gpuarrays.jl:30
 [3] gpu_call
   @ ~/git/CUDA.jl/src/gpuarrays.jl:28
 [4] #gpu_call#1
   @ ~/git/GPUArrays.jl/src/device/execution.jl:69
 [5] #gpu_call#1
   @ ~/git/GPUArrays.jl/src/device/execution.jl:0
within MethodInstance for GPUArrays.var"#gpu_call#1"(::CuArray{Float32, 2, CUDA.DeviceMemory}, ::Nothing, ::Int64, ::Int64, ::Nothing, ::typeof(GPUArrays.gpu_call), ::GPUArrays.var"#35#37", ::CuArray{Float32, 2, CUDA.DeviceMemory}, ::CUDA.CuArrayStyle{2, CUDA.DeviceMemory}, ::typeof(identity), ::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ::Int64, ::Base.Broadcast.Extruded{CuArray{Float32, 1, CUDA.DeviceMemory}, Tuple{Bool}, Tuple{Int64}})


Stacktrace:
  [1] throwerr(cstr::Cstring)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:1797
  [2] macro expansion
    @ ~/git/CUDA.jl/src/compiler/execution.jl:114 [inlined]
  [3] #gpu_call#1205
    @ ~/git/CUDA.jl/src/gpuarrays.jl:30 [inlined]
  [4] gpu_call
    @ ~/git/CUDA.jl/src/gpuarrays.jl:28 [inlined]
  [5] #gpu_call#1
    @ ~/git/GPUArrays.jl/src/device/execution.jl:69 [inlined]
  [6] augmented_julia__gpu_call_1_36685_inner_1wrap
    @ ~/git/GPUArrays.jl/src/device/execution.jl:0
  [7] macro expansion
    @ ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:6819 [inlined]
  [8] enzyme_call(::Val{…}, ::Ptr{…}, ::Type{…}, ::Val{…}, ::Val{…}, ::Type{…}, ::Type{…}, ::Const{…}, ::Type{…}, ::Duplicated{…}, ::Const{…}, ::Const{…}, ::Const{…}, ::Const{…}, ::Const{…}, ::Const{…}, ::Duplicated{…}, ::Const{…}, ::Const{…}, ::Const{…}, ::Const{…}, ::Duplicated{…})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:6419
  [9] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{…}, Const{…}, Const{…}, Tuple{…}, 1, true, @NamedTuple{…}})(::Const{GPUArrays.var"##gpu_call#1"}, ::Duplicated{CuArray{…}}, ::Vararg{Any})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:6307
 [10] runtime_generic_augfwd(activity::Type{…}, width::Val{…}, ModifiedBetween::Val{…}, RT::Val{…}, f::GPUArrays.var"##gpu_call#1", df::Nothing, primal_1::CuArray{…}, shadow_1_1::CuArray{…}, primal_2::Nothing, shadow_2_1::Nothing, primal_3::Int64, shadow_3_1::Nothing, primal_4::Int64, shadow_4_1::Nothing, primal_5::Nothing, shadow_5_1::Nothing, primal_6::typeof(GPUArrays.gpu_call), shadow_6_1::Nothing, primal_7::GPUArrays.var"#35#37", shadow_7_1::Nothing, primal_8::CuArray{…}, shadow_8_1::CuArray{…}, primal_9::CUDA.CuArrayStyle{…}, shadow_9_1::Nothing, primal_10::typeof(identity), shadow_10_1::Nothing, primal_11::Tuple{…}, shadow_11_1::Nothing, primal_12::Int64, shadow_12_1::Nothing, primal_13::Base.Broadcast.Extruded{…}, shadow_13_1::Base.Broadcast.Extruded{…})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/r8mFE/src/rules/jitrules.jl:313
 [11] gpu_call
    @ ~/git/GPUArrays.jl/src/device/execution.jl:34 [inlined]
 [12] _copyto!
    @ ~/git/GPUArrays.jl/src/host/broadcast.jl:97 [inlined]
 [13] materialize!
    @ ~/git/GPUArrays.jl/src/host/broadcast.jl:38 [inlined]
 [14] materialize!
    @ ./broadcast.jl:911 [inlined]
 [15] muladd
    @ ~/git/Enzyme.jl/julia-1.10.2/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:210
 [16] loss_function
    @ ./REPL[7]:2 [inlined]
 [17] diffejulia_loss_function_1328wrap
    @ ./REPL[7]:0
 [18] macro expansion
    @ ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:6819 [inlined]
 [19] enzyme_call
    @ ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:6419 [inlined]
 [20] CombinedAdjointThunk
    @ ~/.julia/packages/Enzyme/r8mFE/src/compiler.jl:6296 [inlined]
 [21] autodiff
    @ ~/.julia/packages/Enzyme/r8mFE/src/Enzyme.jl:314 [inlined]
 [22] autodiff(::ReverseMode{false, FFIABI, false}, ::typeof(loss_function), ::Type{Active}, ::Const{var"#1#2"}, ::Duplicated{CuArray{…}}, ::Duplicated{CuArray{…}}, ::Duplicated{CuArray{…}})
    @ Enzyme ~/.julia/packages/Enzyme/r8mFE/src/Enzyme.jl:326
 [23] top-level scope
    @ REPL[8]:6
Some type information was truncated. Use `show(err)` to see complete types.

@avik-pal
Contributor Author

avik-pal commented Sep 9, 2024

using Lux, Random, LuxCUDA, Enzyme

gdev = gpu_device()

rng = Random.default_rng()

model = Chain(Dense(2 => 3, tanh), Dense(3 => 2))
ps, st = Lux.setup(rng, model) |> gdev
x = rand(rng, Float32, 2, 10) |> gdev

y = first(model(x, ps, st))

function loss_function(y, model, ps, st, x)
    y .= first(model(x, ps, st))
    return
end

begin
    y = zeros(Float32, 2, 10) |> gdev
    dy = ones(Float32, 2, 10) |> gdev
    dx = zeros(Float32, 2, 10) |> gdev
    dps = Enzyme.make_zero(ps)

    Enzyme.autodiff(Reverse, loss_function, Active, Duplicated(y, dy),
        Const(model), Duplicated(ps, dps), Const(st), Duplicated(x, dx))

    @show dx
    @show dps
end

I am now getting:

1-element ExceptionStack:
Return type `Nothing` not marked Const, but type is guaranteed to be constant
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{0x0000000000007bf5}, ::Type{Const{typeof(loss_function)}}, ::Type{Active}, tt::Type{Tuple{Duplicated{CuArray{Float32, 2, CUDA.DeviceMemory}}, Const{Chain{@NamedTuple{layer_1::Dense{typeof(tanh), Int64, Int64, Nothing, Nothing, Static.True}, layer_2::Dense{typeof(identity), Int64, Int64, Nothing, Nothing, Static.True}}, Nothing}}, Duplicated{@NamedTuple{layer_1::@NamedTuple{weight::CuArray{Float32, 2, CUDA.DeviceMemory}, bias::CuArray{Float32, 1, CUDA.DeviceMemory}}, layer_2::@NamedTuple{weight::CuArray{Float32, 2, CUDA.DeviceMemory}, bias::CuArray{Float32, 1, CUDA.DeviceMemory}}}}, Const{@NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}}}, Duplicated{CuArray{Float32, 2, CUDA.DeviceMemory}}}}, ::Val{Enzyme.API.DEM_ReverseModeCombined}, ::Val{1}, ::Val{(false, false, false, false, false, false)}, ::Val{false}, ::Val{false}, ::Type{FFIABI}, ::Val{true})
   @ Enzyme.Compiler /mnt/.julia/packages/Enzyme/TiboG/src/compiler.jl:7328
 [3] #s2055#19000
   @ /mnt/.julia/packages/Enzyme/TiboG/src/compiler.jl:7407 [inlined]
 [4] var"#s2055#19000"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ErrIfFuncWritten::Any, ::Any, ::Type, ::Type, ::Type, tt::Any, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Any)
   @ Enzyme.Compiler ./none:0
 [5] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
   @ Core ./boot.jl:602
 [6] autodiff
   @ /mnt/.julia/packages/Enzyme/TiboG/src/Enzyme.jl:315 [inlined]
 [7] autodiff(::ReverseMode{false, FFIABI, false, false}, ::typeof(loss_function), ::Type{Active}, ::Duplicated{CuArray{Float32, 2, CUDA.DeviceMemory}}, ::Const{Chain{@NamedTuple{layer_1::Dense{typeof(tanh), Int64, Int64, Nothing, Nothing, Static.True}, layer_2::Dense{typeof(identity), Int64, Int64, Nothing, Nothing, Static.True}}, Nothing}}, ::Duplicated{@NamedTuple{layer_1::@NamedTuple{weight::CuArray{Float32, 2, CUDA.DeviceMemory}, bias::CuArray{Float32, 1, CUDA.DeviceMemory}}, layer_2::@NamedTuple{weight::CuArray{Float32, 2, CUDA.DeviceMemory}, bias::CuArray{Float32, 1, CUDA.DeviceMemory}}}}, ::Const{@NamedTuple{layer_1::@NamedTuple{}, layer_2::@NamedTuple{}}}, ::Duplicated{CuArray{Float32, 2, CUDA.DeviceMemory}})
   @ Enzyme /mnt/.julia/packages/Enzyme/TiboG/src/Enzyme.jl:332
 [8] top-level scope
   @ REPL[19]:7
 [9] top-level scope
   @ none:1

@wsmoses
Member

wsmoses commented Sep 9, 2024 via email

@avik-pal
Contributor Author

avik-pal commented Sep 9, 2024

Yeah, never mind, the call should have been `Reverse, loss_function, Const, ...`. With that fixed, it is still hitting:

ERROR: 
No augmented forward pass found for cublasLtMatmulDescCreate
 at context:   %133 = call i32 @cublasLtMatmulDescCreate(i64 %bitcast_coercion, i32 %unbox32, i32 0) #916 [ "jl_roots"({} addrspace(10)* %126) ], !dbg !1609

Stacktrace:
 [1] macro expansion
   @ /mnt/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:218
 [2] macro expansion
   @ /mnt/.julia/packages/CUDA/Tl08O/lib/cublas/libcublasLt.jl:400
 [3] #1158
   @ /mnt/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:35
 [4] retry_reclaim
   @ /mnt/.julia/packages/CUDA/Tl08O/src/memory.jl:434
 [5] check
   @ /mnt/.julia/packages/CUDA/Tl08O/lib/cublas/libcublas.jl:24
 [6] cublasLtMatmulDescCreate
   @ /mnt/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:34
 [7] cublaslt_matmul_fused!
   @ /mnt/.julia/packages/LuxLib/xXJ6n/ext/LuxLibCUDAExt/cublaslt.jl:62


Stacktrace:
  [1] macro expansion
    @ /mnt/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:218 [inlined]
  [2] macro expansion
    @ /mnt/.julia/packages/CUDA/Tl08O/lib/cublas/libcublasLt.jl:400 [inlined]
  [3] #1158
    @ /mnt/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:35 [inlined]
  [4] retry_reclaim
    @ /mnt/.julia/packages/CUDA/Tl08O/src/memory.jl:434 [inlined]
  [5] check
    @ /mnt/.julia/packages/CUDA/Tl08O/lib/cublas/libcublas.jl:24 [inlined]
  [6] cublasLtMatmulDescCreate
    @ /mnt/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:34 [inlined]
  [7] cublaslt_matmul_fused!
    @ /mnt/.julia/packages/LuxLib/xXJ6n/ext/LuxLibCUDAExt/cublaslt.jl:62
  [8] cublaslt_matmul_fused!
    @ /mnt/.julia/packages/LuxLib/xXJ6n/ext/LuxLibCUDAExt/cublaslt.jl:13 [inlined]
  [9] cublasLt_fused_dense!
    @ /mnt/.julia/packages/LuxLib/xXJ6n/ext/LuxLibCUDAExt/cublaslt.jl:195
 [10] cublasLt_fused_dense!
    @ /mnt/.julia/packages/LuxLib/xXJ6n/ext/LuxLibCUDAExt/cublaslt.jl:193 [inlined]
 [11] matmuladd!
    @ /mnt/.julia/packages/LuxLib/xXJ6n/src/impl/matmul.jl:64 [inlined]
 [12] matmuladd
    @ /mnt/.julia/packages/LuxLib/xXJ6n/src/impl/matmul.jl:23 [inlined]
 [13] matmuladd
    @ /mnt/.julia/packages/LuxLib/xXJ6n/src/impl/matmul.jl:7 [inlined]
 [14] fused_dense
    @ /mnt/.julia/packages/LuxLib/xXJ6n/src/impl/dense.jl:6 [inlined]
 [15] fused_dense_bias_activation
    @ /mnt/.julia/packages/LuxLib/xXJ6n/src/api/dense.jl:30 [inlined]
 [16] Dense
    @ /mnt/research/lux/Lux.jl/src/layers/basic.jl:344 [inlined]
 [17] apply
    @ /mnt/.julia/packages/LuxCore/3RH53/src/LuxCore.jl:155 [inlined]
 [18] macro expansion
    @ /mnt/research/lux/Lux.jl/src/layers/containers.jl:0 [inlined]
 [19] applychain
    @ /mnt/research/lux/Lux.jl/src/layers/containers.jl:482 [inlined]
 [20] Chain
    @ /mnt/research/lux/Lux.jl/src/layers/containers.jl:480 [inlined]
 [21] loss_function
    @ ./REPL[8]:2 [inlined]
 [22] loss_function
    @ ./REPL[8]:0 [inlined]
 [23] diffejulia_loss_function_5796_inner_1wrap
    @ ./REPL[8]:0
 [24] macro expansion
    @ /mnt/.julia/packages/Enzyme/TiboG/src/compiler.jl:7187 [inlined]
 [25] enzyme_call
    @ /mnt/.julia/packages/Enzyme/TiboG/src/compiler.jl:6794 [inlined]
 [26] CombinedAdjointThunk
    @ /mnt/.julia/packages/Enzyme/TiboG/src/compiler.jl:6671 [inlined]
 [27] autodiff
    @ /mnt/.julia/packages/Enzyme/TiboG/src/Enzyme.jl:320 [inlined]
 [28] autodiff(::ReverseMode{false, FFIABI, false, false}, ::typeof(loss_function), ::Type{Const}, ::Duplicated{CuArray{…}}, ::Const{Chain{…}}, ::Duplicated{@NamedTuple{…}}, ::Const{@NamedTuple{…}}, ::Duplicated{CuArray{…}})
    @ Enzyme /mnt/.julia/packages/Enzyme/TiboG/src/Enzyme.jl:332
 [29] top-level scope
    @ REPL[9]:7
 [30] top-level scope
    @ none:1
Some type information was truncated. Use `show(err)` to see complete types.

I will reopen the LuxLib issue, since I missed this dispatch. You can close this issue if you want.
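
For reference, the return-activity fix mentioned above (`Const` instead of `Active` for a function that returns `nothing`) can be sketched on the CPU without Lux or CUDA. This is only an illustrative sketch under stated assumptions: `square!` and its arrays are hypothetical stand-ins and do not come from this issue. The function writes into its output buffer, so the seed is placed in the shadow `dy` rather than in a returned value.

```julia
using Enzyme

# Hypothetical stand-in for the in-place loss function in the reproducer:
# it mutates `y` and returns `nothing`.
function square!(y, x)
    y .= x .^ 2
    return nothing
end

x  = Float32[1, 2, 3]
dx = zero(x)            # gradient accumulator for x
y  = zero(x)            # output buffer
dy = ones(Float32, 3)   # seed: d(loss)/dy

# The return activity is `Const` because the function returns `nothing`;
# passing `Active` here triggers the "not marked Const" error.
Enzyme.autodiff(Reverse, square!, Const, Duplicated(y, dy), Duplicated(x, dx))

@show dx   # gradient of sum(y) with respect to x, i.e. 2 .* x
```

The same pattern applies to the GPU reproducer; the remaining `cublasLtMatmulDescCreate` error is a separate problem that this sketch does not address.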

@wsmoses closed this as completed on Sep 9, 2024