Kernels fail on CPU when waiting on kernels that allocate shared memory #55

leios · 2020-03-06T21:05:30Z

Here is a snippet of code that reproduces the problem:

using KernelAbstractions, Test, CUDAapi
if CUDAapi.has_cuda_gpu()
    using CuArrays
    CuArrays.allowscalar(false)
end

@kernel function localmem_check!(a, @Const(TDIM))
    T = eltype(a)
    #i = @index(Global)

    # Fails when waiting on event
    # tile = @localmem(T, (TDIM+1, TDIM))

    # succeeds
    tile = @localmem(Float32, (TDIM+1, TDIM))

end

# creating wrapper functions
function launch_localmem_check(a)
    TDIM = 32
    if isa(a, Array)
        kernel! = localmem_check!(CPU(),4)
    else
        kernel! = localmem_check!(CUDA(),256)
    end
    kernel!(a, TDIM, ndrange=(TDIM, TDIM))
end

function main()

    a = zeros(32, 32)
    ev = launch_localmem_check(a)
    wait(ev)

    if has_cuda_gpu()
        d_a = CuArray(a)

        launch_localmem_check(d_a)
    end

    return nothing
end

main()

On the CPU, it produces the following backtrace if you wait(ev) or allocate shared memory with a dynamic type (the commented-out `@localmem(T, ...)` line):

ERROR: LoadError: TaskFailedException:
TaskFailedException:
UndefVarError: T not defined
Stacktrace:
 [1] cpu_localmem_check! at /home/jars/projects/KernelAbstractions.jl/src/KernelAbstractions.jl:96 [inlined]
 [2] (::KernelAbstractions.var"#25#28"{Cassette.Context{nametype(CPUCtx),KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize,true,CartesianIndex{2},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},KernelAbstractions.NDIteration.NDRange{2,KernelAbstractions.NDIteration.DynamicSize,KernelAbstractions.NDIteration.StaticSize{(4, 1)},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Nothing}},Nothing,KernelAbstractions.var"##PassType#455",Nothing,Cassette.DisableHooks},KernelAbstractions.Kernel{CPU,KernelAbstractions.NDIteration.StaticSize{(4,)},KernelAbstractions.NDIteration.DynamicSize,typeof(cpu_localmem_check!)},Tuple{Array{Float64,2},Int64}})() at ./threadingconstructs.jl:126

...and 255 more exception(s).

Stacktrace:
 [1] sync_end(::Array{Any,1}) at ./task.jl:316
 [2] macro expansion at ./task.jl:335 [inlined]
 [3] macro expansion at /home/jars/projects/KernelAbstractions.jl/src/backends/cpu.jl:64 [inlined]
 [4] (::KernelAbstractions.var"#23#26"{KernelAbstractions.Kernel{CPU,KernelAbstractions.NDIteration.StaticSize{(4,)},KernelAbstractions.NDIteration.DynamicSize,typeof(cpu_localmem_check!)},Tuple{Int64,Int64},KernelAbstractions.NDIteration.NDRange{2,KernelAbstractions.NDIteration.DynamicSize,KernelAbstractions.NDIteration.StaticSize{(4, 1)},CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}},Nothing},Tuple{Array{Float64,2},Int64},Nothing})() at ./threadingconstructs.jl:126
Stacktrace:
 [1] wait at ./task.jl:267 [inlined]
 [2] wait(::CPU, ::KernelAbstractions.CPUEvent, ::Nothing) at /home/jars/projects/KernelAbstractions.jl/src/backends/cpu.jl:14
 [3] wait at /home/jars/projects/KernelAbstractions.jl/src/backends/cpu.jl:9 [inlined] (repeats 2 times)
 [4] main() at /home/jars/projects/KernelAbstractions.jl/examples/localmem_example.jl:30
 [5] top-level scope at /home/jars/projects/KernelAbstractions.jl/examples/localmem_example.jl:41
 [6] include(::String) at ./client.jl:439
 [7] top-level scope at REPL[10]:1
in expression starting at /home/jars/projects/KernelAbstractions.jl/examples/localmem_example.jl:41

vchuravy · 2020-03-06T22:04:37Z

Duplicate of #13, but this needs better docs.

vchuravy added the documentation Improvements or additions to documentation label Mar 17, 2020

vchuravy closed this as completed May 2, 2023
