Add training benchmarking script #264
This also looks good compared with my code. The only main difference I see is the use of data augmentation. It would be nice to see whether that makes a meaningful difference.
I think the Inception errors occur because that family of models uses an image size of 299x299, and I don't think they support alternate image sizes. AlexNet uses 224x224 and doesn't support anything else either. VGG needs a special imsize parameter passed to it to work with smaller image sizes, as does ViT. So those errors can at least be diagnosed at a glance.
The ResNet family errors are weird. One set seems to come from the larger ResNet variants, which looks like some sort of memory issue? Correct me if I'm wrong. The Res2Net variant was unfortunately never tested on GPU, which most likely means something in the code is incompatible with GPU 😬. There's some "clever" code there that I think may not be GPU compatible.
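The input-size constraints described above could be written down as a small lookup that a benchmark script consults before building its input batch. This is only a sketch: the family symbols, the `FIXED_SIZE` and `NEEDS_IMSIZE` tables, and the `input_plan` helper are hypothetical names for illustration, not part of the package, and the sizes are the ones guessed at in the comment rather than verified values.

```julia
# Constraints as diagnosed in the comment above (assumptions, not verified):
# some families only accept one input size, others need an `imsize` argument.
const FIXED_SIZE = Dict(:inception => (299, 299), :alexnet => (224, 224))
const NEEDS_IMSIZE = (:vgg, :vit)

# Hypothetical helper: decide which spatial size to feed a model family and
# whether the constructor should be told about it.
function input_plan(family::Symbol, requested::Tuple{Int,Int})
    if haskey(FIXED_SIZE, family)
        return (size = FIXED_SIZE[family], pass_imsize = false)
    elseif family in NEEDS_IMSIZE
        return (size = requested, pass_imsize = true)
    else
        return (size = requested, pass_imsize = false)
    end
end

input_plan(:inception, (32, 32))  # falls back to the fixed 299x299 size
input_plan(:vgg, (32, 32))        # keeps 32x32 but flags that imsize is needed
```

Keeping the constraints in one table makes it easier to tell an expected size mismatch apart from a genuine bug when a model errors.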
I can confirm with #262 that the scalar indexing errors are real. Thanks for helping me corroborate those. For now, those models can be ignored here. The
These work with arbitrary sizes, but there is a lower bound on how small an image they can handle. I guess 32x32 is too small.
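A quick way to see why a lower bound exists is to push an input size through the standard output-size formula for convolution and pooling layers, floor((n + 2*pad - kernel) / stride) + 1. The layer stack below (five unpadded 3x3 stride-2 layers) is a made-up example, not the architecture of any model discussed here.

```julia
# Output size along one spatial dimension of a convolution or pooling layer.
conv_out(n::Int; kernel::Int, stride::Int = 1, pad::Int = 0) =
    fld(n + 2pad - kernel, stride) + 1

# Hypothetical stack of unpadded 3x3, stride-2 layers (illustration only).
# Returns the spatial size before the first layer and after each layer.
function sizes_through(n::Int, nlayers::Int)
    sizes = [n]
    for _ in 1:nlayers
        n = conv_out(n; kernel = 3, stride = 2)
        push!(sizes, n)
    end
    return sizes
end

sizes_through(224, 5)  # [224, 111, 55, 27, 13, 6]
sizes_through(32, 5)   # [32, 15, 7, 3, 1, 0]
```

With a 32x32 input the feature map in this toy stack shrinks to zero at the fifth layer, which is the kind of failure a too-small image would trigger.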
The use of
Shortened stack trace (the irrelevant part is cut off):
julia> m(x)
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f1ca13fc010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007f1a2236dc30.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:106
ERROR: TaskFailedException
nested task error: TaskFailedException
nested task error: MethodError: no method matching gemm!(::Val{false}, ::Val{false}, ::Int64, ::Int64, ::Int64, ::Float32, ::CuPtr{Float32}, ::CuPtr{Float32}, ::Float32, ::CuPtr{Float32})
Closest candidates are:
gemm!(::Val, ::Val, ::Int64, ::Int64, ::Int64, ::Float32, ::Ptr{Float32}, ::Ptr{Float32}, ::Float32, ::Ptr{Float32})
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/gemm.jl:29
gemm!(::Val, ::Val, ::Int64, ::Int64, ::Int64, ::Float64, ::Ptr{Float64}, ::Ptr{Float64}, ::Float64, ::Ptr{Float64})
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/gemm.jl:29
gemm!(::Val, ::Val, ::Int64, ::Int64, ::Int64, ::ComplexF64, ::Ptr{ComplexF64}, ::Ptr{ComplexF64}, ::ComplexF64, ::Ptr{ComplexF64})
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/gemm.jl:29
...
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/NNlib/sXmAj/src/impl/conv_im2col.jl:59 [inlined]
[2] (::NNlib.var"#647#648"{CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, Float32, Float32, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, SubArray{Float32, 5, Base.ReshapedArray{Float32, 5, SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, DenseConvDims{3, 3, 3, 6, 3}, Int64, Int64, Int64, UnitRange{Int64}, Int64})()
@ NNlib ./threadingconstructs.jl:416
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:445
[2] macro expansion
@ ./task.jl:477 [inlined]
[3] conv_im2col!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, Base.ReshapedArray{Float32, 5, SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; col::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, alpha::Float32, beta::Float32, ntasks::Int64)
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/impl/conv_im2col.jl:50
[4] conv_im2col!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, Base.ReshapedArray{Float32, 5, SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3})
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/impl/conv_im2col.jl:23
[5] (::NNlib.var"#305#309"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, DenseConvDims{3, 3, 3, 6, 3}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, SubArray{Float32, 5, Base.ReshapedArray{Float32, 5, SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}})()
@ NNlib ./threadingconstructs.jl:416
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:445
[2] macro expansion
@ ./task.jl:477 [inlined]
[3] conv!(out::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in1::Base.ReshapedArray{Float32, 5, SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, in2::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/conv.jl:205
[4] conv!
@ ~/.julia/packages/NNlib/sXmAj/src/conv.jl:185 [inlined]
[5] #conv!#264
@ ~/.julia/packages/NNlib/sXmAj/src/conv.jl:145 [inlined]
[6] conv!
@ ~/.julia/packages/NNlib/sXmAj/src/conv.jl:140 [inlined]
[7] conv(x::SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ NNlib ~/.julia/packages/NNlib/sXmAj/src/conv.jl:88
[8] conv
@ ~/.julia/packages/NNlib/sXmAj/src/conv.jl:83 [inlined]
[9] (::Conv{2, 4, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Bool})(x::SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false})
@ Flux ~/.julia/packages/Flux/jgpVj/src/layers/conv.jl:202
[10] macro expansion
@ ~/.julia/packages/Flux/jgpVj/src/layers/basic.jl:53 [inlined]
[11] _applychain
@ ~/.julia/packages/Flux/jgpVj/src/layers/basic.jl:53 [inlined]
[12] Chain
@ ~/.julia/packages/Flux/jgpVj/src/layers/basic.jl:51 [inlined]
[13] |>
@ ./operators.jl:907 [inlined]
[14] map (repeats 2 times)
@ ./tuple.jl:302 [inlined]
[15] (::Parallel{typeof(Metalhead.cat_channels), Tuple{MeanPool{2, 4}, Chain{Tuple{Conv{2, 4, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Bool}, BatchNorm{typeof(relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Chain{Tuple{Conv{2, 4, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Bool}, BatchNorm{typeof(relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}, Chain{Tuple{Conv{2, 4, typeof(identity), CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Bool}, BatchNorm{typeof(relu), CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}}}})(::SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, ::Vararg{SubArray{Float32, 4, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}})
@ Flux ~/.julia/packages/Flux/jgpVj/src/layers/basic.jl:541
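In the trace above, `conv` receives a `SubArray` wrapping a `CuArray` and ends up in NNlib's `conv_im2col!`, whose `gemm!` methods accept `Ptr` arguments rather than `CuPtr`. One plausible reading is that the view is no longer a `CuArray` as far as dispatch is concerned, so the generic path is selected. The sketch below reproduces that dispatch pattern on the CPU. `DeviceArray` and `which_path` are hypothetical stand-ins, not CUDA.jl or NNlib APIs, and materializing the view with `copy` is only one possible workaround, not necessarily the fix the package should adopt.

```julia
# Hypothetical stand-in for a GPU array type (NOT CUDA.jl's CuArray). It wraps
# a CPU Array so the example runs anywhere.
struct DeviceArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(a::DeviceArray) = size(a.data)
Base.getindex(a::DeviceArray{T,N}, i::Vararg{Int,N}) where {T,N} = a.data[i...]
Base.setindex!(a::DeviceArray{T,N}, v, i::Vararg{Int,N}) where {T,N} =
    (a.data[i...] = v)
Base.similar(::DeviceArray, ::Type{T}, dims::Dims) where {T} =
    DeviceArray(Array{T}(undef, dims))

# Hypothetical kernel selection: a specialised method for the device type and
# a generic fallback for every other AbstractArray.
which_path(::DeviceArray) = :device_kernel
which_path(::AbstractArray) = :generic_fallback

x = DeviceArray(rand(Float32, 4, 4, 6, 2))
v = view(x, :, :, 1:3, :)  # a SubArray whose parent is a DeviceArray

which_path(x)        # :device_kernel
which_path(v)        # :generic_fallback, the wrapper hides the device type
which_path(copy(v))  # :device_kernel, copy materialises a new DeviceArray
```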
As discussed in #198 (comment), I think it would be good to demonstrate that each of these is trainable on a generic dataset and, while doing so, to collect benchmark information.
I am happy to run this all locally, but I want to collect feedback before doing so, in case these models have nuances that are worth taking into account.
The general approach here is to train each of the smallest variants of the models found in /test
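A harness for this kind of sweep might look roughly like the sketch below: run one job per model, record the wall-clock time on success, and record the error type on failure so that one broken model does not stop the run. The `run_benchmarks` function and the toy jobs are hypothetical placeholders; in the real script each job would train a model, which is not shown here.

```julia
# Run each named job, recording elapsed seconds on success or the error type
# on failure. `jobs` is any iterable of name => zero-argument-function pairs.
function run_benchmarks(jobs)
    results = NamedTuple[]
    for (name, job) in jobs
        try
            t = @elapsed job()
            push!(results, (name = name, ok = true, seconds = t, error = nothing))
        catch err
            push!(results, (name = name, ok = false, seconds = NaN, error = typeof(err)))
        end
    end
    return results
end

# Hypothetical stand-ins for "train the smallest variant of model X".
jobs = [
    "toy_ok"   => () -> sum(abs2, rand(Float32, 64, 64)),
    "toy_fail" => () -> throw(DimensionMismatch("input too small")),
]

results = run_benchmarks(jobs)
for r in results
    println(r.name, ": ", r.ok ? "ok in $(r.seconds) s" : "failed with $(r.error)")
end
```

Recording the error type alongside the timings makes it easy to group failures afterwards, for example by input-size problems versus GPU incompatibilities.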