Autograd.Sparse type causes regression #114
Comments
Ekin: sparse gradients are a big boost to some models, e.g. ones that use word embeddings with large vocabularies. Without sparse gradients, the gradient would have to be the same size as the whole embedding matrix even though you only want to update a few columns. However, I cannot support all possible array operations for this Sparse type without significant effort. I supported the ones I use internally for update! etc.; we can add others as needed (your + seems to be an easy one; we have to decide whether you want the result to be sparse or dense). In the meantime you can simply use full(grad(x,y)) to get a regular array.
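For illustration, a rough sketch of the embedding case described above (the names emb and ids are made up for this example, and whether grad actually returns a Sparse here depends on the AutoGrad version):
using AutoGrad
emb = Param(randn(Float32, 50, 10000))    # 50-dim embeddings, vocabulary of 10000
ids = [3, 17, 42]                         # only a few columns are touched in this step
J = @diff sum(emb[:, ids])
g = grad(J, emb)                          # may come back as an AutoGrad.Sparse
gdense = g isa AutoGrad.Sparse ? full(g) : g   # densify to a regular 50x10000 array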
Let's say your GPU cannot handle a task with batchsize = 32, but you want to simulate the same training. One way to accomplish this is to use batchsize = 8 and average the gradients over 4 iterations. This is where I got the error. I hope this helps.
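For context, a minimal sketch of that accumulation pattern, assuming hypothetical model, loss, and minibatches names (the gradient is densified with full, per the suggestion in the previous comment, so the sums always work):
# Simulate batchsize = 32 by averaging the gradients of 4 minibatches of size 8.
function accumulate_and_update!(model, minibatches, opt)
    acc, n = nothing, 0
    for (x, y) in minibatches                     # e.g. 4 minibatches of size 8
        J = @diff loss(model, x, y)
        g = grad(J, model.w)
        g = g isa AutoGrad.Sparse ? full(g) : g   # densify so acc + g always works
        acc = acc === nothing ? g : acc + g
        n += 1
    end
    update!(value(model.w), acc ./ n, opt)        # one update with the averaged gradient
end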
From: denizyuret, Wednesday, October 2, 2019 (Re: [denizyuret/AutoGrad.jl] Autograd.Sparse type causes regression #114):
I understand adding gradients to parameters, but why would you add two gradients together?
On Tue, Oct 1, 2019 at 9:17 PM Ekin Akyürek wrote:
Earlier, I could accumulate my gradients across iterations. However, recent changes in AutoGrad break this, because I can't sum two gradients now. There can be other issues with this type which I didn't test. In general, I believe one should get a gradient that is capable of everything the corresponding parameter type can do.
julia> function foo(w)
return w[1][1]+w[2][1]
end
foo (generic function with 1 method)
julia> w = [param(3,3),param(3,3)]
2-element Array{Param{KnetArray{Float32,2}},1}:
P(KnetArray{Float32,2}(3,3))
P(KnetArray{Float32,2}(3,3))
julia> J = @diff foo(w)
T(-0.32367945)
julia> grad(J,w[1])
Sparse(KnetArray{Float32,2}(3,3)())
julia> grad(J,w[1]) + grad(J,w[2])
ERROR: MethodError: +(::AutoGrad.Sparse{Float32,2}, ::AutoGrad.Sparse{Float32,2}) is ambiguous. Candidates:
+(a::AbstractArray, s::AutoGrad.Sparse) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:73
+(s::AutoGrad.Sparse, a::AbstractArray) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:74
Possible fix, define
+(::AutoGrad.Sparse, ::AutoGrad.Sparse)
Stacktrace:
[1] top-level scope at REPL[28]:1
julia> grad(J,w[1]) + grad(J,w[1])
ERROR: MethodError: +(::AutoGrad.Sparse{Float32,2}, ::AutoGrad.Sparse{Float32,2}) is ambiguous. Candidates:
+(a::AbstractArray, s::AutoGrad.Sparse) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:73
+(s::AutoGrad.Sparse, a::AbstractArray) in AutoGrad at /home/gridsan/eakyurek/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:74
Possible fix, define
+(::AutoGrad.Sparse, ::AutoGrad.Sparse)
Stacktrace:
[1] top-level scope at REPL[29]:1
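For what it's worth, a hypothetical workaround for the missing method the error message suggests, simply densifying both operands with full (a sketch only, not necessarily how AutoGrad resolves it; whether the result should stay sparse or become dense is the open question from the first comment):
import Base: +
+(a::AutoGrad.Sparse, b::AutoGrad.Sparse) = full(a) + full(b)   # densify both, add as regular arrays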
yeah, full works for me! Though, the problematic thing about this interface is that you don't know what your gradient type will be in advance.
I will make + work as well.
I realize that it has also broken Knet: when you have the Adam optimizer with gclip and you get a Sparse gradient, gclip fails.
I can't replicate it; the following works fine. Please provide a minimal example.
The dy/sparsebugs branch has implemented + for two Sparse values, please test.
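One way to try that branch before it is merged (standard Pkg usage, not quoted from the thread):
using Pkg
Pkg.add(PackageSpec(name="AutoGrad", rev="dy/sparsebugs"))   # check out the branch
Pkg.test("AutoGrad")                                         # run its test suite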
Although I didn't run your example, I believe you didn't get the error because your gradients don't exceed the gclip value. Here is a simpler example you can replicate without downloading anything.
julia> using Knet
julia> function foo(w)
s = 0.0
for i=1:length(w); s+=w[i]; end
return s
end
foo (generic function with 1 method)
julia> w = Param(randn(2,2))
2×2 Param{Array{Float64,2}}:
0.427868 0.657678
-0.332868 -1.50003
julia> J = @diff foo(w)
T(-0.7473544438700652)
julia> update!(value(w), grad(J,w), Adam(gclip=0.1))
ERROR: MethodError: lmul!(::Float64, ::AutoGrad.Sparse{Float64,2}) is ambiguous. Candidates:
lmul!(a, x::AutoGrad.Sparse{T,N}) where {T, N} in AutoGrad at /kuacc/users/eakyurek13/.julia/packages/AutoGrad/9MrCC/src/sparse.jl:51
lmul!(s::Number, X::AbstractArray) in LinearAlgebra at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/LinearAlgebra/src/generic.jl:100
Possible fix, define
lmul!(::Number, ::AutoGrad.Sparse{T,N})
Stacktrace:
[1] gclip!(::AutoGrad.Sparse{Float64,2}, ::Float64) at /kuacc/users/eakyurek13/.julia/packages/Knet/IIjk8/src/update.jl:613
[2] update!(::Array{Float64,2}, ::AutoGrad.Sparse{Float64,2}, ::Adam) at /kuacc/users/eakyurek13/.julia/packages/Knet/IIjk8/src/update.jl:537
[3] top-level scope at REPL[6]:1
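For reference, a simplified sketch of why gclip ends up in lmul! (hypothetical code, not Knet's actual gclip! from the stacktrace): gradient clipping rescales the gradient in place when its norm exceeds the threshold, so the gradient type must support an unambiguous lmul!.
using LinearAlgebra                       # norm and lmul!
function gclip_sketch!(g, gclip)
    gnorm = norm(g)
    if gnorm > gclip
        lmul!(gclip / gnorm, g)           # the call that is ambiguous for AutoGrad.Sparse
    end
    return g
end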
You are right, it was an ambiguity issue. I will create a PR now.
Fixed in current master.
Earlier, I could accumulate my gradients across iterations. However, recent changes in AutoGrad break this, because I can't sum two gradient arrays when they are AutoGrad.Sparse. There may be other issues with this type which I haven't tested yet. In general, I believe one should get a gradient that is capable of everything the corresponding parameter type can do.