Remove rules for conj, adjoint, and transpose #67
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master      #67      +/-   ##
==========================================
- Coverage   93.89%   93.75%   -0.15%
==========================================
  Files           2        2
  Lines         131      128       -3
==========================================
- Hits          123      120       -3
  Misses          8        8
Continue to review full report at Codecov.
ForwardDiff test errors seem unrelated and are the same ones that JuliaDiff/ForwardDiff.jl#544 tries to address.
ModelingToolkit errors are the same as on the master branch: https://github.com/SciML/ModelingToolkit.jl/runs/3563293020
Thanks @devmotion - looks good to me!
Is there a short explanation of why these are wrong? Or why they lead ReverseDiff to the wrong answer? I tried to read the thread but don't follow the details of its internals. There are no complex numbers involved, and the rule does not apply to arrays, yet you say "but the [rule] for transpose lead to a TrackedReal with a derivative of zero!". But why? ForwardDiff used to define rules for these (or at least for some of them).
The short answer is JuliaDiff/ReverseDiff.jl#183 (comment) 😛 I guess my explanations were not completely clear though. I did not refer to derivatives of functions that involve these operations. I don't think this PR is problematic for ForwardDiff: for all these functions you want them to act as the identity on real numbers, which is exactly what the Base fallbacks already give you. So in my opinion these rules are not helpful but actually cause problems in downstream packages, and hence it would be good to remove them (basically like the already removed rule for identity).
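To illustrate the point (a minimal sketch, not part of the original comment; assumes ForwardDiff is available):

```julia
using ForwardDiff

# Base's generic fallbacks (conj(x::Real) = x, transpose(x::Number) = x,
# adjoint(x::Number) = conj(x)) already act as the identity on real scalars,
# so dual numbers pass through them unchanged and the derivative is 1
# without any DiffRules entry:
ForwardDiff.derivative(conj, 3.0)       # 1.0
ForwardDiff.derivative(transpose, 3.0)  # 1.0
ForwardDiff.derivative(adjoint, 3.0)    # 1.0
```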
@shashi you really need to prioritize fixing MTK master. This has gone on for way too long.
But how does my example not involve this? Presumably within a gradient call you will not get
When you compute
This still seems pretty weird to me. Is it clear to you when this is and isn't going to be triggered? Is it clear what properties other functions would have to have to trigger it?
I tried to explain this in the linked issue 🤷♂️ The problem occurs when we call one of these functions on an element obtained by indexing a tracked array. Fixing indexing of tracked arrays in ReverseDiff would be a larger change.
Then it seems like the claim is that any array indexing which calls a function for which a gradient is defined may lead to problems? I tried a bit to trigger this with MappedArrays.jl but haven't managed to. Is this the zero gradient you refer to?
I don't oppose the quick fix, I'm just slightly disturbed that a mathematically correct rule can silently break something deep inside how gradients are accumulated, and I wonder where else that can surface.
I haven't actively looked for any similar issues; I can try MappedArrays as well when I'm back on my computer. I'm by far not a ReverseDiff expert, but of course there are bugs and issues about incorrect gradients (as in every AD system, I assume), so I wouldn't be surprised if it can be reproduced in a similar setting 🤷♂️ I disagree though, I don't think this is a quick fix. In my opinion this is the correct thing to do, regardless of ReverseDiff. The default definitions in Base already do the correct thing for dual and tracked numbers, so the definitions are not needed - and as we see (and already saw with the identity rule), keeping them around causes problems.
@ChrisRackauckas I'm fixing it.
This seems to be the question. Is the sole purpose of this package to provide rules for scalar operator-overloading AD? Is a rule for these functions needed there at all? Or does "Many differentiation methods" in the readme include other things which may never see the Base definitions, such as symbolic differentiation, or generating rules for broadcasting?
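As an illustrative aside (not from the original thread): such consumers query the rule table directly through DiffRules' diffrule/hasdiffrule functions rather than ever hitting the Base definitions. The return values in the comments are what I'd expect, not verified output:

```julia
using DiffRules

# Code-generating consumers (symbolic differentiation, broadcast fusion, ...)
# look up derivative expressions in the rule table instead of dispatching
# to Base's fallbacks at runtime:
DiffRules.hasdiffrule(:Base, :sin, 1)   # true
DiffRules.diffrule(:Base, :sin, :x)     # :(cos(x))

# After this PR, a lookup for conj/adjoint/transpose simply reports no rule:
DiffRules.hasdiffrule(:Base, :conj, 1)  # false once the rule is removed
```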
I don't know, I based my opinion on the integration tests and the removal of the identity rule. But I guess this is something that should be discussed in a different issue. Since you said you're not opposed to the PR, I guess you're fine with me going ahead and merging it?
FYI I checked and with this PR all your examples above return the correct gradient.
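A hypothetical spot check in the same spirit (not one of the examples from this thread; assumes ReverseDiff with this PR applied):

```julia
using ReverseDiff

# With the rules removed, transpose/adjoint/conj on a TrackedReal fall back to
# the Base definitions and act as the identity, so the gradient of x^2 at 3 is 6:
ReverseDiff.gradient(x -> transpose(x[1]) * x[1], [3.0])  # expected: [6.0]
```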
This PR removes the rules for conj, adjoint, and transpose since they cause problems with ReverseDiff (#54 broke some tests for adjoint; see JuliaDiff/ReverseDiff.jl#183 (comment) and the other comments there for details) and the default fallbacks in https://github.com/JuliaLang/julia/blob/c5f348726cebbe55e169d4d62225c2b1e587f497/base/number.jl#L211-L213 should be sufficient (similar to the discussion about identity in #64). I checked locally that the ReverseDiff issues are fixed by this PR.
Edit: I also added ReverseDiff to the integration tests.
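For reference, the Base fallbacks linked above are, from memory and so approximate, these one-line definitions:

```julia
# Generic fallbacks in Base (base/number.jl); they make conj, transpose, and
# adjoint behave like the identity for real scalars, including dual and tracked numbers:
conj(x::Real) = x
transpose(x::Number) = x
adjoint(x::Number) = conj(x)
```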