Complex gradient on real function with complex intermediates #342

I came across something odd while working with complex numbers: while the inputs and outputs of both functions are real, the first produces a real gradient, while the second produces a complex gradient. Is this Zygote's intended behavior?
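A minimal reproduction consistent with the outputs quoted later in the thread (the exact form of the original snippet is an assumption):

```julia
julia> using Zygote

julia> Zygote.gradient(x -> imag(complex(x, 2.0)), 3.0)   # real gradient
(0.0,)

julia> Zygote.gradient(x -> imag(x + 2.0*im), 3.0)        # complex gradient
(0.0 + 1.0im,)
```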
FWIW, Tracker and ForwardDiff do what I'd expect:

```julia
julia> using Tracker, ForwardDiff

julia> Tracker.gradient(x->imag(x + 2.0*im), 3.0)[1]
0.0 (tracked)

julia> ForwardDiff.derivative(x->imag(x + 2.0*im), 3.0)
0.0
```
For anyone coming across this, a workaround is to directly call `complex` instead of adding an imaginary literal:

```julia
julia> Zygote.gradient(x->imag(x + 2.0*im), 3.0)
(0.0 + 1.0im,)

julia> Zygote.gradient(x->imag(complex(x, 2.0)), 3.0)
(0.0,)
```
Yes, this is expected. This is the mathematically correct result; otherwise the result would change depending on whether you passed the same input as a real number or as a complex number with zero imaginary part.
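To spell this out: Zygote's convention for a real-valued function of a complex argument is (as I understand it) to return the direction of steepest ascent, $\partial f/\partial x + i\,\partial f/\partial y$. A worked check under that convention, writing $z = x + iy$:

$$
f(z) = \operatorname{imag}(z + 2i) = y + 2
\qquad\Longrightarrow\qquad
\nabla f = \frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y} = 0 + 1\cdot i,
$$

which matches the `(0.0 + 1.0im,)` above.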
Are we sure this is the desired behaviour? Promoting a real input to a complex-typed gradient seems surprising. I was scribbling somewhere an example of adding scalars to Pauli matrices, in which it would clearly be crazy to return a matrix-valued gradient for a scalar input.
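The Pauli example itself isn't in the thread; a sketch of the analogy might look like this (the function `f` and the choice of σx are mine):

```julia
using Zygote, LinearAlgebra

# A real scalar x embedded in a 2x2 matrix algebra via x*I, much as a
# real x is embedded in the complex numbers via x + 0im.
const σx = [0.0 1.0; 1.0 0.0]     # Pauli X matrix
const I2 = Matrix(1.0I, 2, 2)     # 2x2 identity, hoisted so AD sees a constant

f(x) = sum(abs2, x * I2 + σx)     # real scalar in, real scalar out

# One would expect a scalar gradient here (f(x) = 2x^2 + 2, so f'(x) = 4x),
# not a 2x2 matrix-valued one, even though x passes through matrix algebra.
Zygote.gradient(f, 3.0)           # (12.0,)
```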
So do Tracker and ReverseDiff do the wrong thing here, or is there a fundamental difference between how these packages interpret complex sensitivities? I don't quite follow the above reasoning. While it may be that the complex answer is correct for a complex input, the function here was given a real input. If this remains the behavior, it would be nice to have this made explicit in the docs, with a recommendation to use something like `complex(x, y)` in place of `x + y*im` when building complex intermediates from real inputs.
From Zygote's perspective, there is no fundamental difference between ints and floats; they both represent different (but equally finite and discrete) sets of points on the real line. If a gradient doesn't have a fractional part, it's legitimate to represent it with an `Int`.
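The int/float point in code (a small sketch; the concrete type of the output has varied across Zygote versions):

```julia
using Zygote

# Whether the gradient of an Int input comes back as an Int or a Float64
# is a representation detail; both denote the same point on the real line.
Zygote.gradient(x -> x^2, 3)   # 6, as either Int or Float64 depending on version
```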
We have to have a default representation of complex numbers to use in the standard library, and that default happens to be `Complex` treated as a genuine number rather than as a struct of two reals. This is not the only valid behaviour, but it is actually the one with the least special cases as far as implementing Zygote goes.
Tracker and ReverseDiff are both self-consistent, insofar as they don't really support complex AD; you're actually differentiating a slightly different real->real function.

Another way to look at this is that, to the extent an F64 is a "sparse"/efficient representation of a ComplexF64, the imaginary component is "fixed" to zero rather than just coincidentally being zero (see #163 for more discussion).

Having Zygote's semantics change based on input type could definitely cause problems. For hand-written examples it's not a big deal, but in more complex code you might dynamically choose the type of an object without realising that this changes how your AD works down the line. That may or may not be an acceptable tradeoff for simplicity in some other use cases.
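Concretely, the real->real function that Tracker and ForwardDiff end up differentiating is constant, which is why they return zero (a small sketch; the restriction-to-the-reals framing is mine):

```julia
# Restricted to the real line, imag(x + 2im) == 2 for every real x,
# so its derivative as a map R -> R is identically zero.
g(x::Real) = imag(x + 2.0*im)

g(3.0), g(-7.5)   # (2.0, 2.0): the output never depends on x
```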
I agree completely about Int/float, but am not sure that imaginary is the same. Here's my example, which has three representations of an identical real-valued function, a parabola: one written purely over the reals, and the others passing through complex intermediates.

Now we can calculate gradients, and since the representations agree at every real point, you might expect the gradients to agree too; instead, the representations that go through complex intermediates report a complex-typed gradient.

This imaginary direction is not wrong, in the sense that if we walk that way, then we can indeed tunnel out of the bottom of this parabola! And of course this is true in either representation: the complex extension of the function is the same no matter how we happened to write it down.

But that's not the question which we asked. I think it would be reasonable to insist that you give a complex input if you want to search in the imaginary direction, and further, to argue that a real input fixes the imaginary part to zero rather than leaving it free.

In this toy example, taking the real part of the gradients right before each update would recover the purely real behaviour.
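A sketch of what two such representations might look like (the definitions are mine, not the original code; whether the second gradient comes back real- or complex-typed depends on the Zygote version):

```julia
using Zygote

p1(x) = x^2                   # stays on the real line
p2(x) = real((x + 0im)^2)     # same values at real x, via a complex intermediate

# The representations agree on the real line...
p1(0.5) == p2(0.5)            # true

# ...but off the real line the complex extension is a saddle, not a bowl:
# real((0 + t*im)^2) == -t^2, so walking in the imaginary direction
# "tunnels out" below the real parabola's minimum.
real((0 + 0.1im)^2)           # ≈ -0.01

# The gradients agree in value at real points, but may differ in type:
Zygote.gradient(p1, 0.5)      # (1.0,)
Zygote.gradient(p2, 0.5)      # complex-typed in the Zygote version discussed here
```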
The difference between a natural, mathematical derivative and a structural, field-by-field one ultimately comes down to which types we declare to be numbers. It's annoying to have that semantic subtlety, but it's also pretty fundamental; the same would be true for a custom float type implemented by bit operations. It would be completely non-differentiable until it's declared as a number. We will always have to choose which set of types gets natural / mathematical derivatives vs. structural / element-wise ones. Calling complex numbers structural, i.e. mere pairs of reals, is one consistent choice, but so is the current one.

I take the point about complex numbers being higher-dimensional than real ones; this makes the issue closer to the sparse array one than to int/float. I think dimensionality may be a useful (if imperfect) heuristic for deciding what should be natural vs. structural, but the implementation difficulty remains.
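A toy illustration of the declared-as-a-number point (types and names mine): the same two floats support only a structural, field-by-field derivative as a plain struct, but a natural one once given number semantics, as `Complex` is.

```julia
# A plain struct: an AD system can only treat this structurally,
# as a pair of independent real fields.
struct PairOfReals
    re::Float64
    im::Float64
end

# The same data declared as a number with arithmetic admits a natural
# (mathematical) derivative. (Sketch only; a real implementation needs
# the full Number interface.)
struct MyComplex <: Number
    re::Float64
    im::Float64
end

Base.:+(a::MyComplex, b::MyComplex) = MyComplex(a.re + b.re, a.im + b.im)
Base.:*(a::MyComplex, b::MyComplex) =
    MyComplex(a.re*b.re - a.im*b.im, a.re*b.im + a.im*b.re)

MyComplex(1, 2) * MyComplex(3, 4)   # MyComplex(-5.0, 10.0)
```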
Thanks for the sparse link BTW, lots of interesting issues there. I guess my verbose example is trying to argue that problems involving gradients of real-valued loss functions are intrinsically over the reals, even when complex numbers show up as intermediates. Will think some more. Would be interested to know of a pathological example where pinning real inputs to the real line actually gives a wrong answer.
OK maybe this is simpler than expected: see what you think of the change linked above. That change makes the pullback capture the original arguments, so each cotangent can be projected back onto its input's type.
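A sketch of the kind of change being discussed, as I understand it (the `project` helper and pullback are hypothetical illustrations, not the actual diff):

```julia
# Hypothetical pullback for + that closes over the original arguments so it
# can project each cotangent back onto the corresponding input's type.
# (Not Zygote's actual implementation.)
project(x::Real, dx) = real(dx)   # a real input pins the imaginary part to zero
project(x::Complex, dx) = dx      # a complex input keeps the full cotangent

function plus_pullback(a, b)
    y = a + b
    back(dy) = (project(a, dy), project(b, dy))
    return y, back
end

y, back = plus_pullback(3.0, 2.0im)
back(1.0im)   # (0.0, 0.0 + 1.0im): the real argument's cotangent stays real
```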
It's not ideal to capture all of the original arguments in the pullback that way (memory usage is an issue). Also, this won't work in every case.