-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define chain rules for complex functions #167
Conversation
Is this the outcome of #159 ? |
I would say it's the outcome of https://discourse.julialang.org/t/taking-complex-autodiff-seriously-in-chainrules/39317 and JuliaDiff/ChainRules.jl#196. However, these are open-ended discussions where it can be hard to determine whether or not consensus has been reached. One of the goals of this PR is to make sure that this discussion does eventually come to a conclusion. |
Co-authored-by: willtebbutt <[email protected]>
Co-authored-by: willtebbutt <[email protected]>
It'd be nice to be able to see the rendered version of this addition. But for some reason, the preview docs are not being built (http://www.juliadiff.org/ChainRulesCore.jl/preview-PR167) -- anyone know why? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I left some comments where I think some things can be clarified a bit. Otherwise this is great!
docs/src/FAQ.md
Outdated
|
||
## How do chain rules work for complex functions? | ||
|
||
`ChainRules.jl` follows the convention that `frule` applied to a function ``f(x + i y) = u(x,y) + i \, v(x,y)`` with perturbation ``\dot x + i \dot y`` returns the value and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is too general of a statement. The way you describe it only really applies for frule
if your input is real or rrule
if your eventual output is real. I find it quite useful to think of forward-mode AD as propagating a perturbation in the input and getting out a derivative wrt to that input and reverse-mode as propagating a sensitivity in the output and getting out a vector of directional derivatives. Those perturbations/sensitivities can be complex, but not the input/output itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this FAQ is to point out two things:
frule
andrrule
are always defined (e.g. even for non-holomorphicComplex -> Complex
functions).- These definitions are sufficient to compute all quantities of interest.
I've added a bit more details to the documentation, I hope this helps to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess your comment arose because this convention for complex frule
and rrule
leads to a really awkward API for e.g. extracting Jacobians. This is definitely an issue, but the right place to fix this is probably in another package built on top of ChainRules
, not in ChainRules
itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These definitions are sufficient to compite all quantities of interest.
Depends on what you are interested in. Like I said above, only if the input/output is real will you get a meaningful result. For C -> C
you would either want the Wirtinger primal and conjugate or a 2x2 Jacobian, which we don't currently support. We should clarify that, so that people don't get any wrong impressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a note at the very end which I hope addresses your concern. Let me know what you think.
docs/src/FAQ.md
Outdated
\tfrac{\partial u}{\partial x} \, \dot x + \tfrac{\partial u}{\partial y} \, \dot y + i \, \Bigl( \tfrac{\partial v}{\partial x} \, \dot x + \tfrac{\partial v}{\partial y} \, \dot y \Bigr) | ||
, | ||
``` | ||
and similarly `rrule` applied to the same function evaluates the function and returns a pullback which, when applied to adjoint ``\bar u + i \bar v``, returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think \bar u
is a bit confusing notation here, because it's usually used for conjugation. What do you think of \mathrm{d}u
instead, since you are basically pulling back differential one-forms? The notation should probably also be explained briefly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I replaced both \bar
and \dot
with \Delta
. This notation (and the earlier dot
and bar
notation) are documented further up on this FAQ page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to use \Delta
for both, frule
and rrule
, because one lives in the tangent plane, whereas the other one lives in the cotangent plane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the notation proposed by the ChainRules
documentation. Waiting for input from others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking point on this! Most of my comments are notational. Would it also be useful to address the oft-debated conjugation convention implied by these definitions? When representing all complex numbers as vectors of real numbers, it's difficult at least for me to work out how that relates to the two different discussed conjugation conventions. Put another way, how do these conventions relate to those used by FDM and Zygote (e.g. as documented here: #159 (comment))
Co-authored-by: Seth Axen <[email protected]>
Co-authored-by: Seth Axen <[email protected]>
I believe I resolved all the requested changes, so AFAICT this is ready to be merged. Also, I deleted the multidimensional example again, for several interrelated reasons:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good to me.
So if I understand correctly, this is saying we do not take the conjugate in reverse diff, right? Is that direction that we want ChainRules to take? I don't have a super strong feeling either way, though @MikeInnes might. Either way, we need to make it much more explicit which convention we follow.
docs/src/FAQ.md
Outdated
\Delta u \, \tfrac{\partial u}{\partial x} + \Delta v \, \tfrac{\partial v}{\partial x} + i \, \Bigl(\Delta u \, \tfrac{\partial u }{\partial y} + \Delta v \, \tfrac{\partial v}{\partial y} \Bigr) | ||
. | ||
``` | ||
If we interpret complex numbers as vectors in ``\mathbb{R}^2``, then these rules correspond to multiplication with the (transposed) Jacobian of ``f(z)``, i.e. `frule` corresponds to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we interpret complex numbers as vectors in ``\mathbb{R}^2``, then these rules correspond to multiplication with the (transposed) Jacobian of ``f(z)``, i.e. `frule` corresponds to | |
If we interpret complex numbers as vectors in ``\mathbb{R}^2``, then the `frule` (`rrule`) corresponds to multiplication with the (transposed) Jacobian of ``f(z)``, i.e. `frule` corresponds to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, but "the frule
" sounds weird to my (nonnative) ear. Removed the "the".
Unless I have misunderstood something, the convention proposed here is the same as Zygote's and FiniteDifference's. @ettersi I thought it was clear in the matrix section earlier, but I agree it is less clear now what the convention is. Maybe it makes sense to bring it back? |
Co-authored-by: Mason Protter <[email protected]>
Doesn't this address the question about the conjugate? It's only the scalar case, but I guess it is clear that if we conjugate scalars than we also conjugate matrices. Btw, thanks to @MasonProtter for starting Taking Complex Autodiff Seriously in ChainRules. This is what got me started on this project. |
I think maybe we should add a sentance pointing out explicitly that the Jacobian matrix shown in the The fact that I missed that difference after explicitly looking for it makes me thing I might not be the only one. Thanks a lot for writing this up @ettersi! |
I changed the |
@ettersi is this ready to merge? And does anyone else have objections to merging? |
Co-authored-by: Mason Protter <[email protected]>
Co-authored-by: Mason Protter <[email protected]>
I'm happy for this to be merged! 🥳 |
Okay! Unless there are any objections before then, I'll merge tomorrow. |
and with that like over a dozen PRs will become unblocked! |
Or at least get closer to being unblocked! It's probably best to make the existing rules conform to these new standards before merging PRs with new rules. Then there are also open questions like https://github.com/JuliaDiff/ChainRules.jl/issues/212 that we haven't had to address until now. |
Thank you everyone involved in this, its been a journey! |
Adds a rigorous definition of how
frule
andrrule
are to be implemented for complex functions.