Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define chain rules for complex functions #167

Merged
merged 33 commits into from
Jun 26, 2020
Merged

Conversation

ettersi
Copy link
Contributor

@ettersi ettersi commented Jun 2, 2020

Adds a rigorous definition of how frule and rrule are to be implemented for complex functions.

@oxinabox
Copy link
Member

oxinabox commented Jun 2, 2020

Is this the outcome of #159 ?
I have not caught up with it yet.

@ettersi
Copy link
Contributor Author

ettersi commented Jun 3, 2020

I would say it's the outcome of https://discourse.julialang.org/t/taking-complex-autodiff-seriously-in-chainrules/39317 and JuliaDiff/ChainRules.jl#196. However, these are open-ended discussions where it can be hard to determine whether or not consensus has been reached. One of the goals of this PR is to make sure that this discussion does eventually come to a conclusion.

docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
@nickrobinson251
Copy link
Contributor

It'd be nice to be able to see the rendered version of this addition. But for some reason, the preview docs are not being built (http://www.juliadiff.org/ChainRulesCore.jl/preview-PR167) -- anyone know why?

docs/src/FAQ.md Outdated Show resolved Hide resolved
Copy link
Member

@simeonschaub simeonschaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! I left some comments where I think some things can be clarified a bit. Otherwise this is great!

docs/src/FAQ.md Outdated

## How do chain rules work for complex functions?

`ChainRules.jl` follows the convention that `frule` applied to a function ``f(x + i y) = u(x,y) + i \, v(x,y)`` with perturbation ``\dot x + i \dot y`` returns the value and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is too general of a statement. The way you describe it only really applies for frule if your input is real or rrule if your eventual output is real. I find it quite useful to think of forward-mode AD as propagating a perturbation in the input and getting out a derivative wrt to that input and reverse-mode as propagating a sensitivity in the output and getting out a vector of directional derivatives. Those perturbations/sensitivities can be complex, but not the input/output itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this FAQ is to point out two things:

  1. frule and rrule are always defined (e.g. even for non-holomorphic Complex -> Complex functions).
  2. These definitions are sufficient to compute all quantities of interest.

I've added a bit more details to the documentation, I hope this helps to clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess your comment arose because this convention for complex frule and rrule leads to a really awkward API for e.g. extracting Jacobians. This is definitely an issue, but the right place to fix this is probably in another package built on top of ChainRules, not in ChainRules itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These definitions are sufficient to compite all quantities of interest.

Depends on what you are interested in. Like I said above, only if the input/output is real will you get a meaningful result. For C -> C you would either want the Wirtinger primal and conjugate or a 2x2 Jacobian, which we don't currently support. We should clarify that, so that people don't get any wrong impressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a note at the very end which I hope addresses your concern. Let me know what you think.

docs/src/FAQ.md Outdated
\tfrac{\partial u}{\partial x} \, \dot x + \tfrac{\partial u}{\partial y} \, \dot y + i \, \Bigl( \tfrac{\partial v}{\partial x} \, \dot x + \tfrac{\partial v}{\partial y} \, \dot y \Bigr)
,
```
and similarly `rrule` applied to the same function evaluates the function and returns a pullback which, when applied to adjoint ``\bar u + i \bar v``, returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think \bar u is a bit confusing notation here, because it's usually used for conjugation. What do you think of \mathrm{d}u instead, since you are basically pulling back differential one-forms? The notation should probably also be explained briefly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I replaced both \bar and \dot with \Delta. This notation (and the earlier dot and bar notation) are documented further up on this FAQ page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to use \Delta for both, frule and rrule, because one lives in the tangent plane, whereas the other one lives in the cotangent plane.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the notation proposed by the ChainRules documentation. Waiting for input from others.

docs/src/FAQ.md Outdated Show resolved Hide resolved
@nickrobinson251 nickrobinson251 added the Complex Differentiation Relating to any form of complex differentiation label Jun 17, 2020
Copy link
Member

@sethaxen sethaxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking point on this! Most of my comments are notational. Would it also be useful to address the oft-debated conjugation convention implied by these definitions? When representing all complex numbers as vectors of real numbers, it's difficult at least for me to work out how that relates to the two different discussed conjugation conventions. Put another way, how do these conventions relate to those used by FDM and Zygote (e.g. as documented here: #159 (comment))

docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
@ettersi
Copy link
Contributor Author

ettersi commented Jun 22, 2020

I believe I resolved all the requested changes, so AFAICT this is ready to be merged.

Also, I deleted the multidimensional example again, for several interrelated reasons:

  • It requires quite a bit of brain power to grasp what's going on, but doesn't help much in terms of clarifying the ideas (I think the 1d case is sufficient for this).
  • It may wrongly suggest that it rigorously defined complex differentiation in full generality, which it doesn't (see e.g. sethaxen's matrix product example, which isn't really covered ).

Copy link
Member

@sethaxen sethaxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Thanks again!

Copy link
Contributor

@MasonProtter MasonProtter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me.

So if I understand correctly, this is saying we do not take the conjugate in reverse diff, right? Is that direction that we want ChainRules to take? I don't have a super strong feeling either way, though @MikeInnes might. Either way, we need to make it much more explicit which convention we follow.

docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated
\Delta u \, \tfrac{\partial u}{\partial x} + \Delta v \, \tfrac{\partial v}{\partial x} + i \, \Bigl(\Delta u \, \tfrac{\partial u }{\partial y} + \Delta v \, \tfrac{\partial v}{\partial y} \Bigr)
.
```
If we interpret complex numbers as vectors in ``\mathbb{R}^2``, then these rules correspond to multiplication with the (transposed) Jacobian of ``f(z)``, i.e. `frule` corresponds to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If we interpret complex numbers as vectors in ``\mathbb{R}^2``, then these rules correspond to multiplication with the (transposed) Jacobian of ``f(z)``, i.e. `frule` corresponds to
If we interpret complex numbers as vectors in ``\mathbb{R}^2``, then the `frule` (`rrule`) corresponds to multiplication with the (transposed) Jacobian of ``f(z)``, i.e. `frule` corresponds to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, but "the frule" sounds weird to my (nonnative) ear. Removed the "the".

@sethaxen
Copy link
Member

sethaxen commented Jun 23, 2020

So if I understand correctly, this is saying we do not take the conjugate in reverse diff, right? Is that direction that we want ChainRules to take? I don't have a super strong feeling either way, though @MikeInnes might. Either way, we need to make it much more explicit which convention we follow.

Unless I have misunderstood something, the convention proposed here is the same as Zygote's and FiniteDifference's. @ettersi I thought it was clear in the matrix section earlier, but I agree it is less clear now what the convention is. Maybe it makes sense to bring it back?

Co-authored-by: Mason Protter <[email protected]>
@ettersi
Copy link
Contributor Author

ettersi commented Jun 23, 2020

the derivative part of rrule can be implemented as conj(f'(z)) \Delta f

Doesn't this address the question about the conjugate? It's only the scalar case, but I guess it is clear that if we conjugate scalars than we also conjugate matrices.

Btw, thanks to @MasonProtter for starting Taking Complex Autodiff Seriously in ChainRules. This is what got me started on this project.

docs/src/FAQ.md Outdated Show resolved Hide resolved
@MasonProtter
Copy link
Contributor

MasonProtter commented Jun 23, 2020

I think maybe we should add a sentance pointing out explicitly that the Jacobian matrix shown in the rrule is transposed relative to the Jacobian for frule and that transpose is what encodes the complex conjugate.

The fact that I missed that difference after explicitly looking for it makes me thing I might not be the only one.

Thanks a lot for writing this up @ettersi!

@ettersi
Copy link
Contributor Author

ettersi commented Jun 23, 2020

I think maybe we should add a sentance pointing out explicitly that the Jacobian matrix shown in the rrule is transposed relative to the Jacobian for frule and that transpose is what encodes the complex conjugate.

I changed the rrule formula to ^T instead of writing out the transpose explicitly. Hope this makes it clearer.

@sethaxen
Copy link
Member

@ettersi is this ready to merge? And does anyone else have objections to merging?

docs/src/FAQ.md Outdated Show resolved Hide resolved
docs/src/FAQ.md Outdated Show resolved Hide resolved
@ettersi
Copy link
Contributor Author

ettersi commented Jun 25, 2020

I'm happy for this to be merged! 🥳

@sethaxen
Copy link
Member

Okay! Unless there are any objections before then, I'll merge tomorrow.

@oxinabox
Copy link
Member

and with that like over a dozen PRs will become unblocked!

@sethaxen
Copy link
Member

and with that like over a dozen PRs will become unblocked!

Or at least get closer to being unblocked! It's probably best to make the existing rules conform to these new standards before merging PRs with new rules. Then there are also open questions like https://github.com/JuliaDiff/ChainRules.jl/issues/212 that we haven't had to address until now.

@sethaxen sethaxen merged commit d187432 into JuliaDiff:master Jun 26, 2020
@oxinabox
Copy link
Member

Thank you everyone involved in this, its been a journey!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Complex Differentiation Relating to any form of complex differentiation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants