Custom Covector Wrapper #165

Open
simeonschaub opened this issue May 25, 2020 · 1 comment

@simeonschaub
Member

What is becoming more and more apparent to me in light of #159 and #160 is that neither Base.adjoint nor Base.transpose is the right abstraction for covector differentials, so I have been tinkering a bit with the idea of rolling our own wrapper instead. In keeping with differential geometry terminology, we could name such a type OneForm, but I am open to a name that's more approachable to non-math folk. This could represent several different cases of directional derivatives.

  • Complex Numbers: We discovered that Wirtinger derivatives are quite difficult to deal with, so I think we should focus on getting R -> C^n and C^n -> R right first. Representing R -> C^n as just complex vectors already works really well and I don't think we should change that. Zygote uses Adjoint for the C^n -> R case, but this doesn't compose very well. The problem here is that we want a real scalar product C^n x C^n = R^2n x R^2n -> R, so that OneForm(v) takes in a seed as a complex vector and spits out a real number, i.e. the directional derivative wrt the seed. The conjugation that Zygote uses gives us a complex scalar product, which is not really that useful to us. If we allowed a OneForm wrapper to wrap complex numbers as well as complex vectors, we could define such a composition.
  • n-dimensional arrays: We could also allow OneForm to wrap n-dimensional arrays to represent differentials of functions T = R^(m1 x m2 x ... x mn) -> R. AFAIK Zygote currently isn't very consistent about handling these: I believe for matrices it usually takes the conjugate transpose, but not for higher dimensional arrays. This way we could also define a scalar product T x T -> R and it would compose well with forward diff.
  • Composite: Composites are also basically just real vectors with some additional information, so I think it would make sense to have OneForm(::Composite) as well. That way we also have a nice Composite x Composite -> R scalar product relation.
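
To make the distinction in the complex case concrete, here is a minimal sketch in Python (used purely for illustration; it is not the ChainRules API and not the prototype in the sim/one_form branch). The `OneForm` class and the `adjoint_product` helper are hypothetical names. The sketch assumes that applying a one-form to a seed means taking the real scalar product on C^n viewed as R^2n, and it contrasts that with the complex-valued product obtained by conjugating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OneForm:
    """Hypothetical covector wrapper around a tuple of complex components."""

    components: tuple

    def __call__(self, seed):
        # Real scalar product on C^n viewed as R^2n:
        # sum of Re(c) * Re(s) + Im(c) * Im(s) over all components.
        if len(seed) != len(self.components):
            raise ValueError("seed and covector must have the same length")
        return sum(
            c.real * s.real + c.imag * s.imag
            for c, s in zip(self.components, seed)
        )


def adjoint_product(v, w):
    """Complex scalar product sum(conj(v_k) * w_k), as an adjoint would give."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))


v = (1 + 2j, 3 - 1j)
seed = (0.5 + 1j, 2 + 2j)

real_value = OneForm(v)(seed)            # a real number
complex_value = adjoint_product(v, seed)  # complex in general
```

In this sketch the real scalar product equals the real part of the conjugating product, so the one-form always yields a real directional derivative while the adjoint-based product carries an extra imaginary part.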

I sketched out a prototype in the sim/one_form branch, but I am still not quite happy with it. Uni currently keeps me fairly busy, but I hope I can dedicate some time to this after next month. My worry is that ChainRules is already quite big and adopting something like this would have to touch pretty much everything, but I think it could be worth it. Do people generally agree with this solution, or should we go in a different direction? I know @willtebbutt is also interested in this. This doesn't yet solve the C -> C (holomorphic) case, but I believe it would let us handle the most common cases. I would love to hear other thoughts on this!

@ettersi
Contributor

ettersi commented Jun 4, 2020

In the case of complex numbers, I am becoming more and more convinced that the issue is just a lack of rigorous definitions of what exactly the various derivative functions (frule, rrule, fdm, gradient, jacobian, etc.) should be computing for complex arguments. I am trying to start to remedy this here.

More generally, I am becoming more and more convinced of the "treat everything as R^n" approach to chain rules (and "everything" here is meant to include things like Complex -> R^2, n1 x ... x nN Array -> R^(n1 * ... * nN), struct with n scalar fields -> R^n). It might be slightly counter-intuitive at first in some specific cases (e.g. complex numbers), but once you get the hang of it, it becomes super easy to remember and apply this rule consistently. Conceptual subtleties like whether a pile of numbers represents a functional / one-form, a gradient, etc., can be handled at a higher level where we can provide multiple functions for arranging the same pile of numbers in whichever way it is needed by the user.
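
As a rough illustration of the "treat everything as R^n" idea, the following Python sketch (illustration only, not ChainRules code) flattens complex numbers, nested lists standing in for arrays, and dataclasses standing in for structs into a flat list of reals. The names `flatten`, `real_dot`, and `Point` are hypothetical. Once everything is a pile of real numbers, a single real scalar product covers all three cases.

```python
from dataclasses import dataclass, fields, is_dataclass
from numbers import Complex, Real


def flatten(x):
    """Map a value to a flat list of real numbers (its R^n representation)."""
    if isinstance(x, Real):  # checked before Complex, since Real is a Complex
        return [float(x)]
    if isinstance(x, Complex):  # Complex -> R^2
        return [float(x.real), float(x.imag)]
    if is_dataclass(x) and not isinstance(x, type):  # struct -> R^n
        return [c for f in fields(x) for c in flatten(getattr(x, f.name))]
    if isinstance(x, (list, tuple)):  # nested array -> R^(n1 * ... * nN)
        return [c for item in x for c in flatten(item)]
    raise TypeError(f"cannot flatten {type(x).__name__}")


def real_dot(a, b):
    """Real scalar product of two values after flattening both to R^n."""
    fa, fb = flatten(a), flatten(b)
    if len(fa) != len(fb):
        raise ValueError("arguments flatten to different lengths")
    return sum(p * q for p, q in zip(fa, fb))


@dataclass
class Point:
    x: float
    z: complex


flat_point = flatten(Point(1.0, 2 + 3j))   # [1.0, 2.0, 3.0]
flat_matrix = flatten([[1, 2], [3, 4]])    # [1.0, 2.0, 3.0, 4.0]
complex_dot = real_dot(1 + 2j, 3 + 4j)     # 1*3 + 2*4 = 11.0
```

How the flat list is later rearranged (as a gradient, a one-form, a Jacobian row, and so on) is left to a higher layer in this sketch, in line with the point made above.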
