Tests on generic structs #343
Comments
The answer will be different for Flux and Lux, the former most likely requiring support for Functors.jl.
This is what Lux and its derivative frameworks were designed to fix.
Lux uses a restrictive definition of |
I don't have a strong view on exactly what structs you should test, but I do know of several things that you will need to make decisions about, based on my experience helping @oxinabox design the tangent type system in ChainRules, and my experience with Tapir.jl. Firstly, there are a couple of edge cases that you'll probably want to actively ensure that people avoid in order to reduce the number of tests you have to write:
Additionally, you'll need to consider whether to follow ChainRules' v1 approach and be flexible regarding what type is used to represent the tangent of a given struct of a given type, or whether to go down the route that Tapir.jl and Enzyme.jl take of insisting on there being a unique tangent type for each primal type. If you choose the former, you massively blow up the interface surface that you'll have to test. Moreover, you run the risk of different AD backends giving different answers and them both technically being "correct".

Personally, I would encourage you to take an opinionated view, and insist upon unique tangent types. I doubt it will matter too much what types you pick, but my experience is that being restrictive makes your life much easier.

I hope the above is helpful. I'm very excited to see what we wind up with here! (Also, I'm on holiday at the minute, so I probably won't be super responsive to this thread until next week. Apologies in advance!)
Thanks for your advice!
@gdalle did you ever think any more about this? The release of 1.11 has prompted me to restart this discussion because While non-array-like things are the correct thing to use internally in Mooncake, they're probably not what we want to be presenting to users. I'm keen to write some convenience functionality on my end to provide translations (for some types), but before doing that I would like to know what you would like in DI. For example, I'm reasonably sure we would agree that an acceptable type for the gradient of a function w.r.t.
Maybe a useful exercise would be to define for some specific types what the type of the result ought to be, and to clearly state which set of types DI has strong opinions on, and which it does not yet have strong opinions on.
The goal of DI is to be as unopinionated as possible, so I probably won't be taking sides here. Think of DI as a fancy argument-passer, which returns whatever the backends return.

There have been endless discussions on the meaning of derivatives when you're on a manifold, and this meaning differs between backends. From what I understand, ChainRules tries to preserve structure while Enzyme takes a more Cartesian approach, so there is no universally right answer. If I try to unify return types for structured objects, I will definitely make a lot of people unhappy, and probably trash performance in the process.

There are also differences in how each backend handles some fields in a struct. Some backends error on integers (ChainRules?), others just ignore them as inactive values (Enzyme?), others differentiate them fine (FiniteDiff?). Some backends even ignore numbers to differentiate only arrays (Tracker?).

Similarly, some backends accept arbitrary tangent types, while other backends (Enzyme and Mooncake) are stricter. For the stricter ones, I implement automatic conversion, but not automatic structure adaptation. In other words, if DI is thoroughly tested with the standard

TLDR: Everything is in place to differentiate non-
Fair enough. In that case I'll ignore this issue until the upgrades are done, and figure out how to make everything work on the DI end when we get to it :)
Does your new tangent type behave like an array? Can one index it, sum it, etc.?
It almost certainly won't by default. edit: I say "almost" because I'm not 100% sure what the best choice is from Mooncake's perspective yet.
Let's discuss it in compintell/Mooncake.jl#286?
I remember this has been discussed at length in ChainRules as well, although I couldn't find the relevant link. I think one outcome of those discussions is ProjectTo.
What kind of structs should we add to enable deep learning applications?