
Enable adjoint method #3

Merged
merged 5 commits into titu1994:master on Feb 29, 2020

Conversation

@eozd (Contributor) commented on Feb 29, 2020

Fixes #2

This PR proposes a way to make the adjoint method work with TensorFlow's tf.custom_gradient interface. The main changes are in tfdiffeq/adjoint.py and can be summarized as follows:

  1. Don't pass the ODE parameters to the OdeintAdjointMethod function. Instead, we get these parameters from the variables keyword argument of the grad function.
  2. tf.custom_gradient requires the grad function to return two sets of gradients as a pair (see the sketch after this list). These are:
    i. the gradients with respect to the inputs of OdeintAdjointMethod, which are x0 and t in our case;
    ii. the gradients with respect to the parameters, which are the tf.Variable objects stored in our ODE object.
  3. To prevent picking up all the tf.Variable objects created inside the adams method, we mark them as non-trainable. However, there still seems to be an issue with adams (see the caveats below).
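
For context, point 2 above is TensorFlow's documented contract for tf.custom_gradient when the decorated forward pass touches tf.Variable objects: the grad function must accept a variables keyword argument and return a pair (gradients of the inputs, gradients of the variables). Below is a minimal sketch of that pattern, with a toy forward computation standing in for the ODE solve and purely illustrative names; it is not the actual tfdiffeq code.

```python
import tensorflow as tf

# Illustrative stand-in for the ODE's trainable parameters.
w = tf.Variable(2.0, name="ode_param")

@tf.custom_gradient
def odeint_adjoint_method(x0, t):
    # Toy forward computation standing in for the ODE solve.
    y = w * x0 * t

    def grad(dy, variables=None):
        # First element: gradients w.r.t. the explicit inputs (x0 and t).
        grad_inputs = [dy * w * t, dy * w * x0]
        # Second element: gradients w.r.t. the captured tf.Variable objects,
        # returned in the same order as `variables`.
        grad_vars = [dy * x0 * t]
        return grad_inputs, grad_vars

    return y, grad

x0 = tf.constant(3.0)
t = tf.constant(0.5)
with tf.GradientTape() as tape:
    tape.watch([x0, t])
    y = odeint_adjoint_method(x0, t)

# Gradients for the inputs and the captured variable all flow through `grad`.
print(tape.gradient(y, [x0, t, w]))
```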

Caveats: I wasn't able to make the method work with the adams method (therefore the adams-adjoint test is not enabled either). The problem is that the elements of the tuple returned from the augmented_dynamics function have different shapes, and this causes problems at adams.py:138.

eozd added 5 commits February 29, 2020 10:42
The tensorflow custom_gradient decorator requires the grad function to return the gradients as a pair. The first element should contain the gradients of all the inputs passed to the function (in our case, x0 and t). The second element must contain the gradients for the model parameters, which are the tf.Variable objects stored in the ODE object. We don't use these parameters in the function interface; instead, TensorFlow passes all the trainable parameters related to our method in the variables keyword argument.

The adams method crashes for an unknown reason; maybe this can be fixed in a later revision.
@titu1994 (Owner)

This solution is ingenious! I completely missed that I can recover the parameters from variables. Thank you so very much for your help with this.

As for the Adams-Bashforth implementation, there seem to be certain issues with the current version, which I am closely following in the PyTorch discussions.

As the dopri tests pass, I will be glad to merge this PR upon your go-ahead.

@eozd (Contributor, Author) commented on Feb 29, 2020

If the idea looks good to you, then by all means please go ahead. By the way, I would also like to thank you for the original implementation. As I will be working with tfdiffeq in the immediate future, I will make sure to post any issues I find with the changes I introduced.

@titu1994 titu1994 merged commit f0b4550 into titu1994:master Feb 29, 2020
@titu1994 (Owner)

Merged! I advise wrapping the callable portion of the ODE function, call(u, t), inside a tf.function block to see some noticeable speedups; a sketch follows below. There are some performance bottlenecks I'd like to look into, and hopefully I'll implement the universal ordinary differential equations paper in the future, if I ever get around to parsing the Julia codebase.
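
A minimal sketch of that suggestion, assuming a user-defined ODE module with a call(u, t) method; the class name, matrix, and values are illustrative and not part of tfdiffeq.

```python
import tensorflow as tf

class LinearODE(tf.Module):
    """Illustrative ODE right-hand side du/dt = u @ A."""

    def __init__(self):
        super().__init__()
        self.A = tf.Variable([[-0.1, 2.0], [-2.0, -0.1]], dtype=tf.float32)

    @tf.function  # compiles the callable portion into a graph for repeated solver calls
    def call(self, u, t):
        # t is unused in this autonomous example but kept to match the
        # call(u, t) interface mentioned above.
        return tf.matmul(u, self.A)

ode = LinearODE()
u0 = tf.constant([[1.0, 0.0]], dtype=tf.float32)
# The first call traces the graph; subsequent calls reuse the compiled version.
print(ode.call(u0, tf.constant(0.0)))
```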
