Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AD for parameters #175

Merged
merged 3 commits into from
Sep 17, 2023
Merged

Fix AD for parameters #175

merged 3 commits into from
Sep 17, 2023

Conversation

lxvm
Copy link
Collaborator

@lxvm lxvm commented Sep 16, 2023

This pr should fix #99 but in writing this I ran into a lot of internal inconsistencies in Integrals.jl as to how it handles the behavior of nout, batch, and inplace integration (as defined in the docs for IntegralProblem). Moreover, there are two ambiguities that mean I can't make autodiff type stable:

  1. When nout = 1, batch = 0, and not inplace, it is unclear whether the integrand returns a scalar or a vector of length one
  2. When nout = 1 and not inplace, it is unclear if the algorithm should return a scalar or a vector of length 1. In fact the different algorithms make different choices.

This pr also creates breaking changes in IntegralsCubature.jl because a) it did not follow the interface in the docs (i.e. in places it used adjoints to turn vectors into matrices where it was not supposed to) and b) I changed the default choice regarding point 2. above to return a scalar when nout = 1.

In light of these points, I advocate for merging this pr, and also making the following changes in the next breaking release of Integrals.jl

  • Features of inplace integrands, nout, and batch should be replaced by specialized integrand wrappers that allow the user to provide the output (and input) arrays of the correct Julia types, which can be arbitrary. (I believe this is the right abstraction and have implemented it in https://github.com/lxvm/AutoBZCore.jl)
  • It seems like IntegralSolutions are treated like arrays, so we should decide how to interpret the output of algorithms in terms of this so we can write type-stable pullbacks.
  • The C library wrappers (i.e. IntegralsCuba.jl and IntegralsCubature.jl), which have restricted integrand types of floats or float arrays, may need their own AD package extensions if we would like to implement more efficient pullbacks for pure Julia packages with more generic types. For example, it would be nice to calculate the integral of a function over an ellipse and calculate its partials w.r.t. the semiaxes of the ellipse. I don't see anything preventing this in a pure Julia implementation, but the code may have to be different from the C code.
  • Is it possible to replace the sensealg argument with different AD backends provided by AbstractDifferentiation.jl?

Thanks in advance for your help.

@ArnoStrouwen
Copy link
Member

  • Features of inplace integrands, nout, and batch should be replaced by specialized integrand wrappers that allow the user to provide the output (and input) arrays of the correct Julia types, which can be arbitrary. (I believe this is the right abstraction and have implemented it in https://github.com/lxvm/AutoBZCore.jl)

Yes, nout and maybe batch needs to be removed, and replaced with a prototype container passed by the user.

  • It seems like IntegralSolutions are treated like arrays, so we should decide how to interpret the output of algorithms in terms of this so we can write type-stable pullbacks.

I would try to make the behaviour of the solutions as similar as possible to how this is handled for AD of ODEProblem.

  • The C library wrappers (i.e. IntegralsCuba.jl and IntegralsCubature.jl), which have restricted integrand types of floats or float arrays, may need their own AD package extensions if we would like to implement more efficient pullbacks for pure Julia packages with more generic types. For example, it would be nice to calculate the integral of a function over an ellipse and calculate its partials w.r.t. the semiaxes of the ellipse. I don't see anything preventing this in a pure Julia implementation, but the code may have to be different from the C code.

I'm not sure if I understand what aspect of the ellipse is being nudged here. Do you mean how the integral would change if one of the axes were to change in length a bit? If so, indeed we need better AD for handling the boundaries of the integration domain. Getting it working for the bounds of a box would already be a great improvement.

  • Is it possible to replace the sensealg argument with different AD backends provided by AbstractDifferentiation.jl?

I would hold off on this until AbstractDifferentiation is more mature. I don't think any of the other SciML solver packages made the switch already?

@ChrisRackauckas
Copy link
Member

I would hold off on this until AbstractDifferentiation is more mature. I don't think any of the other SciML solver packages made the switch already?

It's not ready. It doesn't cache.

@ChrisRackauckas
Copy link
Member

Yes, nout and maybe batch needs to be removed, and replaced with a prototype container passed by the user.

Agreed, that interface just needs to change.

@ChrisRackauckas
Copy link
Member

Features of inplace integrands, nout, and batch should be replaced by specialized integrand wrappers that allow the user to provide the output (and input) arrays of the correct Julia types, which can be arbitrary. (I believe this is the right abstraction and have implemented it in https://github.com/lxvm/AutoBZCore.jl)

Yeah this is as discussed in #169. We should change the interface, just no one has gotten around to doing it.

It seems like IntegralSolutions are treated like arrays, so we should decide how to interpret the output of algorithms in terms of this so we can write type-stable pullbacks.

It just needs a few overloads like https://github.com/SciML/SciMLBase.jl/blob/master/src/solutions/zygote.jl

@@ -29,7 +29,8 @@ function Integrals.__solvebp(cache, alg, sensealg, lb, ub,
dfdp = function (out, x, p)
dualp = reinterpret(ForwardDiff.Dual{T, V, P}, p)
if cache.batch > 0
dx = similar(dualp, cache.nout, size(x, 2))
dx = cache.nout == 1 ? similar(dualp, size(x, ndims(x))) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't the nout = 1 case not be able to use similar because it could be scalar?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since p is an array (see function signature), so are rawp (look below) and dualp, so I think similar will be defined.

Comment on lines +58 to +62
# messy, there are 4 cases, some better in forward mode than reverse
# 1: length(y) == 1 and length(p) == 1
# 2: length(y) > 1 and length(p) == 1
# 3: length(y) == 1 and length(p) > 1
# 4: length(y) > 1 and length(p) > 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment is here for? I mean, agreed these are the 4 cases and sometimes forward is better than reverse, but I don't understand why that's here 😅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I just left that as a note to myself and I can remove it. The lines of code below need all the if statements to handle these cases, so I found it helpful to list them

Comment on lines 127 to 130
# Zygote.@adjoint function Zygote.literal_getproperty(sol::SciMLBase.IntegralSolution,
# ::Val{:u})
# sol.u, Δ -> (SciMLBase.build_solution(sol.prob, sol.alg, Δ, sol.resid),)
# end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I wanted to see if removing it broke any tests. It didn't, but I forgot to add it back. Is it used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That just means a test didn't use sol.u. This is necessary for if the user getfield's on u. It should be added back for that case, preferably with a test since that seems to have been missed.

@ChrisRackauckas
Copy link
Member

In general looks great, just a few comments.

@lxvm
Copy link
Collaborator Author

lxvm commented Sep 16, 2023

I would try to make the behaviour of the solutions as similar as possible to how this is handled for AD of ODEProblem.

Sounds good

I'm not sure if I understand what aspect of the ellipse is being nudged here.

You're right about what I was suggesting, e.g. looking at the sensitivity of the integral to making the ellipse infinitesimally flatter/taller. This would require an interface change (e.g. #160) where we replace [lb, ub] by a dom::HyperCube in the IntegralProblem.

Getting it working for the bounds of a box would already be a great improvement.

I can add limit differentiation to this pr by the method of evaluating integrals on the faces of the hypercube. I'll explain below that this does not generalize to other types of domains. Do you think I should add to this pr or make another?

To handle ellipses, manifolds, implicit domains (e.g. level sets) etc... we could relax the type assertion dom::HyperCube, but then not all algorithms will support all domains, which I think is OK. As a further step, to do AD we have two choices:

  1. Compute the integral of the function times the change in volume at the boundary. This is difficult (unless it's a hypercube) because it requires a parametrization of the boundary and an algorithm that can integrate it (possibly different from the original algorithm, since the dimension is different).
  2. Use Stoke's theorem, where we calculate the integral of the function times the change in volume at the boundary as an integral over the domain of the derivative of the boundary function (which I haven't defined) extended to the domain. This is preferable since it integrates over the same domain and so it can use the original algorithm. I'm not sure what the best way to define/compute the extension of the boundary derivative to the domain is. Is the boundary gradient the projection of the gradient of the boundary w.r.t p onto the normal vector of the boundary? Can we extend it to the domain by solving Laplace's equation with Dirichlet boundary data? Does SciML have solvers for this?

@lxvm
Copy link
Collaborator Author

lxvm commented Sep 16, 2023

Yeah this is as discussed in #169. We should change the interface, just no one has gotten around to doing it.

Should we add these prototype containers to Integrals.jl or SciMLBase.jl?

@ChrisRackauckas
Copy link
Member

Should we add these prototype containers to Integrals.jl or SciMLBase.jl?

SciMLBase.

@ChrisRackauckas ChrisRackauckas merged commit 823046c into SciML:master Sep 17, 2023
@lxvm lxvm deleted the autodiff branch December 30, 2023 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zygote tutorial broken
3 participants