-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Fresh attempt at DI integration #54
Conversation
You're running DI v0.4.2 here, something is blocking it from updating. Can you add a v0.5.2 compat bound and see what it says? |
The error sounds like a type instability? |
@Vaibhavdixit02 when I replace the function inside |
Yup, I figured that. I spent some time reading up on the vector -> vector hessian discussion (and the background) you and Tim Holy were having, hoping to do this more rigorously but I don't see a natural way, asking users to pass a vector of functions is not ideal and feels unnatural to me - this will show up in multiobjective problems as well, even though that's typically majorly derivative free algorithms. I have some WIP locally this is moving slow because of other things right now, I will get back to it soon. |
So do you think you need the vector Hessian? |
No, not right now, I can work around that (as I have been doing in the implementations that exist here already) but it would be nice to have it in the future |
@Vaibhavdixit02 @ChrisRackauckas please do not drop the Enzyme extension in favor of DI here. This will lead to both more codes erring and suboptimal performance as DI will create both unnecessary closures among other issues. I would prefer for user code not to be made to break because of an unnecessary intermediate closure from an interface. See some comments about this tpapp/LogDensityProblemsAD.jl#29 (review) / tpapp/LogDensityProblemsAD.jl#29 (comment) Unfortunately when I commented "Closures were one of the reasons imo that AD.jl didn't take off, so long term I think DI.jl would be well served by both not introducing them itself (my understanding is that it does this successfully), but also not forcing users to create the same closures user-side (which would create similar issues)." but the response was "DI is unlikely to support either multiple arguments or activity specification anytime soon" |
I am aware of this limitation of DifferentiationInterface but I'm not sure it's obvious to others how much work would be required for multiple arguments or activity analysis. The reason we're able to detect bugs and suboptimalities in Enzyme before anyone else (see the 100x gradient slowdown from a month ago) is because we have a top tier test suite, which would need to grow by a factor of at least 2 to accommodate this new feature. It's 1k to 2k LOCs at least and a profound rethink, so even if I want to do it, it can't happen overnight. And since it's only essential for Enzyme, another option would be to provide the operators I need for DI on the Enzyme side. That would make it so I don't have to code every operator suboptimally with my imperfect understanding of your docs. |
Cross referencing for technical details |
And I also want to push back on the "unnecessary interface intermediate". DI fills a longstanding need in the ecosystem, especially for packages like Enzyme where the API is very hard to grasp. Not everyone is gonna be able or willing to use Enzyme directly, so I do believe there is value in even a suboptimal DI. Since I aim to support every backend and not just Enzyme, there's always gonna be a compromise on how far I support each one. I'm gonna look into activities, but I just need you to realize that it's easy to get it out quick and dirty, much harder to do it right |
Ah I apologize @gdalle I did not mean that to refer to DI as unnecessary (I think it is an amazing package and the work that you and Adrian are doing to make things easier to use is sorely needed). I meant only to refer to the intermediate closure that DI would generate for the hessian as unnecessary -- strictly in the technical sense (eg wrapping things into a closure isn't needed when you could call a multi argument version without the closure). I realize in retrospect the poor wording and I sincerely apologize. |
And I fully agree with you that it is a difficult thing to get multi argument and activities correct. However, replacing existing code that does not introduce these will create breakage so I don't think it wise for DI to replace existing codes with such support until DI vendors support as well. For the backends without this need -- or packages without generalization of backend -- it is already a strict improvement and isn't a question that its use is immediately useful (and thus would be good to replace the other backends here, for example). |
Thanks for clarifying, and sorry for overreacting on my end 😊 Maybe this PR can replace the other backends and leave Enzyme alone for the time being? |
We have no intentions to change to DI until it matches the performance of what we already have. It's on our radar, but we will not sacrifice features or performance. Instead, we will keep on top of it until DI can do exactly this, and then it's ready. |
Which benchmarks am I expected to pass for this bar to be met? |
If it's anywhere close to the performance then we'll take it as we would greatly enjoy the decreased maintanance burden. But it should be able to do the standard constrained and unconstrained tutorials with Zygote, Enzyme, and ForwardDiff and basically match the performance all of the way through. If I'm not mistaken, currently the closures are a bit of a blocker to this. |
If you're talking about the tutorials in the docs, some of them are on toy problems (see e.g. https://docs.sciml.ai/Optimization/stable/tutorials/constraints/). Is there an actual realistic benchmark suite that can be run to track and compare performance? |
The toy problems will be the hardest because that will measure pure overhead. |
What I'm suggesting above, following the discussion with Billy, is that DI doesn't necessarily have to overtake all backends at once. It seems reasonable to leave Enzyme aside for now cause that's the one where I would struggle most to milk the last drops of performance. |
Okay then. Ping me when the PR is ready for use @Vaibhavdixit02 and I'll review then help benchmark |
8212343
to
4fe1f54
Compare
406564e
to
f0b8401
Compare
Beware of the local sparsity detector: you shouldn't have to use it in general. julia> using ADTypes: jacobian_sparsity
julia> using SparseConnectivityTracer
julia> function con2_c(res, x)
res .= [x[1]^2 + x[2]^2, x[2] * sin(x[1]) - x[1]]
end
con2_c (generic function with 1 method)
julia> jacobian_sparsity(con2_c, zeros(2), zeros(2), TracerSparsityDetector()) # input-independent
2×2 SparseArrays.SparseMatrixCSC{Bool, Int64} with 4 stored entries:
1 1
1 1
julia> jacobian_sparsity(con2_c, zeros(2), zeros(2), TracerLocalSparsityDetector()) # input-dependent
2×2 SparseArrays.SparseMatrixCSC{Bool, Int64} with 3 stored entries:
1 1
1 ⋅ |
Checklist
contributor guidelines, in particular the SciML Style Guide and
COLPRAC.
Additional context
Add any other context about the problem here.