Link with ChainRules.jl #39

ChrisRackauckas opened this issue Mar 24, 2020 · 4 comments
@ChrisRackauckas

ChainRules.jl is a language-wide AD definition library: https://github.com/JuliaDiff/ChainRules.jl. Plugging into it will give compatibility with a lot of operations for free. You might want to use this for generating calls for TensorFlow, instead of just redirecting back to Julia.
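
To make "plugging in" concrete, here is a minimal sketch (nothing ADCME-specific assumed) of how an AD tool consumes a ChainRules rule: each `rrule` call returns the primal value together with a pullback closure for the reverse pass.

```julia
using ChainRules  # ships rrule definitions for Base/stdlib functions

x = 2.0
y, pullback = rrule(sin, x)   # primal value plus a pullback closure
_, x̄ = pullback(1.0)          # reverse pass: sensitivity w.r.t. x
@assert x̄ ≈ cos(x)
```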

@oxinabox maintains both TensorFlow.jl and ChainRules.jl so he might know the specifics on how to do this.

@oxinabox

oxinabox commented Mar 25, 2020

@malmaud and I talked with an engineer on the TensorFlow team about using Julia AD to work out the calls for TensorFlow eager mode, and concluded it wasn't worth the effort: Julia semantics and TensorFlow semantics, especially around broadcasting, are subtly different (columns vs. rows), and things like that would lead to too much pain. So for eager mode @malmaud implemented a tiny tape-based AD inside TensorFlow.jl.
We may have been wrong there.
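
To illustrate the mismatch with a toy example: Julia broadcasts a vector as a column, while NumPy/TensorFlow broadcast a 1-D array as a row.

```julia
# Julia: a Vector is a column, so it is replicated across the columns.
A = [1, 2, 3] .+ zeros(3, 3)   # A[i, j] == i
# NumPy/TensorFlow: a 1-D array acts as a row, replicated down the rows:
#   np.array([1, 2, 3]) + np.zeros((3, 3))  gives  B[i, j] == j + 1  (0-based j)
```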

For graph mode, getting the derivative graph is basically the only thing we use PyCall for.
We build the graph for the primal computation in Julia, then send it over to Python to get the derivative graph back, then hook them all together and train / run it in Julia (with the libtensorflow C bindings).


If you have a computation graph and do AD on it, I imagine it is quite feasible to use ChainRules as part of that,
because for that kind of AD one needs rules for everything in the graph anyway,
and ChainRules can specify arbitrary rules.
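
For instance, here is a sketch of attaching a hand-written rule to an arbitrary function (`my_solve` is a hypothetical stand-in for a node in the graph):

```julia
using ChainRulesCore

# Hypothetical black-box computation standing in for a graph node.
my_solve(a) = a^2

function ChainRulesCore.rrule(::typeof(my_solve), a)
    u = my_solve(a)
    function my_solve_pullback(ū)
        ā = 2a * ū   # hand-written adjoint; could equally call out to a solver
        return NoTangent(), ā
    end
    return u, my_solve_pullback
end

# Any ChainRules-aware AD can now differentiate my_solve without inspecting it.
u, pb = ChainRulesCore.rrule(my_solve, 3.0)
```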

@kailaix
Owner

kailaix commented Mar 25, 2020

Thanks for the discussion. The design idea for ADCME is that we split the computation into two parts:

  1. The first part does not require gradients, so it is computed solely in Julia, leveraging Julia's JIT and existing packages.
  2. The second part requires AD. The solution is to hand the data and the (static) computational graph (built using PyCall) to TensorFlow. This step has nothing to do with Julia: all the computations are migrated to TensorFlow C++ kernels.

This strategy is somewhat different from what @oxinabox described, where all the computations are sent back to Julia. With the current strategy, the differences between Julia's and TensorFlow's data structures don't really matter. Neither do the semantics, because you can wrap the TensorFlow APIs in a Julia style, so to users the semantics look just like usual Julia. There is only a minor cost to transfer data between Julia and TensorFlow before and after the whole computation. The drawback, of course, is that you can hardly leverage the Julia JIT and Julia packages for AD-related computations.
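
In code, the two-part design looks roughly like this sketch (assuming ADCME's exported API: `Variable`, `constant`, `gradients`, `Session`, `init`, `run`):

```julia
using ADCME

# Part 1: gradient-free setup in plain Julia.
data = rand(10)

# Part 2: a static TensorFlow graph, built in Julia via PyCall;
# the heavy computation runs entirely in TensorFlow's C++ kernels.
x = Variable(rand(10))
loss = sum((x - constant(data))^2)
g = gradients(loss, x)          # TensorFlow constructs the derivative graph

sess = Session(); init(sess)    # data crosses the boundary once here...
run(sess, [loss, g])            # ...and results come back once here
```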

I do not know much about ChainRules, but I'd like to dig deeper into it in the next few weeks. My experience with a TensorFlow backend is that the performance is really remarkable. For example, if multiple operators are independent in the computational graph, TensorFlow will automatically execute them concurrently. It is also easy to split a model across multiple CPUs and GPUs. These parallelism features are very important for many of the physical-modeling applications I have worked on in the past. What is the current status regarding the performance of ChainRules?

@ChrisRackauckas
Author

It won't do that automatically. Indeed, TensorFlow is good for deployment, but what you lose is the ability to do difficult things, like solving stiff ODEs with high-order methods or using a quadratic-program solver. At some point, trying to write a TensorFlow op for every little detail means rewriting not only a whole programming language but also every single package in that language. If it's possible to define a TensorFlow op that makes a Julia call and asks for its gradient (which is already defined in packages like DifferentialEquations.jl), then it should "just work" and you'd be able to piece those in with the rest of the AD.
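
As a hedged sketch of what that would enable (assuming DiffEqSensitivity, now SciMLSensitivity, is loaded so the adjoint rules for `solve` are available):

```julia
using OrdinaryDiffEq, DiffEqSensitivity, Zygote

# Gradient of an ODE solution w.r.t. its parameter; the adjoint rule is
# supplied by the DiffEq ecosystem, not re-derived op-by-op in TensorFlow.
function solve_end(p)
    prob = ODEProblem((u, p, t) -> p[1] .* u, [1.0], (0.0, 1.0), p)
    sol = solve(prob, Tsit5())
    return sol.u[end][1]        # u(1) = exp(p[1])
end

Zygote.gradient(solve_end, [0.5])   # ≈ ([exp(0.5)],)
```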

@kailaix
Owner

kailaix commented Mar 25, 2020

At some point, I tried to let TensorFlow call Julia directly via a mechanism similar to py_func. Unfortunately, due to a problem with calling a Julia function from a non-Julia thread, that approach did not work very well.
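
For context, the mechanism (and where it breaks) looks roughly like this sketch; `my_kernel` is a hypothetical callback, and the restriction is that, before Julia 1.9's `jl_adopt_thread`, Julia code may only run on threads started by Julia itself:

```julia
# Exposing a Julia kernel to C so TensorFlow could call it, py_func-style.
my_kernel(x::Cdouble)::Cdouble = 2x
const my_kernel_c = @cfunction(my_kernel, Cdouble, (Cdouble,))

# The catch: TensorFlow invokes ops from its own worker threads, and calling
# back into the Julia runtime from a thread Julia did not start was undefined
# behavior at the time; hence the approach did not work reliably.
```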
