Link with ChainRules.jl #39
@malmaud and I talked with an engineer on the TensorFlow team about using Julia AD to work out the calls for TensorFlow eager mode. For graph mode, getting the derivative graph is basically the only thing we use PyCall for. If you have a computation graph and do AD on it, I imagine it is quite feasible to use ChainRules as part of that.
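For concreteness, here is a minimal sketch of the kind of lookup this would involve: ChainRules.jl exposes reverse rules through `rrule`, which returns the primal result together with a pullback closure, so a tool generating TensorFlow calls could interrogate these rules instead of re-deriving gradients itself. The names `A`, `B`, and the seed `ȳ` below are just placeholders.

```julia
using ChainRules, ChainRulesCore

A, B = rand(3, 3), rand(3, 3)

# rrule returns the primal value and a pullback closure
y, pullback = rrule(*, A, B)

ȳ = ones(size(y))                  # seed cotangent for the output
f̄, Ā, B̄ = unthunk.(pullback(ȳ))  # cotangents w.r.t. the function and each input
```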
Thanks for the discussion. The design idea for ADCME is that we split the computation into two parts:

- Julia builds the computational graph through Julia-style wrappers around the TensorFlow APIs and handles pre- and post-processing of the data;
- TensorFlow executes the whole numerical computation, including the AD, inside its own runtime.
The strategy is kind of different from what @oxinabox described, where all the computations are sent back to Julia. Using the current strategy, the differences in data structures between Julia and TensorFlow don't really matter. Neither do the semantics, because you can wrap the TensorFlow APIs in a Julia style, and the semantics appear just like usual Julia to users. There is only a minor cost to transfer data between Julia and TensorFlow before and after the whole computation. Of course, the drawback is that you can hardly leverage the Julia JIT and packages for AD-related computations.

I do not know much about ChainRules, but I'd like to dig deeper into it in the next few weeks. My experience with a TensorFlow backend is that the performance is really remarkable. For example, if multiple operators are independent in the computational graph, TensorFlow will automatically execute them concurrently. It is also easy to split the model across multiple CPUs and GPUs. These parallelism features are very important for many of the physical-modeling applications I have worked on in the past. What is the current status regarding the performance of ChainRules?
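Concretely, the workflow looks roughly like this; a minimal sketch using the core ADCME API (`constant`, `gradients`, `Session`), intended as an illustration of the split rather than a canonical example:

```julia
using ADCME

# Part 1: Julia builds the graph through Julia-style wrappers.
x = constant(rand(10))  # data is transferred to TensorFlow once, up front
y = sum(x^2)            # looks like ordinary Julia, but records graph nodes
g = gradients(y, x)     # the derivative graph is constructed by TensorFlow

# Part 2: TensorFlow runs the whole computation inside its own runtime.
sess = Session(); init(sess)
run(sess, g)            # results are transferred back to Julia once, at the end
```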
It won't do that automatically. Indeed, TensorFlow is good for deployment, but what you lose is the ability to do difficult things, like solving stiff ODEs with high-order methods or utilizing a quadratic-program solver. At some point, trying to write a TensorFlow op for every little detail means rewriting not only a whole programming language but also every single package in that programming language. If it's possible to define a TensorFlow op that does a Julia call and asks for its gradient (which is already defined in packages like DifferentialEquations.jl), then it should "just work", and you'd then be able to piece those in with the rest of the AD.
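As a sketch of what that wrapping could look like, here is the `tf.py_function` route via PyCall (`julia_op` is a made-up placeholder, and a custom gradient would still need to be registered on the TensorFlow side). As the next comment notes, this route runs into threading problems in practice:

```julia
using PyCall
tf = pyimport("tensorflow")

# A plain Julia function we would like TensorFlow to call as an op.
julia_op(x) = sin.(x) .+ x.^2

# Wrap it so TensorFlow eager mode can invoke it; the argument arrives
# as an EagerTensor, so convert to a Julia array and back.
wrapped(x) = julia_op(x.numpy())

x = tf.constant([1.0, 2.0, 3.0])
y = tf.py_function(func=wrapped, inp=[x], Tout=tf.float64)
```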
At some point, I was trying to let TensorFlow call Julia directly via a mechanism similar to py_func. Unfortunately, due to a problem with calling a Julia function from a non-Julia thread, that approach did not work very well.
ChainRules.jl is a language-wide AD definition library: https://github.com/JuliaDiff/ChainRules.jl. Plugging into it will give compatibility with a lot of operations for free. You might want to use this for generating calls for TensorFlow, instead of just redirecting back to Julia.
@oxinabox maintains both TensorFlow.jl and ChainRules.jl, so he might know the specifics of how to do this.
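For reference, opting a function into ChainRules is just a method definition on `rrule`; a minimal sketch, with a made-up function `mysquare`:

```julia
using ChainRulesCore

# A made-up function we want every ChainRules-aware AD to differentiate.
mysquare(x) = x^2

function ChainRulesCore.rrule(::typeof(mysquare), x)
    y = mysquare(x)
    # The pullback maps the output cotangent ȳ to input cotangents;
    # the first slot is the cotangent of the function itself.
    mysquare_pullback(ȳ) = (NoTangent(), 2x * ȳ)
    return y, mysquare_pullback
end
```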