MKLDNN: Fully Connected layer. #9197
No, you can just add a single big FC operator and implement only the MKLDNN kernel.
If you are familiar with Eigen Tensor, implementing a CPU/GPU FC kernel is similar to implementing the MKLDNN one. Users only call the kernel through Python, so we can fall back to a combination of small ops (mul + addition) when MKLDNN is not available.
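For what it's worth, here is a minimal sketch of what the dispatch side of such a fused fc op could look like, modeled on the pattern Paddle's existing MKLDNN ops (e.g. conv) use. The helper names (`CanMKLDNNBeUsed`, `LibraryType::kMKLDNN`, the `use_mkldnn` attribute) are assumptions borrowed from fluid, and the Python-level fallback to mul + elementwise_add would sit above this layer:

```cpp
// Sketch only: kernel-type selection for a single fused "fc" op. Prefer
// the MKLDNN kernel when it is compiled in and requested; otherwise stay
// on the plain library type (the Python layer can instead emit the
// mul + elementwise_add combination).
framework::OpKernelType FCOp::GetExpectedKernelType(
    const framework::ExecutionContext& ctx) const {
  framework::LibraryType library = framework::LibraryType::kPlain;
#ifdef PADDLE_WITH_MKLDNN
  if (ctx.Attr<bool>("use_mkldnn") && platform::CanMKLDNNBeUsed(ctx)) {
    library = framework::LibraryType::kMKLDNN;
  }
#endif
  return framework::OpKernelType(
      framework::ToDataType(ctx.Input<framework::Tensor>("Input")->type()),
      ctx.GetPlace(), framework::DataLayout::kAnyLayout, library);
}
```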
I do not fully understand your point. The multiplication and sum operators are fundamental in algebra; they are used everywhere. I think the FC kernel cannot replace these two operators; it is just a speed-up when you want to do the fully connected operation.
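As a concrete reference for what the fused kernel computes relative to the two separate ops, here is a minimal, framework-independent C++ sketch of FC doing the multiplication and the bias addition in a single pass; the function name and the row-major layout are illustrative only:

```cpp
#include <vector>

// Naive reference FC: out[i][j] = sum_k x[i][k] * w[k][j] + b[j].
// A fused kernel computes the same result in one primitive instead of a
// separate matmul pass followed by a bias-add pass.
void fc_reference(const std::vector<std::vector<float>>& x,   // batch x in
                  const std::vector<std::vector<float>>& w,   // in x out
                  const std::vector<float>& b,                // out
                  std::vector<std::vector<float>>* out) {     // batch x out
  const size_t batch = x.size(), in_dim = w.size(), out_dim = b.size();
  out->assign(batch, std::vector<float>(out_dim, 0.f));
  for (size_t i = 0; i < batch; ++i) {
    for (size_t j = 0; j < out_dim; ++j) {
      float acc = b[j];  // bias add folded into the accumulation
      for (size_t k = 0; k < in_dim; ++k) acc += x[i][k] * w[k][j];
      (*out)[i][j] = acc;
    }
  }
}
```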
Yes. See the first comment.
There is one point I want to clarify: why I didn't implement the FC CPU/GPU kernel.
These two reasons forced the TensorFlow team to choose the XLA way (https://www.tensorflow.org/performance/xla/). But AFAIK, it makes debugging a nightmare, because you cannot imagine what happened in your code. We will follow TVM or a similar technology later. Currently the multi-node multi-GPU performance hurts, and I am focusing on that topic.
I agree with it. You can add
@luotao1, @dzhwinter, Thank you very much.
I will use the fully connected layer as an example to describe this problem. While implementing the fully connected layer with the MKLDNN algorithm, I encountered a few difficulties. The current version of Paddle splits the fully connected layer into two operations, multiplication and addition, and these operations are used throughout the current version of Paddle. The MKLDNN version of the algorithm gives us the opportunity to combine these operations into one. So, if I wanted to kill two birds with one stone, I had to write a new kernel for this layer, i.e. a stand-alone version of the FC algorithm. However, when I implemented the new kernel, I ran into a few problems:
1. First of all, am I forced to implement three versions of the same algorithm, for CPU, GPU, and MKLDNN, just to register the new MKLDNN op kernel?
2. Can I use the new FC kernel when I don't have full FC kernel implementations for the CPU and GPU places, but only two fake kernels there? By a fake kernel I mean a kernel that is registered in the system, but reports that it is not available when called (see the sketch after this list). I worked out that such fake objects exist because the PaddlePaddle platform expects every op to have kernels on all platforms.
3. Referring to the point above, can I integrate a single FC kernel plus the fake CPU and GPU kernels into the current platform while the old version of the algorithm (matrix multiplication and sum) is still there?
4. Also, how should these algorithms be merged into one? Should we remove the old version (multiplication and sum), replace it with the new one (fully connected on MKLDNN), or leave it untouched and add a new op kernel alongside the current solution?
5. Can we have a special kernel for only one specific platform, i.e. MKLDNN, without having to register new kernels for the other platforms, i.e. CPU (naive) and GPU?
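To make the fake-kernel question concrete, below is a hedged sketch of how a placeholder kernel plus an MKLDNN-only registration might look. The macros (`REGISTER_OP_KERNEL` with an explicit library type, `REGISTER_OP_CPU_KERNEL`, `REGISTER_OP_CUDA_KERNEL`, `PADDLE_THROW`) follow the fluid op registry, but the FC-specific class names are hypothetical:

```cpp
// Illustrative sketch only; FCPlaceholderKernel and FCMKLDNNOpKernel are
// hypothetical names. A "fake" kernel satisfies the registry but refuses
// to run, so graph construction succeeds while execution fails loudly.
template <typename T>
class FCPlaceholderKernel : public paddle::framework::OpKernel<T> {
 public:
  void Compute(const paddle::framework::ExecutionContext& ctx) const override {
    PADDLE_THROW(
        "The fc op has no native kernel for this place yet; "
        "enable MKLDNN or fall back to the mul + elementwise_add ops.");
  }
};

// The real kernel is registered only for the MKLDNN library type...
REGISTER_OP_KERNEL(fc, MKLDNN, ::paddle::platform::CPUPlace,
                   FCMKLDNNOpKernel<float>);
// ...and placeholders keep the registry complete for the other targets.
REGISTER_OP_CPU_KERNEL(fc, FCPlaceholderKernel<float>);
REGISTER_OP_CUDA_KERNEL(fc, FCPlaceholderKernel<float>);
```

Whether the framework should require those placeholders at all is exactly question 5 above.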
Thank you.