I have experienced significantly slower training when using the PReLU activation instead of ReLU in a model composed with the Gluon API. On GPU, PReLU reaches only about 1/5 of ReLU's throughput, measured in samples processed per second.
I eventually managed to bring the speed of Gluon's PReLU back to normal with the following modification.
Following is the original init function of PReLU:
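A minimal sketch of what Gluon's PReLU `__init__` looked like at the time, paraphrased from `mxnet.gluon.nn.activations` (the exact signature may differ between MXNet versions); the learnable slope `alpha` is a single value of shape `(1,)` shared across all channels:

```python
from mxnet import initializer
from mxnet.gluon import HybridBlock

class PReLU(HybridBlock):
    def __init__(self, alpha_initializer=initializer.Constant(0.25), **kwargs):
        super(PReLU, self).__init__(**kwargs)
        with self.name_scope():
            # shape=(1,): one negative slope shared by every channel
            self.alpha = self.params.get('alpha', shape=(1,),
                                         init=alpha_initializer)

    def hybrid_forward(self, F, x, alpha):
        return F.LeakyReLU(x, gamma=alpha, act_type='prelu', name='fwd')
```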
Following is the modified init function of PReLU:
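A sketch of the kind of change described here (the argument name `num_channels` is illustrative, not necessarily the author's exact code): the constructor takes the number of channels and gives `alpha` one slope per channel instead of a single shared slope.

```python
from mxnet import initializer
from mxnet.gluon import HybridBlock

class PReLU(HybridBlock):
    def __init__(self, num_channels,
                 alpha_initializer=initializer.Constant(0.25), **kwargs):
        super(PReLU, self).__init__(**kwargs)
        with self.name_scope():
            # shape=(num_channels,): a separate negative slope per channel
            self.alpha = self.params.get('alpha', shape=(num_channels,),
                                         init=alpha_initializer)

    def hybrid_forward(self, F, x, alpha):
        return F.LeakyReLU(x, gamma=alpha, act_type='prelu', name='fwd')
```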
The key is to pass the expected number of channels to the PReLU block so that it does not share the negative slope across channels. The downside of this solution is that you have to pass the number of channels every time you create the block, as in the example below.
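For illustration, assuming the `num_channels` argument from the sketch above (the layer sizes here are made up), usage looks roughly like this:

```python
from mxnet.gluon import nn

# PReLU here is the modified class defined above; its slope width must
# match the channel count of the preceding layer, which is the downside
# mentioned above.
net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Conv2D(channels=64, kernel_size=3, padding=1))
    net.add(PReLU(num_channels=64))    # must repeat 64 here
    net.add(nn.Conv2D(channels=128, kernel_size=3, padding=1))
    net.add(PReLU(num_channels=128))   # and 128 here
```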
I don't know why the two settings (shared vs. per-channel slope) differ so drastically in performance. The mxnet contributors should investigate this issue in more depth.