Need a chapter comparing activations #14

Open
wangkuiyi opened this issue Dec 17, 2017 · 1 comment

wangkuiyi commented Dec 17, 2017

Not just sigmoid and tanh -- neither of them is sufficiently good.

We should include ReLU and its variants (roughly sketched in code after the list), such as

  • leaky ReLU: use a small fixed constant slope for negative inputs.
  • PReLU: learn that negative-side slope as a scalar parameter per neuron.
  • maxout: take the max over two (or more) linear pieces, which doubles the number of parameters per neuron.
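
A rough NumPy sketch of these variants (the array shapes, the per-neuron `alpha`, and the two-piece maxout are my reading of the bullets, not a fixed spec):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small fixed slope alpha on the negative side.
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: like leaky ReLU, but alpha is a learned parameter
    # (one scalar per neuron, broadcast against x).
    return np.where(x > 0, x, alpha * x)

def maxout(x, W, b):
    # Maxout: max over k affine pieces; with k = 2 this doubles the
    # parameters of a plain linear + ReLU unit.
    # x: (batch, d_in), W: (k, d_in, d_out), b: (k, d_out)
    z = np.einsum('bi,kio->bko', x, W) + b   # (batch, k, d_out)
    return z.max(axis=1)
```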

A comparison is here https://datascience.stackexchange.com/questions/14349/difference-of-activation-functions-in-neural-networks-in-general

Note also that the maximum slope of the sigmoid is 1/4, while that of tanh is 1, four times larger. A larger slope is preferred primarily because gradients are multiplied along the chain rule during backpropagation: stacking layers whose local slopes are at most 1/4 shrinks the gradient geometrically, so deep sigmoid networks suffer more from vanishing gradients than tanh networks.
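
A quick numerical check of those maximum slopes (plain NumPy; the grid is arbitrary):

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 10001)
sigma = 1.0 / (1.0 + np.exp(-x))

# sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at x = 0 with value 1/4;
# tanh'(x)  = 1 - tanh(x)**2            peaks at x = 0 with value 1.
print(np.max(sigma * (1.0 - sigma)))   # ~0.25
print(np.max(1.0 - np.tanh(x) ** 2))   # ~1.0
```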

wangkuiyi commented

ReLU needs to work together with batch normalization.
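
A minimal sketch of the usual ordering, assuming PyTorch; the layer widths (256, 128) are placeholders:

```python
import torch.nn as nn

# Linear -> BatchNorm -> ReLU: batch norm keeps the pre-activations
# roughly zero-centered, so ReLU does not kill most of its inputs.
block = nn.Sequential(
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
)
```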
