Need a chapter comparing activations #14

Open
wangkuiyi opened this issue Dec 17, 2017 · 1 comment

wangkuiyi commented Dec 17, 2017

Not just sigmoid and tanh -- neither of them is sufficiently good.

We should include ReLU and its variants (roughly sketched in code after the list), such as

  • leaky ReLU: use a small fixed constant slope for negative inputs.
  • PReLU: learn that negative-side slope as a scalar parameter per neuron.
  • maxout: take the max over two (or more) linear pieces, which doubles the number of parameters per neuron.
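
A rough NumPy sketch of these variants (the array shapes, the per-neuron `alpha`, and the two-piece maxout are my reading of the bullets, not a fixed spec):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small fixed slope alpha on the negative side.
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: like leaky ReLU, but alpha is a learned parameter
    # (one scalar per neuron, broadcast against x).
    return np.where(x > 0, x, alpha * x)

def maxout(x, W, b):
    # Maxout: max over k affine pieces; with k = 2 this doubles the
    # parameters of a plain linear + ReLU unit.
    # x: (batch, d_in), W: (k, d_in, d_out), b: (k, d_out)
    z = np.einsum('bi,kio->bko', x, W) + b   # (batch, k, d_out)
    return z.max(axis=1)
```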

A comparison is here https://datascience.stackexchange.com/questions/14349/difference-of-activation-functions-in-neural-networks-in-general

Note also that the maximum slope of the sigmoid is 1/4, while that of tanh is 1, four times larger. A larger slope is preferred primarily because gradients are multiplied along the chain rule during backpropagation: stacking layers whose local slopes are at most 1/4 shrinks the gradient geometrically, so deep sigmoid networks suffer more from vanishing gradients than tanh networks.
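
A quick numerical check of those maximum slopes (plain NumPy; the grid is arbitrary):

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 10001)
sigma = 1.0 / (1.0 + np.exp(-x))

# sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at x = 0 with value 1/4;
# tanh'(x)  = 1 - tanh(x)**2            peaks at x = 0 with value 1.
print(np.max(sigma * (1.0 - sigma)))   # ~0.25
print(np.max(1.0 - np.tanh(x) ** 2))   # ~1.0
```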

wangkuiyi commented

ReLU needs to work together with batch normalization.
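
A minimal sketch of the usual ordering, assuming PyTorch; the layer widths (256, 128) are placeholders:

```python
import torch.nn as nn

# Linear -> BatchNorm -> ReLU: batch norm keeps the pre-activations
# roughly zero-centered, so ReLU does not kill most of its inputs.
block = nn.Sequential(
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
)
```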
