An additional point is that the maximum slope of the sigmoid is 1/4, while that of tanh is 1, four times larger. A larger gradient is preferable primarily because gradients are multiplied together along the chain rule, so small per-layer slopes compound into vanishing gradients in deep networks.
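For reference, a minimal NumPy sketch (not tied to any particular framework in this discussion) that checks the 1/4 vs. 1 claim numerically:

```python
# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) peaks at 1/4 when x = 0,
# while tanh'(x) = 1 - tanh(x)^2 peaks at 1 when x = 0.
import numpy as np

x = np.linspace(-5.0, 5.0, 10001)

sigmoid = 1.0 / (1.0 + np.exp(-x))
d_sigmoid = sigmoid * (1.0 - sigmoid)   # derivative of the sigmoid
d_tanh = 1.0 - np.tanh(x) ** 2          # derivative of tanh

print(d_sigmoid.max())                  # ~0.25
print(d_tanh.max())                     # ~1.0
print(d_tanh.max() / d_sigmoid.max())   # ~4.0
```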
Not just sigmoid and tanh -- neither of them is sufficiently good.
We should include ReLU and its variants (a rough sketch follows below).
A comparison is here: https://datascience.stackexchange.com/questions/14349/difference-of-activation-functions-in-neural-networks-in-general
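For concreteness, here is an illustrative NumPy sketch of ReLU and two common variants; which variants to actually include is still open, and the alpha values below are just the usual defaults, not anything decided in this issue:

```python
import numpy as np

def relu(x):
    # max(0, x): gradient is 1 for x > 0 and 0 for x < 0,
    # so the positive side never saturates
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small negative slope alpha keeps a nonzero gradient for x < 0
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # smooth negative part that saturates at -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```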