This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

Swish #799

Merged
merged 1 commit into from
Apr 1, 2019

sethtroisi
Contributor

Based on https://arxiv.org/pdf/1710.05941.pdf
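For reference, the linked paper (Ramachandran, Zoph and Le, "Searching for Activation Functions") defines Swish as f(x) = x · σ(βx). Here σ is the logistic sigmoid, and β is either a constant or a trainable parameter; β = 1 is the usual choice. The sketch below is a minimal pure-Python illustration of that formula under those assumptions. It is not the code merged in this PR.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function sigma(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite to avoid overflow in exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation from arXiv:1710.05941: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


# Swish behaves like the identity for large positive inputs, tends to 0 for
# large negative inputs, and is smooth (and slightly non-monotonic) near 0.
for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(value, swish(value))
```

Unlike ReLU, Swish lets small negative values through. This difference is one reason the thread discusses retuning the learning rate.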

This needs to be rebased after #798

@tommadams
Contributor

From the paper: "For training Swish networks, we found that slightly lowering the learning rate used to train ReLU networks works well."

It would have been nice if they had provided actual numbers here :/

@amj
Contributor

amj commented Apr 1, 2019

LGTM

@amj self-requested a review April 1, 2019 18:36
@sethtroisi
Contributor Author

@tommadams Per personal communication, "slightly" means lowering the initial learning rate by 20-30% (that initial rate is then followed by a 10x cut at the first decay).
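The sketch below works through that arithmetic. Suppose "slightly lowering" means cutting the ReLU initial learning rate by 20-30%, and the first decay then divides the rate by 10. The helper name `swish_lr_stages` and the example ReLU base rate of 0.02 are hypothetical. The sketch also assumes that the 10x cut applies to the already-reduced Swish rate.

```python
from typing import Tuple


def swish_lr_stages(relu_initial_lr: float, reduction: float = 0.25) -> Tuple[float, float]:
    """Hypothetical helper: learning rates for the first two schedule stages.

    Assumes the Swish initial rate is the ReLU initial rate lowered by
    `reduction` (0.2-0.3 per the discussion above), and that the first
    decay divides that reduced rate by 10.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    initial = relu_initial_lr * (1.0 - reduction)
    after_first_decay = initial / 10.0
    return initial, after_first_decay


# Hypothetical ReLU baseline of 0.02 with a 25% reduction.
initial, decayed = swish_lr_stages(0.02, 0.25)
print(f"initial={initial:.4f}, after first decay={decayed:.5f}")
```

With a 20-30% reduction, a 0.02 ReLU baseline gives a Swish initial rate between 0.014 and 0.016. That is one tenth of that range after the first decay.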

@sethtroisi merged commit 306d1cf into tensorflow:master Apr 1, 2019
@sethtroisi deleted the swish branch April 1, 2019 18:37
@sethtroisi mentioned this pull request Apr 2, 2019

4 participants