Swish #799

sethtroisi · 2019-04-01T08:46:12Z

This needs to be rebased after #798

tommadams · 2019-04-01T16:50:05Z

From the paper: "For training Swish networks, we found that slightly lowering the learning rate used to train ReLU networks works well."

It would have been nice if they provided actual numbers here :/

amj · 2019-04-01T18:36:09Z

LGTM

sethtroisi · 2019-04-01T18:36:47Z

@tommadams Per personal communication, "slightly" means a 20-30% reduction in the initial learning rate (which is then followed by a 10x cut at the first decay).
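For reference, a rough sketch of the arithmetic described above. It assumes "20-30%" means a reduction of the ReLU initial learning rate and that the 10x cut applies to that reduced value. The function name, the 0.25 default, and the 0.01 base rate are hypothetical, not values from this PR.

```python
def swish_lr_schedule(relu_initial_lr, reduction=0.25, decay_factor=0.1):
    """Return (initial_lr, lr_after_first_decay) for a Swish run.

    relu_initial_lr: initial learning rate used for the ReLU network.
    reduction: fraction to lower it by (0.2 to 0.3 per the comment above).
    decay_factor: 0.1 corresponds to the 10x cut at the first decay.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    initial_lr = relu_initial_lr * (1.0 - reduction)
    lr_after_first_decay = initial_lr * decay_factor
    return initial_lr, lr_after_first_decay


# Hypothetical ReLU base rate of 0.01, swept over the suggested range.
for reduction in (0.2, 0.25, 0.3):
    print(reduction, swish_lr_schedule(0.01, reduction))
```

With a ReLU rate of 0.01, this puts the Swish initial rate somewhere between roughly 0.007 and 0.008.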

k8s-ci-robot added the size/M label Apr 1, 2019

Add use_swish to dual_net.

d2fdd7f
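A minimal sketch of what a `use_swish` switch might select between, assuming the standard definition swish(x) = x * sigmoid(x). This is plain Python for illustration only, not the actual dual_net code; the `activation` helper and its signature are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable form: never calls exp() on a large positive value.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    return max(0.0, x)


def swish(x):
    # Swish with beta = 1: x * sigmoid(x).
    return x * sigmoid(x)


def activation(x, use_swish=False):
    # Hypothetical helper: choose the nonlinearity from a boolean flag.
    return swish(x) if use_swish else relu(x)


print(activation(-1.0), activation(-1.0, use_swish=True))
```

Unlike ReLU, Swish is smooth and gives small negative outputs for negative inputs. That difference may be part of why the learning rate needs retuning.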

sethtroisi force-pushed the swish branch from d6faf55 to d2fdd7f April 1, 2019 17:48

k8s-ci-robot added size/XS and removed size/M labels Apr 1, 2019

tommadams self-requested a review April 1, 2019 18:20

tommadams approved these changes Apr 1, 2019

amj self-requested a review April 1, 2019 18:36

amj approved these changes Apr 1, 2019

sethtroisi merged commit 306d1cf into tensorflow:master Apr 1, 2019

sethtroisi deleted the swish branch April 1, 2019 18:37

sethtroisi mentioned this pull request Apr 2, 2019

[Experiment] Swish #803

Open
