Simple TensorFlow implementation of the AMSGrad optimizer from *On the Convergence of Adam and Beyond*
- The default hyperparameters are set to the values that performed best in our experiments:
  - learning_rate = 0.01
  - beta1 = 0.9
  - beta2 = 0.99
- Depending on which network you are using, performance may already be good with the default beta2 = 0.99.
Usage:

```python
from AMSGrad import AMSGrad

train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)
```
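For reference, AMSGrad differs from Adam in a single step: it keeps a running elementwise maximum of the second-moment estimate and normalizes by that maximum, so the effective per-parameter learning rate can never increase. Below is a minimal NumPy sketch of one update, for illustration only; it is not the repository's TensorFlow implementation, and bias correction is omitted, as in the paper's statement of the algorithm:

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat,
                 lr=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8):
    """One AMSGrad update (Reddi et al., 2018). All names are illustrative."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)             # AMSGrad: running max of v over steps
    param = param - lr * m / (np.sqrt(v_hat) + epsilon)
    return param, m, v, v_hat
```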
Network architecture used in the experiments:

```python
x = fully_connected(inputs=images, units=100)  # hidden layer with 100 units
x = relu(x)                                    # ReLU activation
logits = fully_connected(inputs=x, units=10)   # output layer, 10 classes
```
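Putting the pieces together, here is a minimal end-to-end sketch in TF 1.x graph mode. The placeholder shapes assume a flattened MNIST-style input (784 features, 10 classes), and tf.layers.dense / tf.nn.relu stand in for the fully_connected and relu helpers above:

```python
import tensorflow as tf
from AMSGrad import AMSGrad

# MNIST-style inputs (assumed shapes).
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])

# The same two-layer network as above, written with standard TF 1.x ops.
x = tf.layers.dense(inputs=images, units=100)
x = tf.nn.relu(x)
logits = tf.layers.dense(inputs=x, units=10)

# Softmax cross-entropy loss, minimized with the repository's AMSGrad optimizer.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99,
                   epsilon=1e-8).minimize(loss)
```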
Author: Junho Kim