Why multiply by 0.1 in the residual block? #11
In the paper and code (e.g. here), the output of the resnet blocks is multiplied by 0.1. I'm curious about the purpose of this. Does it have to do with the absence of batch-norm?

Comments
It just reduces the learning rate for those blocks by a factor of 10 (due to the adaptive optimizer RMSProp). We haven't played around with it too much and I think it might also work fine without the 0.1.
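For readers who want a concrete picture, here is a minimal PyTorch sketch of a pre-activation residual block with this kind of output scaling; the class and layer names are illustrative rather than the repository's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResnetBlock(nn.Module):
    """Pre-activation residual block whose residual branch is scaled by 0.1 (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.conv_0 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_1 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        dx = self.conv_0(F.leaky_relu(x, 0.2))
        dx = self.conv_1(F.leaky_relu(dx, 0.2))
        # Scaling the residual branch by 0.1 shrinks both its contribution to
        # the output and the gradients flowing into conv_0/conv_1. With an
        # adaptive optimizer such as RMSProp, whose step size is roughly
        # insensitive to the gradient scale, this behaves approximately like a
        # 10x smaller learning rate for these layers.
        return x + 0.1 * dx
```

Because RMSProp's parameter updates are roughly invariant to the gradient scale, the 0.1 factor mainly means that a given update changes the block's output about ten times less, which matches the reduced effective learning rate described above.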
I removed the 0.1 factor and changed g_lr and d_lr from 1e-4 to 1e-5, but it does not converge at all. I don't know the reason.
Thanks for reporting your experimental results. What architecture + dataset did you use? I quickly tried on celebA + LSUN churches at resolution 128^2 and there it appears to work fine without the 0.1 and an lr of 1e-5. You could also try initializing the weights of the last conv layer of each residual block to zero instead of using the 0.1 factor.
Thanks for your reply! I used celebA-HQ and the image size is 1024*1024. I just changed the lr in configs/celebA-HQ and removed the factor 0.1 in gan_training/models/resnet.py. I will try the initialization change. Thanks!
The 0.1 factor made more sense to me after reading the Fixup paper - it explains why standard initialization methods are poorly suited for ResNets and can cause immediate gradient explosion. The 0.1 factor is a rough approximation of the fix they suggest, which is down-scaling the initializations in the resnet blocks, and then potentially initializing the last conv layer of each block to 0 (as @LMescheder mentions above), along with a few other changes.
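As a rough sketch of that alternative (not the repository's code), one could drop the 0.1 factor and re-initialize each residual branch in the Fixup spirit: down-scale the inner layers by a depth-dependent factor and zero the last conv so every block starts as the identity. The helper below is hypothetical and assumes each block exposes `conv_0` and `conv_1` as in the sketch above:

```python
import torch.nn as nn

def fixup_style_init(blocks, m=2):
    """Hypothetical helper: re-initialize residual branches along Fixup lines.

    Assumes each block exposes `conv_0` and `conv_1` as in the sketch above,
    where `m` is the number of layers in each residual branch.
    """
    num_blocks = len(blocks)
    # Fixup suggests scaling the inner layers of each m-layer residual branch
    # by L^(-1/(2m-2)), where L is the number of residual blocks.
    scale = num_blocks ** (-1.0 / (2 * m - 2))
    for block in blocks:
        nn.init.kaiming_normal_(block.conv_0.weight)
        block.conv_0.weight.data.mul_(scale)
        # Zero the last conv so the branch outputs 0 and the block is the
        # identity at initialization (no 0.1 factor needed).
        nn.init.zeros_(block.conv_1.weight)
        if block.conv_1.bias is not None:
            nn.init.zeros_(block.conv_1.bias)
```

Whether this fully replaces the 0.1 factor in the 1024^2 celebA-HQ setup discussed above would need to be checked experimentally.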