Why multiply by 0.1 in the residual block? #11

Open
zplizzi opened this issue Mar 12, 2019 · 5 comments
@zplizzi

zplizzi commented Mar 12, 2019

In the paper and code (e.g. here), the output of the resnet blocks is multiplied by 0.1. I'm curious about the purpose of this. Does it have to do with the absence of batch norm?

@LMescheder
Owner

It just reduces the learning rate for those blocks by a factor of 10 (due to the adaptive optimizer RMSProp). We haven't played around with it too much and I think it might also work fine without the 0.1.
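For context, here is a minimal sketch of the kind of residual block being discussed; the class and layer names are only illustrative and are not copied from gan_training/models/resnet.py:

import torch.nn as nn
import torch.nn.functional as F

class ResnetBlock(nn.Module):
    # Illustrative pre-activation residual block without batch norm.
    def __init__(self, channels):
        super().__init__()
        self.conv_0 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_1 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        dx = self.conv_0(F.leaky_relu(x, 0.2))
        dx = self.conv_1(F.leaky_relu(dx, 0.2))
        # Scaling the residual branch by 0.1 shrinks both its contribution to the
        # output and the gradients flowing into conv_0/conv_1, which with an
        # adaptive optimizer like RMSProp acts roughly like a 10x smaller
        # learning rate for these layers.
        return x + 0.1 * dx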

@LuChengTHU

LuChengTHU commented Mar 23, 2019

I removed the factor 0.1 and changed g_lr and d_lr from 1e-4 to 1e-5, but it does not converge at all. I'm not sure why.

@LMescheder
Owner

> I removed the factor 0.1 and changed g_lr and d_lr from 1e-4 to 1e-5, but it does not converge at all. I'm not sure why.

Thanks for reporting your experimental results. What architecture + dataset did you use? I quickly tried celebA + LSUN churches at resolution 128^2, and there it appears to work fine without the 0.1 and a lr of 1e-5.
One possible reason it did not work for you could be that the 0.1 also changes the initialization, which can be quite important (for deep learning in general, and for our method in particular, since its convergence guarantees are only local). What you can try is to add

nn.init.zeros_(self.conv_1.weight)
nn.init.zeros_(self.conv_1.bias)

to the __init__ function of the ResNet blocks when removing the 0.1 and set both learning rates to 1e-5.
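For concreteness, here is roughly how that would look inside the block's constructor (assuming the conv_0/conv_1 naming from the sketch above; details are illustrative):

def __init__(self, channels):
    super().__init__()
    self.conv_0 = nn.Conv2d(channels, channels, 3, padding=1)
    self.conv_1 = nn.Conv2d(channels, channels, 3, padding=1)
    # Zero-initializing the last conv makes each block start as the identity map,
    # recovering the small-residual behaviour that the 0.1 factor gave at init.
    nn.init.zeros_(self.conv_1.weight)
    nn.init.zeros_(self.conv_1.bias)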

@LuChengTHU

Thanks for your reply! I used celebA-HQ and the image size is 1024*1024. I just changed the lr in configs/celebA-HQ and removed the factor 0.1 in gan_training/models/resnet.py. I will try the initialization change. Thanks!

@zplizzi
Author

zplizzi commented Mar 29, 2019

The 0.1 factor made more sense to me after reading the Fixup paper, which explains why standard initialization methods are poorly suited for ResNets and can cause immediate gradient explosion. The 0.1 factor is a rough approximation of the fix they suggest: down-scaling the initializations in the resnet blocks, and then potentially initializing the last conv layer of each block to 0 (as @LMescheder mentions above), along with a few other changes.
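For reference, a rough sketch of the Fixup-style rescaling described there (a hypothetical helper, reusing the illustrative conv_0/conv_1 names from above; Fixup's extra scalar multipliers and biases are omitted):

import torch.nn as nn

def fixup_like_init(blocks, m=2):
    # Fixup rescales the weights inside each residual branch by L^(-1/(2m-2)),
    # where L is the number of residual blocks and m the layers per branch,
    # and zero-initializes the last layer of each branch.
    L = len(blocks)
    scale = L ** (-1.0 / (2 * m - 2))
    for block in blocks:
        nn.init.kaiming_normal_(block.conv_0.weight)
        block.conv_0.weight.data.mul_(scale)
        nn.init.zeros_(block.conv_1.weight)
        nn.init.zeros_(block.conv_1.bias)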
