Why multiply by 0.1 in the residual block? #11
In the paper and code (e.g. here), the output of the resnet blocks is multiplied by 0.1. I'm curious about the purpose of this. Does it have to do with the absence of batch-norm?

Comments
It just reduces the learning rate for those blocks by a factor of 10 (due to the adaptive optimizer RMSProp). We haven't played around with it too much and I think it might also work fine without the 0.1.
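For readers who want a concrete picture, here is a minimal PyTorch sketch of a pre-activation residual block with this kind of output scaling; the class and layer names are illustrative rather than the repository's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResnetBlock(nn.Module):
    """Pre-activation residual block whose residual branch is scaled by 0.1 (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.conv_0 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_1 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        dx = self.conv_0(F.leaky_relu(x, 0.2))
        dx = self.conv_1(F.leaky_relu(dx, 0.2))
        # Scaling the residual branch by 0.1 shrinks both its contribution to
        # the output and the gradients flowing into conv_0/conv_1. With an
        # adaptive optimizer such as RMSProp, whose step size is roughly
        # insensitive to the gradient scale, this behaves approximately like a
        # 10x smaller learning rate for these layers.
        return x + 0.1 * dx
```

Because RMSProp's parameter updates are roughly invariant to the gradient scale, the 0.1 factor mainly means that a given update changes the block's output about ten times less, which matches the reduced effective learning rate described above.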
I removed the 0.1 factor and changed g_lr and d_lr from 1e-4 to 1e-5, but it does not converge at all. I don't know the reason.
Thanks for reporting your experimental results. What architecture + dataset did you use? I quickly tried on celebA + LSUN churches at resolution 128^2 and there it appears to work fine without the 0.1 and an lr of 1e-5. You could also try initializing the weights of the last conv layer of each residual block to zero instead of using the 0.1 factor.
Thanks for your reply! I used celebA-HQ and the image size is 1024*1024. I just changed the lr in configs/celebA-HQ and removed the factor 0.1 in gan_training/models/resnet.py. I will try the initialization change. Thanks!
The 0.1 factor made more sense to me after reading the Fixup paper - it explains why standard initialization methods are poorly suited for ResNets and can cause immediate gradient explosion. The 0.1 factor is a rough approximation of the fix they suggest, which is down-scaling the initializations in the resnet blocks, and then potentially initializing the last conv layer of each block to 0 (as @LMescheder mentions above), along with a few other changes.
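As a rough sketch of that alternative (not the repository's code), one could drop the 0.1 factor and re-initialize each residual branch in the Fixup spirit: down-scale the inner layers by a depth-dependent factor and zero the last conv so every block starts as the identity. The helper below is hypothetical and assumes each block exposes `conv_0` and `conv_1` as in the sketch above:

```python
import torch.nn as nn

def fixup_style_init(blocks, m=2):
    """Hypothetical helper: re-initialize residual branches along Fixup lines.

    Assumes each block exposes `conv_0` and `conv_1` as in the sketch above,
    where `m` is the number of layers in each residual branch.
    """
    num_blocks = len(blocks)
    # Fixup suggests scaling the inner layers of each m-layer residual branch
    # by L^(-1/(2m-2)), where L is the number of residual blocks.
    scale = num_blocks ** (-1.0 / (2 * m - 2))
    for block in blocks:
        nn.init.kaiming_normal_(block.conv_0.weight)
        block.conv_0.weight.data.mul_(scale)
        # Zero the last conv so the branch outputs 0 and the block is the
        # identity at initialization (no 0.1 factor needed).
        nn.init.zeros_(block.conv_1.weight)
        if block.conv_1.bias is not None:
            nn.init.zeros_(block.conv_1.bias)
```

Whether this fully replaces the 0.1 factor in the 1024^2 celebA-HQ setup discussed above would need to be checked experimentally.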