weight decay for the resweight? #14

Kyeongpil · 2020-11-24T02:15:47Z

Hello, I read the paper, and it is interesting to me.
I have a question.

Many implementations, including Hugging Face's, exclude LayerNorm parameters and biases from weight decay to help convergence.
(huggingface/transformers#492)
Would it also help to exclude the resweight parameters from weight decay?

calclavia · 2020-11-28T05:08:27Z

Yes, it would seem reasonable to not decay resweights since other parameters are already being decayed.
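
For illustration only, here is a minimal sketch of how resweights could be left out of weight decay, following the same grouping pattern commonly used for biases and LayerNorm. It is not taken from this repository. The helper `build_param_groups` and the name fragments in `NO_DECAY_KEYS` are hypothetical, and the sketch assumes the residual weight parameter has `resweight` somewhere in its name; adjust the fragments to match your model.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Hypothetical name fragments (assumption): parameters whose names contain
# any of these are placed in the group that receives no weight decay.
NO_DECAY_KEYS: Tuple[str, ...] = ("bias", "LayerNorm.weight", "resweight")


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    weight_decay: float = 0.01,
    no_decay_keys: Sequence[str] = NO_DECAY_KEYS,
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into a decayed and a non-decayed group."""
    decay, no_decay = [], []
    for name, param in named_params:
        # Substring match on the parameter name decides the group.
        if any(key in name for key in no_decay_keys):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

With PyTorch, the result can be passed where an optimizer expects parameter groups, for example `torch.optim.AdamW(build_param_groups(model.named_parameters()), lr=1e-4)`. Whether this actually improves results is not established in this thread.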

fightnyy · 2021-02-17T16:35:01Z

@calclavia I have the same question. Has this been shown to work better, or is it only meant to speed up computation?
