🐛 Bug
I'm not sure whether this is a bug or a deliberate design decision, but right now the learning rate scheduler gets updated at every "step", which actually corresponds to every forward pass. A more standard implementation would have the scheduler's "step" interval correspond to every optimizer update (i.e., once per accumulated gradient step, not once per batch). This caused me a lot of instability problems, because I did not realize that a standard learning rate warmup of, say, 16000 steps would actually only warm up for 1000 optimizer steps if I set `accumulate_grad_batches=16`.
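
To make the interaction concrete, here is a minimal plain-PyTorch sketch of the two stepping placements. This is illustrative code, not Lightning's actual internals; the toy model, `warmup_steps`, and the linear-warmup lambda are all assumptions that mirror the numbers above:

```python
import torch

# Toy setup; values mirror the numbers in the report.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

warmup_steps = 16_000
accumulate_grad_batches = 16

# Standard linear warmup: the lr factor ramps from 0 to 1
# over warmup_steps calls to scheduler.step().
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, step / warmup_steps)
)

for batch_idx in range(warmup_steps):
    loss = model(torch.randn(2, 10)).sum()
    loss.backward()

    if (batch_idx + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()
        # Expected behaviour: step the scheduler here, once per optimizer
        # update, so the warmup spans 16000 actual parameter updates.

    # Behaviour described above: the scheduler steps once per forward
    # pass, so the 16000-step warmup completes after 16000 batches,
    # which is only 1000 optimizer updates.
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # ~1.0: warmup has already finished
```

Until the stepping interval is changed, one workaround seems to be multiplying the scheduler's warmup length by `accumulate_grad_batches` so the two bookkeeping schemes line up.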