add learning rate visualisation and manual parameter #161
Conversation
Hi Luca! I realize that the default learning rate should depend on the architecture, and that was not well done: low learning rates like 0.0001 or lower are typical for BERT, but RNNs need a much higher value. So the older default (0.001) was too high for BERT I think, but the new default in this PR (0.0001) is now too low for RNN models. It's definitely useful to add it as a command line parameter, but I think we should set the default learning rate in the …
OK, I double checked: for both sequence labeling and text classification, the learning rate for all transformer architectures is hard-coded at 2e-5. Only RNN models were using the config learning rate value, and the default (0.001) was set for this.
So that was my assumption when I added the decay optimizers.
Thanks for the clarification.
We can set the value also in the application, but at least we don't risk running it with the wrong default value. Let me know if this makes sense.
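A minimal sketch of the kind of architecture-dependent default discussed here, using the values quoted in this thread (2e-5 for transformer fine-tuning, 0.001 as the former RNN default); the names are hypothetical, this is not the code from the PR:

```python
# Hypothetical sketch, not the actual code from this PR: pick a default
# learning rate from the model architecture when none is given explicitly.
DEFAULT_LEARNING_RATES = {
    "transformer": 2e-5,  # typical for BERT-like fine-tuning
    "rnn": 0.001,         # former config default used by the RNN models
}

def resolve_learning_rate(transformer_name=None, learning_rate=None):
    """Return the explicitly requested learning rate, or an
    architecture-dependent default when none is provided."""
    if learning_rate is not None:
        return learning_rate
    if transformer_name is not None:
        return DEFAULT_LEARNING_RATES["transformer"]
    return DEFAULT_LEARNING_RATES["rnn"]
```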
…different default for transformers-based and non-transformers-based models
I've fixed the default values (also in the classification trainer). I've added a callback that prints the decayed learning rate at each epoch; however, I have the following question:
The initial learning rate is …
or, for non-transformers, with the Adam optimizer: …
For this case, should I assume …
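For reference, a minimal sketch of a Keras callback that prints the optimizer's effective learning rate at the end of each epoch, similar in spirit to the one mentioned above (a TensorFlow/Keras backend is assumed; the class name and details are illustrative, not the code from this PR):

```python
import tensorflow as tf
from tensorflow import keras

class LearningRateLogger(keras.callbacks.Callback):
    """Print the optimizer's effective learning rate after each epoch."""

    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.learning_rate
        # If a schedule (decay, warm-up, ...) is attached, evaluate it at the
        # current iteration count to get the value actually in use.
        if isinstance(lr, keras.optimizers.schedules.LearningRateSchedule):
            lr = lr(self.model.optimizer.iterations)
        print("epoch %d: learning rate = %.2e"
              % (epoch + 1, float(tf.keras.backend.get_value(lr))))
```

Passing an instance in callbacks=[LearningRateLogger()] to model.fit(...) is enough to get one line per epoch.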
By removing the warmup steps, the learning rate does not float around.
Yes, normally warm-up is important when fine-tuning with transformers and ELMo (if I remember well, warm-up is more important than the decay in learning rate!). The …
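For context, a sketch of the kind of linear warm-up followed by linear decay commonly used when fine-tuning transformers (TensorFlow/Keras assumed; the class and argument names are illustrative, not taken from this repository):

```python
import tensorflow as tf

class WarmupLinearDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Ramp the learning rate up linearly for `warmup_steps`, then decay it
    linearly to zero at `total_steps`."""

    def __init__(self, peak_lr, warmup_steps, total_steps):
        self.peak_lr = peak_lr
        self.warmup_steps = float(warmup_steps)
        self.total_steps = float(total_steps)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup = self.peak_lr * step / tf.maximum(1.0, self.warmup_steps)
        decay = self.peak_lr * tf.maximum(
            0.0,
            (self.total_steps - step)
            / tf.maximum(1.0, self.total_steps - self.warmup_steps),
        )
        return tf.where(step < self.warmup_steps, warmup, decay)

# e.g. optimizer = tf.keras.optimizers.Adam(
#          learning_rate=WarmupLinearDecay(2e-5, 500, 10000))
```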
In line with the use of incremental training, knowing the final learning rate of the previous training and being able to set it manually can be helpful.
This PR (updated list):
- adds the --learning-rate parameter to the *Tagging applications, to override the default learning rate value
- sets a different default learning rate for transformers-based models (2e-5) and non-transformers-based models (0.0001)
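To illustrate the kind of override added here, a small self-contained sketch of a --learning-rate command-line flag (only the flag name comes from this PR; the other arguments, defaults, and messages are illustrative):

```python
import argparse

parser = argparse.ArgumentParser(description="train a sequence labelling model")
parser.add_argument("--architecture", default="BidLSTM_CRF",
                    help="model architecture to train")
parser.add_argument("--learning-rate", type=float, default=None,
                    help="override the architecture-dependent default learning rate")
args = parser.parse_args()

# argparse exposes --learning-rate as args.learning_rate; None means
# "keep the architecture-dependent default" rather than forcing a value.
if args.learning_rate is not None:
    print("using user-provided learning rate:", args.learning_rate)
else:
    print("using the default learning rate for architecture", args.architecture)
```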