GPT support for BF16 grad reductions #5920
Conversation
Force-pushed bd9176d to ee8500b
Force-pushed 24ef1f6 to 17cc4ae (Signed-off-by: Tim Moon <[email protected]>)
Force-pushed 5476cf1 to 59429ce (Signed-off-by: Tim Moon <[email protected]>)
Force-pushed 44bcd5f to 68d7441 (Signed-off-by: Tim Moon <[email protected]>)
Force-pushed 5b9b09d to b80a1f4 (Signed-off-by: Tim Moon <[email protected]>)
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
LGTM. Thanks!
# Compute norm of local gradients for explicit FP32 optimizer
if self._fp32_optim is not None:
    _fp32_optim_grad_sync()
Should this be `self._fp32_optim_grad_sync()`?
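For illustration, a small self-contained sketch of the pattern being suggested; the class name, constructor, and method bodies are hypothetical stand-ins, only the call site matters:

```python
# Illustrative only: the class name and method bodies are hypothetical.
# The point is that the grad-sync helper is an instance method, so it
# should be called as self._fp32_optim_grad_sync(), not as a bare name.
class DistOptimizerSketch:
    def __init__(self, fp32_optim=None):
        self._fp32_optim = fp32_optim

    def _fp32_optim_grad_sync(self):
        # Hypothetical stand-in for the real FP32 grad sync logic.
        print("syncing FP32 optimizer grads")

    def prepare_grad_norm(self):
        # Compute norm of local gradients for explicit FP32 optimizer
        if self._fp32_optim is not None:
            self._fp32_optim_grad_sync()  # was: _fp32_optim_grad_sync()
```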
if getattr(param, '_with_fp32_optimizer', False):
    main_param = param.detach().clone().float()
Should `param` be `model_param` here?
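A minimal sketch of the rename being suggested, assuming the surrounding code loops over model parameters under the name `model_param` (the loop, flag assignment, and tensor shapes below are illustrative):

```python
# Illustrative sketch: use the loop variable name (model_param)
# consistently when creating the FP32 "main" copy of a BF16 parameter.
import torch

model_params = [torch.nn.Parameter(torch.randn(4, dtype=torch.bfloat16))]
model_params[0]._with_fp32_optimizer = True  # hypothetical flag, as in the diff

for model_param in model_params:
    if getattr(model_param, '_with_fp32_optimizer', False):
        # Keep an FP32 main param for the optimizer update.
        main_param = model_param.detach().clone().float()
```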
* Add support for BF16 grad reductions with distopt
* Fix style issues
* Fix style issues
* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>
* GPT support for BF16 grad reductions (#5920)
* Add custom functions to launch distopt communication in interleaved pipeline parallelism (#6183)
* Bugfix for BF16 grad reductions with distopt (#6340)
  * Debug distopt support for BF16 grad reductions
  * Dump and load FP32 main params
  * Style tweaks

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
What does this PR do?
Adds GPT support for BF16/FP16 gradient reductions, with embedding grad reductions in FP32.
Collection: NLP
Changelog
Usage
Set the optimizer to `distributed_fused_adam` in the config file (NeMo/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml, line 193 at commit 65c277b).
Configure the optimizer with `grad_sync_dtype: bf16`.
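For concreteness, a minimal sketch of the relevant optimizer section of the config; the `model.optim` nesting and the numeric hyperparameters are illustrative placeholders, and only `name` and `grad_sync_dtype` are the settings this PR cares about:

```yaml
# Sketch of the relevant part of megatron_gpt_config.yaml.
# Only `name` and `grad_sync_dtype` matter for this PR; the other
# values are illustrative placeholders.
model:
  optim:
    name: distributed_fused_adam   # Apex distributed fused Adam (distopt)
    lr: 2e-4
    weight_decay: 0.01
    betas: [0.9, 0.98]
    grad_sync_dtype: bf16          # reduce grads in BF16; embedding grads stay FP32
```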
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.
Additional Information