Improve communication overlapping in FP8 distributed optimizer #8221
Merged
Signed-off-by: Tim Moon <[email protected]>
Avoid unnecessary FP8 weight transposes. Signed-off-by: Tim Moon <[email protected]>
dimapihtar approved these changes on Feb 8, 2024
LGTM. Thank you!
biscayan pushed a commit to biscayan/NeMo that referenced this pull request on Feb 15, 2024
…A#8221)
* Only reduce amaxes after fp8 cast for last distopt bucket
* Handle case with FP8 and contiguous param buffer
* Support distopt buckets with mixed dtypes
* Fix bug where fp8 casts were being skipped
* Debug FP8 params with contiguous param buffer
* Separate non-FP8 params into leftover distopt bucket
* Debug FP8 params with contiguous param buffer
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Make sure to update FP8 transpose cache
* Update Apex commit: avoid unnecessary FP8 weight transposes
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: biscayan <[email protected]>
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request on Feb 15, 2024
vasunvidia pushed a commit to vasunvidia/NeMo that referenced this pull request on Feb 19, 2024
vasunvidia added a commit to vasunvidia/NeMo that referenced this pull request on Feb 19, 2024
NVIDIA#8221)" This reverts commit 5521687.
layalir added a commit to layalir/NeMo that referenced this pull request on Feb 28, 2024
NVIDIA#8221)" This reverts commit c84121a.
layalir added a commit to layalir/NeMo that referenced this pull request on Feb 29, 2024
NVIDIA#8221)" This reverts commit c84121a.
ftxj pushed a commit to ftxj/NeMo that referenced this pull request on Feb 29, 2024
minitu pushed a commit to minitu/NeMo that referenced this pull request on Mar 7, 2024
pablo-garay pushed a commit that referenced this pull request on Mar 19, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024
What does this PR do?
When training GPT, the Apex distributed Adam optimizer overlaps its first parameter all-gather with the optimizer step. This optimization has been applied to both FP8 and non-FP8 models.
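To make the idea of overlapping concrete, here is a toy sketch in plain Python. It is not the Apex implementation: `optimizer_step`, `all_gather`, and `step_with_overlap` are hypothetical stand-ins, the "collective" only replicates a list, and a single worker thread plays the role of a communication stream. It only shows the scheduling pattern, where the gather for one bucket runs in the background while the optimizer step for the next bucket proceeds.

```python
from concurrent.futures import ThreadPoolExecutor


def optimizer_step(shard, lr=0.5, grad=1.0):
    # Hypothetical stand-in for the local update on this rank's shard of a bucket.
    return [w - lr * grad for w in shard]


def all_gather(shard, world_size):
    # Hypothetical stand-in for a collective: pretend every rank holds the
    # same shard and return the concatenation across ranks.
    return shard * world_size


def step_with_overlap(buckets, world_size=2):
    # One worker thread models a communication stream that runs alongside compute.
    with ThreadPoolExecutor(max_workers=1) as comm_stream:
        pending = []
        for shard in buckets:
            updated = optimizer_step(shard)
            # Launch the gather without waiting, so the next bucket's
            # optimizer step can start while communication is in flight.
            pending.append(comm_stream.submit(all_gather, updated, world_size))
        # Wait for all outstanding gathers before the parameters are used.
        return [future.result() for future in pending]


gathered = step_with_overlap([[1.0, 2.0], [3.0]])
print(gathered)
```

In a real distributed run the gather would be an asynchronous collective over a process group, and the wait would happen just before the forward pass needs the parameters.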
Collection: NLP
Changelog
Usage
Run GPT, e.g. with the config at https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml.
Enable FP8 support with model.fp8=True, FP8 parameters with model.fp8_params=True, the distributed optimizer with model.optim.name=distributed_fused_adam, and overlapped param all-gathers with model.optim.overlap_param_sync=True.
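As a rough illustration, the four overrides listed above can be assembled into one launch command. The override strings are taken from this description. The script path `examples/nlp/language_modeling/megatron_gpt_pretraining.py` and the helper `build_command` are assumptions made for this sketch, so substitute whatever entry point you actually use with megatron_gpt_config.yaml.

```python
import shlex

# Overrides quoted from the PR description.
FP8_DISTOPT_OVERRIDES = [
    "model.fp8=True",
    "model.fp8_params=True",
    "model.optim.name=distributed_fused_adam",
    "model.optim.overlap_param_sync=True",
]


def build_command(script, overrides):
    # Hypothetical helper: returns a shell-safe "python <script> <override> ..." string.
    return shlex.join(["python", script, *overrides])


# Assumption: this is the training script that reads megatron_gpt_config.yaml.
command = build_command(
    "examples/nlp/language_modeling/megatron_gpt_pretraining.py",
    FP8_DISTOPT_OVERRIDES,
)
print(command)
```

Other required settings (data paths, parallelism sizes, trainer options) are not shown and depend on your setup.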
Jenkins CI
To run Jenkins, a NeMo User with write access must comment jenkins on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information