forked from NVIDIA/Megatron-LM
Renwuli/disable te qwen #26
Open
amdrenwuli wants to merge 3,583 commits into rocm_dev from renwuli/disable_te_qwen
Conversation
Fix for newer apex version See merge request ADLR/megatron-lm!1021
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
…istributed optimizer checkpoint
Add assert for overlap_param_gather See merge request ADLR/megatron-lm!1029
Signed-off-by: Xiaowei Ren <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
added manifest See merge request ADLR/megatron-lm!1035
Fix checkpointing with TransformerEngine See merge request ADLR/megatron-lm!1038
Moved Dataloader processing to the CPU for pin memory usage/ Added a custom torch.split BProp implementation See merge request ADLR/megatron-lm!911
Save checkpoint whenever batch size ramps up See merge request ADLR/megatron-lm!1034
Signed-off-by: Xiaowei Ren <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
…/selvaraja/megatron-lm into atomic_gemm_switch
Signed-off-by: Selvaraj Anandaraj <[email protected]>
…/selvaraja/megatron-lm into atomic_gemm_switch
Fix `qkv_format` in TEDotProductAttention See merge request ADLR/megatron-lm!1078
Add support for masked WordPiece datasets BERT and T5 See merge request ADLR/megatron-lm!1041
Distributed checkpointing implementation for MoE See merge request ADLR/megatron-lm!1055
# Conflicts: # pretrain_bert.py
Fix the case when none token is allocated for local expert(s) with EP>1. See merge request ADLR/megatron-lm!1063
Generate causal mask for local layer spec See merge request ADLR/megatron-lm!1047
Update minor version See merge request ADLR/megatron-lm!1086
Signed-off-by: Jimmy Zhang <[email protected]>
Adding bert local spec test See merge request ADLR/megatron-lm!1072
Feature/Add E2E metrics logging See merge request ADLR/megatron-lm!1049
JET Migration Updates See merge request ADLR/megatron-lm!1066
use TE checkpointing when FP8 See merge request ADLR/megatron-lm!1080
@amdrenwuli: Is this PR targeted to be merged into rocm_dev? I ask because I see this PR has a different baseline commit.
Marking as stale. No activity in 60 days.
This PR adds a new branch that supports Qwen1.0, with Transformer Engine (TE) disabled.
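For background, disabling TE in a Megatron-LM-based model usually comes down to building the model from the local (pure-PyTorch) transformer layer spec instead of the Transformer Engine spec. The sketch below is a minimal illustration of that choice, assuming the get_gpt_layer_local_spec / get_gpt_layer_with_transformer_engine_spec helpers from megatron.core; it is not taken from this PR's diff.

    # Minimal sketch (not from this PR): pick a non-TE layer spec when TE is disabled.
    # Assumes megatron.core exposes the two layer-spec helpers imported below.
    from megatron.core.models.gpt.gpt_layer_specs import (
        get_gpt_layer_local_spec,
        get_gpt_layer_with_transformer_engine_spec,
    )

    def build_layer_spec(use_transformer_engine: bool):
        """Return the TE-backed GPT layer spec, or the local PyTorch spec when TE is disabled."""
        if use_transformer_engine:
            return get_gpt_layer_with_transformer_engine_spec()
        # The local spec uses plain PyTorch modules instead of Transformer Engine kernels.
        return get_gpt_layer_local_spec()

    # Example: a Qwen1.0-style run with TE disabled.
    layer_spec = build_layer_spec(use_transformer_engine=False)

On the command line, upstream Megatron-LM typically exposes the same switch as --transformer-impl local (versus transformer_engine); whether this branch wires Qwen1.0 through that flag or selects the layer spec directly would need to be confirmed against the diff.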