
Renwuli/disable te qwen #26

Open

wants to merge 3,583 commits into base: rocm_dev

Conversation

amdrenwuli (Member)
This PR adds a new branch that supports Qwen1.0 with TE (Transformer Engine) disabled.

mikolajblaz and others added 30 commits December 20, 2023 10:59
Fix for newer apex version

See merge request ADLR/megatron-lm!1021
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Add assert for overlap_param_gather

See merge request ADLR/megatron-lm!1029
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Added manifest

See merge request ADLR/megatron-lm!1035
Fix checkpointing with TransformerEngine

See merge request ADLR/megatron-lm!1038
Moved Dataloader processing to the CPU for pin memory usage/ Added a custom torch.split BProp implementation

See merge request ADLR/megatron-lm!911
Save checkpoint whenever batch size ramps up

See merge request ADLR/megatron-lm!1034
Signed-off-by: Selvaraj Anandaraj <[email protected]>
ericharper and others added 28 commits January 29, 2024 20:28
Fix `qkv_format` in TEDotProductAttention

See merge request ADLR/megatron-lm!1078
Add support for masked WordPiece datasets BERT and T5

See merge request ADLR/megatron-lm!1041
Distributed checkpointing implementation for MoE

See merge request ADLR/megatron-lm!1055
# Conflicts:
#   pretrain_bert.py
Fix the case when no token is allocated to local expert(s) with EP > 1.

See merge request ADLR/megatron-lm!1063
Generate causal mask for local layer spec

See merge request ADLR/megatron-lm!1047
Update minor version

See merge request ADLR/megatron-lm!1086
Adding bert local spec test

See merge request ADLR/megatron-lm!1072
Feature/Add E2E metrics logging

See merge request ADLR/megatron-lm!1049
JET Migration Updates

See merge request ADLR/megatron-lm!1066
use TE checkpointing when FP8

See merge request ADLR/megatron-lm!1080
gurpreet-dhami (Collaborator)

@amdrenwuli: Is this PR targeted to be merged into rocm_dev? I see this PR has a different base commit.

Marking as stale. No activity in 60 days.
