feat: move model loader functionality to augmentation #119
Conversation
Signed-off-by: Will Johnson <[email protected]>
```python
# determine this process's rank and the world size; default to a
# single-process setup when torch.distributed is not initialized
rank, world_size = 0, 1
if torch.distributed.is_initialized():
    world_size = torch.distributed.get_world_size()
    rank = torch.distributed.get_rank()

# shard the MoE, and store the component names, eventually needed
# to configure FSDP
model_name = model.config.name_or_path
```
I would say add a check for the presence of `name_or_path` in `model.config`, and if it's not there, raise a `ValueError` explaining that for scattermoe, we require a `name_or_path` in the config to point to the model.
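A minimal sketch of the suggested guard (hedged: `model` is assumed to be the Hugging Face model being augmented, and the error wording is illustrative, not the exact message from the PR):

```python
# sketch of the reviewer's suggestion: validate that the config carries
# a usable `name_or_path` before attempting to shard the MoE
model_name = getattr(model.config, "name_or_path", None)
if not model_name:
    raise ValueError(
        "scattermoe requires `name_or_path` to be set in model.config "
        "to point to the model checkpoint."
    )
```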
Signed-off-by: Will Johnson <[email protected]>
LGTM
Maybe before merging it would be good to test a multi-GPU run.
This change does not seem to work on multi-GPU.
Description
Step 1 of 3 for enabling LoRA on ScatterMoE: move the model loader functionality into the augmentation step. This means the plugin no longer has to be used standalone. A rough sketch of the idea follows.
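As an illustration only (the class name below is an assumption modeled on a typical fms-acceleration plugin interface, and `prepare_scattermoe` is a placeholder for the actual sharding helper; this is a sketch, not the exact diff):

```python
class ScatterMoEAccelerationPlugin:
    # Before this change, the sharding logic ran in a custom model_loader,
    # forcing the plugin to own model loading end to end. Moving it into
    # augmentation lets a standard loader produce the model first, and the
    # plugin then modifies it in place.
    def augmentation(self, model, train_args, modifiable_args):
        # previously done in model_loader: resolve the checkpoint name
        # from the config and shard the MoE experts across the world
        model_name = model.config.name_or_path
        model = prepare_scattermoe(model, checkpoint_name_or_path=model_name)
        return model, modifiable_args
```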
Testing
Testing on fms-hf-tuning with the augmentation function instead of the model loader shows results similar to #390:
Results:
model location:
/testing/tuning/output/granite-3b-moe/ft/20240107_1014-tone/save_model