Update on "support TP-only parallelism"
With simple fixes, we can use TP/SP alone without FSDP (see the sketch after this list):
1. enable gradient scaling only when FSDP is enabled
2. use `torch.nn.utils.clip_grad_norm_` for gradient clipping when FSDP is disabled
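A minimal sketch of what these two conditions could look like in the training loop, assuming a hypothetical `fsdp_enabled` flag and a `max_norm` value read from the job config (illustrative only, not the torchtrain implementation):

```python
import torch
from torch.cuda.amp import GradScaler


def build_scaler(fsdp_enabled: bool) -> GradScaler:
    # Fix 1 (sketch): enable gradient scaling only under FSDP. A disabled
    # GradScaler turns scale()/step()/update() into pass-throughs, so the
    # rest of the training loop stays unchanged.
    return GradScaler(enabled=fsdp_enabled)


def clip_gradients(model: torch.nn.Module, max_norm: float, fsdp_enabled: bool) -> None:
    # Fix 2 (sketch): FSDP shards gradients and exposes its own
    # clip_grad_norm_ method; with TP/SP alone, the standard PyTorch utility
    # clips the local parameters directly.
    if fsdp_enabled:
        model.clip_grad_norm_(max_norm)
    else:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
```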

<img width="1515" alt="Screenshot 2024-03-13 at 3 12 49 PM" src="https://github.com/pytorch/torchtrain/assets/150487191/75c2706d-d7c8-46c2-aee1-e401cbceef69">


[ghstack-poisoned]
tianyu-l committed Mar 13, 2024
1 parent 544edb8 commit 29d99cd
Showing 1 changed file with 2 additions and 1 deletion.

train.py
@@ -165,7 +165,8 @@ def main(job_config: JobConfig):
     optimizer = build_optimizer(model, job_config)
     scheduler = get_lr_scheduler(optimizer, job_config)
 
-    # build grad scaler which is effective only when mixed precision training is enabled under FSDP
+    # build grad scaler which is effective only when mixed precision training
+    # is enabled with fp16 param dtype under FSDP
     scaler = build_grad_scaler(model)
 
     metric_logger = build_metric_logger(job_config)
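For context on the updated comment above, here is a hedged sketch of the condition a helper like `build_grad_scaler` could check; the real helper in this diff takes the model and presumably reads the job config, so the explicit parameters below are assumptions made for illustration:

```python
import torch
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler


def build_grad_scaler_sketch(fsdp_enabled: bool, param_dtype: torch.dtype) -> ShardedGradScaler:
    # Loss scaling is only needed for fp16 mixed precision under FSDP: fp16
    # has a narrow dynamic range and can underflow gradients, while bf16 and
    # fp32 do not need scaling, and without FSDP there is no mixed-precision
    # policy to scale for.
    enabled = fsdp_enabled and param_dtype == torch.float16
    return ShardedGradScaler(enabled=enabled)
```

ShardedGradScaler is the FSDP-aware variant of GradScaler; with `enabled=False` it behaves as a no-op, so the call site does not need to branch.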
