Update on "support TP-only parallelism"
With simple fixes, we can use TP/SP alone without FSDP:

1. enable gradient scaling only when FSDP is enabled
2. use `torch.nn.utils.clip_grad_norm_` for gradient clipping when FSDP is disabled

A sketch of both fixes is shown below, after the screenshot.

<img width="1515" alt="Screenshot 2024-03-13 at 3 12 49 PM" src="https://github.com/pytorch/torchtrain/assets/150487191/75c2706d-d7c8-46c2-aee1-e401cbceef69">

[ghstack-poisoned]
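A minimal sketch of the two fixes, assuming hypothetical helpers `build_grad_scaler` / `clip_gradients` and an `fsdp_enabled` flag (none of these names are from the actual torchtrain change, which may be structured differently):

```python
import torch
from torch.cuda.amp import GradScaler
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler


def build_grad_scaler(fsdp_enabled: bool) -> GradScaler:
    # Fix 1: use a sharded gradient scaler only when FSDP is on; otherwise
    # return a disabled scaler, whose scale()/step()/update() calls are
    # no-ops, so the training loop needs no extra branching.
    if fsdp_enabled:
        return ShardedGradScaler()
    return GradScaler(enabled=False)


def clip_gradients(model: torch.nn.Module, max_norm: float,
                   fsdp_enabled: bool) -> None:
    # Fix 2: FSDP shards gradients across ranks, so clipping must go
    # through the FSDP wrapper, which reduces the global grad norm
    # correctly. With TP/SP only (no FSDP), the plain PyTorch utility
    # over the local parameters is sufficient.
    if fsdp_enabled:
        assert isinstance(model, FSDP)
        model.clip_grad_norm_(max_norm)
    else:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
```

The disabled `GradScaler` keeps the call sites uniform: the same `scaler.scale(loss).backward()` / `scaler.step(optimizer)` sequence works in both modes, degrading to plain backward and step when scaling is off.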