(6/n) Support 2D Parallelism - Trainer example #19879
Conversation
The branch was force-pushed from 92c6d16 to d81249b.
The branch was force-pushed from 82a417f to 23a6848.
⚡ Required checks status: All passing 🟢
Groups summary: 🟢 lightning_fabric: Azure GPU
These checks are required after the changes to
Thank you for your contribution! 💜
🚀
inputs = batch[:, :-1]
labels = batch[:, 1:]
output = self.model(inputs)
with loss_parallel():
TIL. Are you also enabling this for backward? https://github.com/pytorch/pytorch/blob/5fb11cda4fe60c1a7b30e6c844f84ce8933ef953/torch/distributed/tensor/parallel/loss.py#L35
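For context, here is a minimal sketch (plain PyTorch, not the PR's Trainer code) of the usual pattern: loss_parallel() stays active for both the loss computation and the backward pass, since it changes how the sharded logits are reduced. The tensor names are assumptions taken from the snippet above.

import torch.nn.functional as F
from torch.distributed.tensor.parallel import loss_parallel

with loss_parallel():
    # `output` holds the (tensor-parallel) logits, `labels` the shifted targets (assumed names).
    loss = F.cross_entropy(output.reshape(-1, output.size(-1)), labels.reshape(-1))
    # backward also runs under the context manager so the sharded loss gradients are handled correctly
    loss.backward()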
What does this PR do?
Adds an example of 2D parallelism (Tensor Parallelism + FSDP2) for the Trainer. It is equivalent to the Fabric example added in #19846, so the model code etc. is copy-pasted. The main file to review is examples/pytorch/tensor_parallel/train.py. This PR depends on #19878.
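As a rough illustration of the 2D layout such an example targets (a sketch with assumed mesh sizes and dimension names, not the example's actual code): FSDP2 shards parameters across one mesh dimension while tensor parallelism splits individual layers across the other.

from torch.distributed.device_mesh import init_device_mesh

# Hypothetical 4-GPU layout: 2-way data parallel (FSDP2) x 2-way tensor parallel.
mesh_2d = init_device_mesh("cuda", (2, 2), mesh_dim_names=("data_parallel", "tensor_parallel"))
dp_mesh = mesh_2d["data_parallel"]    # dimension used for FSDP2 sharding
tp_mesh = mesh_2d["tensor_parallel"]  # dimension used for tensor parallelism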
📚 Documentation preview 📚: https://pytorch-lightning--19879.org.readthedocs.build/en/19879/
cc @Borda @carmocca @justusschock @awaelchli