torchrun support #396
hahahannes started this conversation in General
Replies: 1 comment 2 replies
-
Thanks, we didn't really investigate multi-node running so far AFAIK, as we don't generally have such nodes available for testing.
2 replies
-
I am trying to run MLPF with torchrun for multi-node, multi-GPU training and I needed to make some changes. I am following the guide at https://pytorch.org/tutorials/beginner/ddp_series_fault_tolerance.html.
Let me know if that might be interesting for you. I am tracking my changes in my fork; there is one commit with the changes so far. I added an argument `use-torchrun`, but I think torchrun could also replace the `mp.spawn` part.
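Roughly, a minimal sketch of what the torchrun-based setup could look like (the names here, e.g. `ddp_setup_torchrun` and `train_one_rank`, are placeholders and not the exact code in my commit):

```python
# Sketch: DDP setup when the processes are launched by torchrun instead of mp.spawn.
# torchrun starts one process per GPU per node and sets RANK, LOCAL_RANK and
# WORLD_SIZE in the environment, so the script no longer spawns workers itself.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def ddp_setup_torchrun() -> int:
    # init_method="env://" picks up RANK and WORLD_SIZE from the environment
    # variables that torchrun exports for every worker process.
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank


def train_one_rank(model: torch.nn.Module) -> None:
    # Called once per process; no mp.spawn needed, since torchrun already
    # launched this process with its rank information.
    local_rank = ddp_setup_torchrun()
    model = DDP(model.to(local_rank), device_ids=[local_rank])
    # ... training loop ...
    dist.destroy_process_group()
```

Each node would then be started with something like `torchrun --nnodes=<N> --nproc_per_node=<gpus_per_node> --rdzv_backend=c10d --rdzv_endpoint=<host>:<port> train.py ...` (with `train.py` standing in for the actual entry point), instead of the script spawning its own workers with `mp.spawn`.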