FYI: Notes on using nccl and MPS with Torch... #46
YulunW added a commit to YulunW/nccl that referenced this issue on Mar 21, 2024:
Summary: Pull Request resolved: facebookresearch#46. Differential Revision: D55168758
YulunW added a commit to YulunW/nccl that referenced this issue on Mar 23, 2024:
Summary: Pull Request resolved: facebookresearch#46. Add start-time information in CollTrace. The worker thread now also waits for the start event of each collective, which helps post-hoc analysis during hangs by revealing dependencies between collectives. Reviewed By: minsii. Differential Revision: D55168758. fbshipit-source-id: df908efae5d96c03f31b3672640c2e001ae68af9
I have written up some notes and cookbook examples of using MPS and nccl with Torch, which may help Torch users who are new to multi-process, multi-GPU environments.
My notes can be found at:
https://github.com/CCorfield/Torch-parallel-nccl-MPS-Example
Please advise on corrections and additions.
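For readers skimming this issue, here is a minimal sketch of the kind of multi-process setup the linked notes cover: one worker process per GPU, torch.distributed with the nccl backend, and a single all-reduce. This example is not taken from the notes themselves; the master address, port, and tensor contents are illustrative assumptions.

```python
# Minimal sketch, not taken from the linked notes: one worker process per GPU,
# torch.distributed with the NCCL backend, and a single all-reduce.
# The master address/port and tensor contents are illustrative assumptions.
# If several processes instead share one GPU, the CUDA MPS control daemon
# is typically started beforehand (e.g. `nvidia-cuda-mps-control -d`).
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # assumed single-node setup
    os.environ["MASTER_PORT"] = "29500"       # assumed free port
    torch.cuda.set_device(rank)               # bind this process to its GPU
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Each rank contributes a tensor; all_reduce sums them across all GPUs.
    t = torch.full((4,), float(rank + 1), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```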