[Release-1.9.1] [torch] Various improvements to `torch.distributed.launch` and `torch.distributed.run` (#60925) #64797
Merged
Summary:
The current example code does not work. The correct one is like this: https://github.com/pytorch/pytorch/blob/cb7d813275a13a4233951e7cbcbb8351dbb0fd87/torch/distributed/run.py#L266

Pull Request resolved: pytorch#61127
Reviewed By: cbalioglu
Differential Revision: D29572003
Pulled By: mrshenli
fbshipit-source-id: 05b470230f3d70f8a6164edb5f92894a1112069f
Summary:
Pull Request resolved: pytorch#59152

Small change for https://fb.workplace.com/groups/319878845696681

Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D28773682
Pulled By: H-Huang
fbshipit-source-id: acf82273e8622b7ffd3088d8d766bdf49273754c
….distributed.run` (pytorch#61294)

Summary:
Pull Request resolved: pytorch#61294
Pull Request resolved: pytorch#60925

* Set the default number of restarts for `torch.distributed.launch` to 0
* Remove the unnecessary `--use_env` warning
* Move the `--use_env` warnings to `torch.distributed.launch`
* Make the default log level WARNING
* Add a new doc section on transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error propagation
* Set the default events handler to `null`, which does not print events to the console
* Add a reference from `torch.distributed.launch` to `torch.distributed.run`
* Set the correct preexec function so that SIGTERM is sent to child processes when the parent dies

Issues resolved:
pytorch#60716
pytorch#60754

Test Plan:
sandcastle

python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts
python -m torch.distributed.launch --nproc_per_node=4 --use_env --no_python main.py -> produces error
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py -> no warning
python -m torch.distributed.launch --nproc_per_node=4 --no_python main.py -> warning

Output of running torch.distributed.launch without --use_env:

$path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torch.distributed.run. Note that --use_env is set by default in torch.distributed.run. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ('LOCAL_RANK')` instead.

New section:
{F628923078}
{F628974089}

Reviewed By: cbalioglu
Differential Revision: D29559553
fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b
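The deprecation warning quoted in the commit message asks scripts to read the local rank from the `LOCAL_RANK` environment variable instead of a `--local_rank` argument. A minimal sketch of that migration follows. The helper name `resolve_local_rank` is hypothetical, and the fallback to `--local_rank` is an assumption for scripts that may still be started by `torch.distributed.launch` without `--use_env`. Note that the warning text writes `os.environ('LOCAL_RANK')` with parentheses; in Python the mapping is read with a subscript or `.get()`.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank for this worker process.

    Prefers the LOCAL_RANK environment variable and falls back to a
    --local_rank command-line argument (assumed legacy behavior),
    defaulting to 0 when neither is present.
    """
    environ = os.environ if environ is None else environ

    # Accept --local_rank so that scripts launched the old way keep working.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # parse_known_args ignores any other arguments the script defines.
    args, _unknown = parser.parse_known_args(argv)

    env_value = environ.get("LOCAL_RANK")
    if env_value is not None:
        return int(env_value)
    return args.local_rank


if __name__ == "__main__":
    print(f"local rank: {resolve_local_rank()}")
```

Preferring the environment variable means the same script should behave consistently whether or not the legacy argument is also passed.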
💊 CI failures summary and remediations

As of commit 8c1be47 (more details on the Dr. CI page):

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

| Job | Step | Action |
|---|---|---|
| | Download PyTorch Test Reports | 🔁 rerun |
| | Fail if there were any warnings | 🔁 rerun |
This comment was automatically generated by Dr. CI (expand for details).