Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Title: Fix setup_env_ranks to Properly Set Environment Variables Instead of Raising Error #6979

Merged
merged 3 commits into from
Jan 30, 2025

Conversation

fabiosanger
Copy link
Contributor

@fabiosanger fabiosanger commented Jan 29, 2025

This pull request addresses an issue in setup_env_ranks where, under certain conditions, the function raises an error instead of setting the necessary MPI-related environment variables (LOCAL_RANK, RANK, and WORLD_SIZE). The intended behavior is to properly map Open MPI variables (OMPI_COMM_WORLD_*) to the variables expected by DeepSpeed/PyTorch, but the code currently raises an EnvironmentError if these Open MPI variables are not found.

With this fix, setup_env_ranks will:

  • Correctly detect and map the required Open MPI environment variables.
  • Only raise an error if there is genuinely no valid way to obtain rank information from the environment (e.g., both Open MPI variables and any fallback mechanism are unavailable).

Fix #6895

@fabiosanger
Copy link
Contributor Author

@fabiosanger please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@tjruwase tjruwase added this pull request to the merge queue Jan 30, 2025
Merged via the queue into deepspeedai:master with commit 065ca8a Jan 30, 2025
10 checks passed
tjruwase added a commit that referenced this pull request Feb 6, 2025
…ead of Raising Error (#6979)

This pull request addresses an issue in setup_env_ranks where, under
certain conditions, the function raises an error instead of setting the
necessary MPI-related environment variables (LOCAL_RANK, RANK, and
WORLD_SIZE). The intended behavior is to properly map Open MPI variables
(OMPI_COMM_WORLD_*) to the variables expected by DeepSpeed/PyTorch, but
the code currently raises an EnvironmentError if these Open MPI
variables are not found.

With this fix, setup_env_ranks will:

- Correctly detect and map the required Open MPI environment variables.
- Only raise an error if there is genuinely no valid way to obtain rank
information from the environment (e.g., both Open MPI variables and any
fallback mechanism are unavailable).

Fix #6895

Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
siqi654321 pushed a commit to siqi654321/DeepSpeed that referenced this pull request Feb 7, 2025
…ead of Raising Error (deepspeedai#6979)

This pull request addresses an issue in setup_env_ranks where, under
certain conditions, the function raises an error instead of setting the
necessary MPI-related environment variables (LOCAL_RANK, RANK, and
WORLD_SIZE). The intended behavior is to properly map Open MPI variables
(OMPI_COMM_WORLD_*) to the variables expected by DeepSpeed/PyTorch, but
the code currently raises an EnvironmentError if these Open MPI
variables are not found.

With this fix, setup_env_ranks will:

- Correctly detect and map the required Open MPI environment variables.
- Only raise an error if there is genuinely no valid way to obtain rank
information from the environment (e.g., both Open MPI variables and any
fallback mechanism are unavailable).

Fix deepspeedai#6895

Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: siqi <[email protected]>
traincheck-team pushed a commit to traincheck-team/DeepSpeed that referenced this pull request Feb 9, 2025
…ead of Raising Error (deepspeedai#6979)

This pull request addresses an issue in setup_env_ranks where, under
certain conditions, the function raises an error instead of setting the
necessary MPI-related environment variables (LOCAL_RANK, RANK, and
WORLD_SIZE). The intended behavior is to properly map Open MPI variables
(OMPI_COMM_WORLD_*) to the variables expected by DeepSpeed/PyTorch, but
the code currently raises an EnvironmentError if these Open MPI
variables are not found.

With this fix, setup_env_ranks will:

- Correctly detect and map the required Open MPI environment variables.
- Only raise an error if there is genuinely no valid way to obtain rank
information from the environment (e.g., both Open MPI variables and any
fallback mechanism are unavailable).

Fix deepspeedai#6895

Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MPI environment variables are not set
4 participants