local_world_size wrongly falls back to global world size with Slurm #2667

Open
JacoCheung opened this issue Jan 7, 2025 · 0 comments

JacoCheung commented Jan 7, 2025

Hi team, TorchRec defines local_world_size and local_rank, which are initialized from several environment variables.

However, if the job is launched via Slurm (say, with sbatch or srun), none of those environment variables is defined, so local_world_size falls back to the global world size.

I'm wondering if TorchRec could add support for Slurm environment variables rather than requiring users to manually export LOCAL_WORLD_SIZE. Thanks!
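For illustration, the requested mapping could look roughly like the sketch below. It is only a sketch under assumptions: `slurm_to_torch_env` and `apply_slurm_env` are hypothetical helpers, not TorchRec APIs, and apart from LOCAL_WORLD_SIZE the target names (RANK, WORLD_SIZE, LOCAL_RANK) are assumed to follow the torchrun convention rather than taken from the TorchRec source. The Slurm variables are the ones srun exports per task; SLURM_NTASKS_PER_NODE is only set when `--ntasks-per-node` is passed.

```python
import os
from typing import Dict, Mapping

# Slurm variable -> torchrun-style variable (target names are an assumption,
# only LOCAL_WORLD_SIZE is named in this issue).
_SLURM_TO_TORCH = {
    "SLURM_PROCID": "RANK",                        # global task index
    "SLURM_NTASKS": "WORLD_SIZE",                  # total number of tasks
    "SLURM_LOCALID": "LOCAL_RANK",                 # task index on this node
    "SLURM_NTASKS_PER_NODE": "LOCAL_WORLD_SIZE",   # needs --ntasks-per-node
}


def slurm_to_torch_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: derive torchrun-style values from Slurm variables.

    Returns only the entries whose Slurm source variable is present, and an
    empty dict when the process was not started as a Slurm task.
    """
    if "SLURM_PROCID" not in env:
        return {}
    return {
        torch_name: env[slurm_name]
        for slurm_name, torch_name in _SLURM_TO_TORCH.items()
        if slurm_name in env
    }


def apply_slurm_env() -> None:
    """Hypothetical helper: export derived values without overriding existing ones."""
    for name, value in slurm_to_torch_env(os.environ).items():
        os.environ.setdefault(name, value)
```

Calling `apply_slurm_env()` before the process group is initialized would be one way to avoid the manual export. Using `setdefault` keeps any value the user or launcher has already set.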
