
DistributedSampler not used when using DDPPlugin #7813

Closed
ajtritt opened this issue Jun 3, 2021 · 0 comments · Fixed by #7814
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments

@ajtritt
Contributor

ajtritt commented Jun 3, 2021

🐛 Bug

When running with DDPPlugin, the DataLoaders are not configured with a DistributedSampler.

To Reproduce

Run this code in an LSFEnvironment. Regardless of the number of GPUs specified (i.e., the first argument), the number of batches per epoch is always 3750. After doing some digging, it looks like auto_add_sampler is not setting the sampler correctly.
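
The original script is not reproduced here; the following is a minimal sketch along the same lines, using the 1.3-era Trainer arguments. The dataset size (60,000 samples) and batch size (16) are illustrative assumptions chosen so that a non-distributed sampler yields 3750 batches per epoch; it does not replicate the LSF cluster environment.

```python
# Minimal sketch (not the original script): train a toy LightningModule with DDPPlugin.
import sys
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def main(num_gpus):
    # Assumed sizes: 60,000 samples / batch_size 16 -> 3750 batches without sharding.
    dataset = TensorDataset(torch.randn(60000, 32), torch.randint(0, 2, (60000,)))
    # No sampler is passed here: Lightning is expected to add a DistributedSampler.
    train_loader = DataLoader(dataset, batch_size=16)

    trainer = pl.Trainer(
        gpus=num_gpus,
        accelerator="ddp",
        plugins=[DDPPlugin()],
        max_epochs=1,
    )
    trainer.fit(BoringModel(), train_loader)
    # Observed: the number of batches per epoch stays at 3750 regardless of num_gpus,
    # i.e. every rank iterates over the full dataset.


if __name__ == "__main__":
    main(int(sys.argv[1]))
```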

Expected behavior

When running with DDP training, the sampler for the DataLoaders should be set to a DistributedSampler.
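
For reference, a minimal sketch (plain PyTorch, not Lightning internals) of what the automatically added sampler should amount to; wrap_for_ddp, world_size, and rank are hypothetical names used only for illustration:

```python
from torch.utils.data import DataLoader, DistributedSampler


def wrap_for_ddp(dataset, batch_size, world_size, rank):
    # Each rank then sees roughly len(dataset) / world_size samples per epoch,
    # so with 6 GPUs the 3750 batches above should drop to about 625 per rank.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```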

Environment

  • PyTorch Version (e.g., 1.0): 1.7
  • OS (e.g., Linux): RedHat
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): python setup.py develop
  • Python version: 3.8
  • CUDA/cuDNN version:
  • GPU models and configuration: 1 or 6 GPUs
  • Any other relevant information:

Additional context

ajtritt added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Jun 3, 2021
ajtritt changed the title from "DistributedSampler not used when using DDPlugin" to "DistributedSampler not used when using DDPPlugin" on Jun 3, 2021