
DistributedSampler not used when using DDPPlugin #7813

Closed
ajtritt opened this issue Jun 3, 2021 · 0 comments · Fixed by #7814
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments

@ajtritt
Contributor

ajtritt commented Jun 3, 2021

🐛 Bug

When running with DDPPlugin, the DataLoaders are not configured with a DistributedSampler.

To Reproduce

Run this code in an LSFEnvironment. Regardless of the number of GPUs specified (i.e., the first argument), the number of batches per epoch is always 3750. After doing some digging, it looks like auto_add_sampler is not setting the sampler correctly.
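
The original script is not reproduced here; the following is a minimal sketch along the same lines, using the 1.3-era Trainer arguments. The dataset size (60,000 samples) and batch size (16) are illustrative assumptions chosen so that a non-distributed sampler yields 3750 batches per epoch; it does not replicate the LSF cluster environment.

```python
# Minimal sketch (not the original script): train a toy LightningModule with DDPPlugin.
import sys
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def main(num_gpus):
    # Assumed sizes: 60,000 samples / batch_size 16 -> 3750 batches without sharding.
    dataset = TensorDataset(torch.randn(60000, 32), torch.randint(0, 2, (60000,)))
    # No sampler is passed here: Lightning is expected to add a DistributedSampler.
    train_loader = DataLoader(dataset, batch_size=16)

    trainer = pl.Trainer(
        gpus=num_gpus,
        accelerator="ddp",
        plugins=[DDPPlugin()],
        max_epochs=1,
    )
    trainer.fit(BoringModel(), train_loader)
    # Observed: the number of batches per epoch stays at 3750 regardless of num_gpus,
    # i.e. every rank iterates over the full dataset.


if __name__ == "__main__":
    main(int(sys.argv[1]))
```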

Expected behavior

When running with DDP training, the sampler for the DataLoaders should be set to a DistributedSampler.
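
For reference, a minimal sketch (plain PyTorch, not Lightning internals) of what the automatically added sampler should amount to; wrap_for_ddp, world_size, and rank are hypothetical names used only for illustration:

```python
from torch.utils.data import DataLoader, DistributedSampler


def wrap_for_ddp(dataset, batch_size, world_size, rank):
    # Each rank then sees roughly len(dataset) / world_size samples per epoch,
    # so with 6 GPUs the 3750 batches above should drop to about 625 per rank.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```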

Environment

  • PyTorch Version (e.g., 1.0): 1.7
  • OS (e.g., Linux): RedHat
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): python setup.py develop
  • Python version: 3.8
  • CUDA/cuDNN version:
  • GPU models and configuration: 1 or 6 GPUs
  • Any other relevant information:

Additional context

ajtritt added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Jun 3, 2021
ajtritt changed the title from "DistributedSampler not used when using DDPlugin" to "DistributedSampler not used when using DDPPlugin" on Jun 3, 2021