
pin_memory should be true only if gpus are specified #245

Closed
igormq opened this issue Sep 22, 2020 · 5 comments · Fixed by #388
Labels: bug · good first issue · help wanted

Comments

@igormq

igormq commented Sep 22, 2020

Hey folks,

First of all, thank you for this amazing package. It has saved many hours of coding :)

I was using the pl data modules for the first time when I hit this line:
cifar10_datamodule.py#L132. The problem is that when training/evaluating/testing on CPU, the data module still tries to pin the data, which tries to reserve about a batch's worth of memory on the GPU. Nothing goes wrong if a GPU with enough free memory is available, but when I run in CPU mode while all my GPUs are in use, I get the following error:

RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
  File "/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 31, in _pin_memory_loop
    data = pin_memory(data)
  File "/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 55, in pin_memory
    return [pin_memory(sample) for sample in data]
  File "lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 55, in <listcomp>
    return [pin_memory(sample) for sample in data]
  File "lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 47, in pin_memory
    return data.pin_memory()
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCCachingHostAllocator.cpp:278

The fix is simple: we can borrow the code from the pl core implementation, as in pytorch_lightning/accelerators/accelerator_connector.py#L90.
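
In spirit, the guard could look like the sketch below (a minimal illustration of the idea, not the actual bolts or Lightning code; the helper name is made up):

import torch
from torch.utils.data import DataLoader, TensorDataset

def default_pin_memory() -> bool:
    # Pin host memory only when a CUDA device is actually usable; pinning on a
    # CPU-only run costs page-locked (and some device) memory for no benefit,
    # and can raise a CUDA out-of-memory error when the GPUs are already busy.
    return torch.cuda.is_available()

dataset = TensorDataset(torch.randn(100, 3, 32, 32))
loader = DataLoader(dataset, batch_size=32, pin_memory=default_pin_memory())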

Best.

@github-actions

Hi! Thanks for your contribution, great first issue!

@igormq igormq changed the title pin_memory should be true only if it is a gpu training/test/eval pin_memory should be true only if gpus is specified Sep 22, 2020
@igormq igormq changed the title pin_memory should be true only if gpus is specified pin_memory should be true only if gpus are specified Sep 22, 2020
@nateraw
Contributor

nateraw commented Sep 22, 2020

@igormq mind sending a PR? If not, I can address this.

@nateraw
Contributor

nateraw commented Sep 22, 2020

I think the main thing here is just to include a pin_memory flag in the datamodules. I think it makes sense to default it to True either way? Please correct me if I'm wrong 😄
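
Something along these lines, say (a hypothetical sketch of the idea; the class name, defaults, and setup details are illustrative, not the actual bolts code):

from torch.utils.data import DataLoader

class CIFAR10DataModuleSketch:
    # Illustrative only: threads a user-facing pin_memory flag through to the
    # loaders instead of hard-coding pin_memory=True.
    def __init__(self, batch_size: int = 32, num_workers: int = 4,
                 pin_memory: bool = True):
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.pin_memory = pin_memory
        self.dataset_train = None  # assigned in setup()

    def train_dataloader(self):
        return DataLoader(
            self.dataset_train,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
            pin_memory=self.pin_memory,
        )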

@Borda Borda added fix, good first issue, and help wanted labels Oct 15, 2020
@briankosw
Contributor

@nateraw I think beyond the pin_memory flag, some other flags should also be included, such as shuffle and drop_last. What do you think? I can work on a pull request updating the flags in the datamodules.
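
For instance, the per-split loaders could expose those flags with split-appropriate defaults (a hypothetical sketch continuing the datamodule above; it assumes self.shuffle and self.drop_last were added as constructor arguments, and the defaults the eventual PR picks may differ):

    def train_dataloader(self):
        # Shuffling typically defaults to True for training; drop_last can be
        # exposed for users whose models need a fixed batch size.
        return DataLoader(self.dataset_train, batch_size=self.batch_size,
                          shuffle=self.shuffle, drop_last=self.drop_last,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory)

    def val_dataloader(self):
        # Evaluation should usually see every sample exactly once, in a fixed
        # order, so shuffle=False and drop_last=False are safer defaults here.
        return DataLoader(self.dataset_val, batch_size=self.batch_size,
                          shuffle=False, drop_last=False,
                          num_workers=self.num_workers,
                          pin_memory=self.pin_memory)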

@nateraw
Contributor

nateraw commented Nov 18, 2020

@briankosw sure! For the shuffle and drop_last flags, let's just make sure we use sensible defaults. 😄 Thanks!!

@Borda Borda added this to the v0.3 milestone Jan 18, 2021
@Borda Borda added the bug label and removed the fix label Jun 20, 2023