🐛 Bug
This bug was mentioned on Slack to @jeremyjordan and I have also seen it. When DDP is enabled and profiler=True is passed to the Trainer, training fails with TypeError: can't pickle _thread.RLock objects. I am unsure whether this is the intended behavior given the parallel backend, but I thought I would mention it.
To Reproduce
Set distributed_backend to ddp and profiler to True. This results in the error below:
File "/home/anthony/robotics/learning/cheap_robot/cli.py", line 89, in main
trainer.fit(model, train_dataloader=td)
File "/home/anthony/.cache/pypoetry/virtualenvs/robotics-zp-60jGk-py3.6/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 751, in fit
mp.spawn(self.ddp_train, nprocs=self.num_processes, args=(model,))
File "/home/anthony/.cache/pypoetry/virtualenvs/robotics-zp-60jGk-py3.6/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 162, in spawn
process.start()
File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
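For reference, a minimal script along these lines reproduces the setup described above. The LightningModule and dataloader are hypothetical stand-ins (not from the original report); only distributed_backend="ddp" and profiler=True are the relevant pieces.

```python
# Hypothetical minimal reproduction -- the LightningModule and the dataset are
# stand-ins; only distributed_backend="ddp" and profiler=True come from the report.
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    td = DataLoader(TensorDataset(torch.randn(64, 32), torch.randn(64, 1)), batch_size=8)
    trainer = pl.Trainer(
        gpus=2,
        max_epochs=1,
        distributed_backend="ddp",  # mp.spawn pickles the bound ddp_train method, i.e. the whole Trainer
        profiler=True,              # enables the built-in profiler, which ends up holding non-picklable state
    )
    trainer.fit(ToyModel(), train_dataloader=td)  # -> TypeError: can't pickle _thread.RLock objects
```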
Expected behavior
Either a warning stating that the profiler does not currently work with the ddp backend, or a profiler time report produced correctly under the ddp backend.
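Purely as an illustration of the first option (this is not existing Lightning code), a pre-flight pickling check along these lines could turn the crash into the warning suggested above:

```python
# Illustrative only -- not existing pytorch-lightning code. Sketch of a pre-flight
# check that could warn about non-picklable Trainer state before mp.spawn fails.
import pickle
import warnings


def warn_if_unpicklable(obj, name):
    """Try to pickle `obj`; warn instead of letting the DDP spawn crash later."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError) as err:
        warnings.warn(
            "{} cannot be pickled ({}); it will not work with the 'ddp' backend, "
            "which spawns worker processes by pickling.".format(name, err)
        )
        return False
```

Run against the configured profiler (or whatever else gets captured by the spawned function) before calling mp.spawn, a check like this would make the failure mode explicit instead of surfacing as an opaque TypeError from ForkingPickler.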
Environment
CUDA:
  - GPU:
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
    - GeForce RTX 2080 Ti
  - available: True
  - version: 10.1
- numpy: 1.18.4
- pyTorch_debug: False
- pyTorch_version: 1.4.0
- pytorch-lightning: 0.7.5
- tensorboard: 2.1.1
- tqdm: 4.46.0
- OS: Linux
- architecture:
  - 64bit
  - ELF
- processor: x86_64
- python: 3.6.8
Related: if you set profiler=AdvancedProfiler() you get TypeError: can't pickle Profile objects, whereas with profiler=True it works.
Additionally, the docs mention calling trainer = Trainer(..., profiler="advanced"), but this does not seem to be possible, not least because the argument is typed as profiler: Optional[Union[BaseProfiler, bool]] = None.
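Both reported errors ultimately come down to objects that CPython's pickle cannot serialize, which is exactly what mp.spawn relies on when distributed_backend="ddp" spawns workers. A standalone check, independent of Lightning, shows this; the exception wording varies slightly across Python versions:

```python
# Standalone demonstration, independent of Lightning: the objects behind both
# reported errors are not picklable by design, so anything holding them cannot
# cross the spawn boundary.
import cProfile
import pickle
import threading

for obj in (threading.RLock(), cProfile.Profile()):
    try:
        pickle.dumps(obj)
    except TypeError as err:
        # On Python 3.6 this prints e.g. "can't pickle _thread.RLock objects"
        # and "can't pickle Profile objects"; newer versions word it differently.
        print(type(obj).__name__, "->", err)
```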