Four multi-processing tests fail in environments with CUDA available when the test suite is run (or when at least one other test initializes CUDA) #14979

speediedan opened this issue Oct 3, 2022 · 4 comments · Fixed by #14982


Bug description

The four multi-processing tests below (two each for PL and Lite) fail in environments where CUDA is available when the full test suite is run (or whenever at least one other test initializes CUDA). I'll be submitting a PR shortly to fix this.


How to reproduce the bug

pytest -v \
tests/tests_pytorch/accelerators/ \
tests/tests_lite/strategies/launchers/[fork] \
tests/tests_lite/strategies/launchers/[fork] \
tests/tests_pytorch/strategies/launchers/[fork] \

==================================================================================== short test summary info ====================================================================================
FAILED tests/tests_lite/strategies/launchers/[fork] - RuntimeError: Lightning can't create new processes if CUDA is already...
FAILED tests/tests_lite/strategies/launchers/[fork] - RuntimeError: Lightning can't create new processes if CUDA is alre...
FAILED tests/tests_pytorch/strategies/launchers/[fork] - RuntimeError: Lightning can't create new processes if CUDA is alre...
FAILED tests/tests_pytorch/strategies/launchers/[fork] - RuntimeError: Lightning can't create new processes if CUDA is a...
=========================================================================== 4 failed, 1 passed ...

Error messages and logs

(error due to fork after CUDA already initialized)

RuntimeError: Lightning can't create new processes if CUDA is already...
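
For anyone unfamiliar with this failure mode, here is a minimal sketch of the kind of guard that raises it, and of how a test can avoid depending on what ran earlier in the same process. The helper names `cuda_is_initialized` and `check_can_fork` are hypothetical, made up for illustration, and are not Lightning's actual code. The only real library call assumed is `torch.cuda.is_initialized()`.

```python
from typing import Callable


def cuda_is_initialized() -> bool:
    """Report whether this process has already initialized CUDA."""
    try:
        import torch  # optional here; without torch there is no CUDA state to check
    except ImportError:
        return False
    return torch.cuda.is_initialized()


def check_can_fork(
    start_method: str,
    is_initialized: Callable[[], bool] = cuda_is_initialized,
) -> None:
    """Raise if a fork-based launch is requested after CUDA initialization.

    Hypothetical helper for illustration only.
    """
    if start_method == "fork" and is_initialized():
        raise RuntimeError(
            "Cannot fork new processes: CUDA is already initialized in this process."
        )


# A test can inject a stub so the result does not depend on earlier tests.
check_can_fork("fork", is_initialized=lambda: False)  # no error
check_can_fork("spawn", is_initialized=lambda: True)  # no error: the guard only covers fork
```

Because the whole pytest session shares one process, any earlier test that touches CUDA flips the real check for every later fork test, which is why the tests pass in isolation but fail as part of the suite.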

Environment

* CUDA:
	- GPU:
		- NVIDIA GeForce RTX 2070 SUPER
		- NVIDIA GeForce RTX 2070
	- available:         True
	- version:           11.6
* Lightning:
	- lightning:         2022.10.1
	- lightning-utilities: 0.3.0
	- pt-lightning-sphinx-theme: 0.0.30
	- pytorch-lightning: 1.8.0rc0
	- torch:             1.12.1
	- torchmetrics:      0.9.3
	- torchvision:       0.13.1
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- ELF
	- processor:         x86_64
	- python:            3.9.13
	- version:           #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022

More info

@speediedan Thanks for noticing and proposing the fix. I noticed it too and have wanted to address it for a while.

@carmocca You closed #14982, but I think we should still land it, as it fixes the two remaining tests (on the Lite side) that were failing and were not covered in #14550.

carmocca commented Oct 4, 2022

My bad!

No worries, thanks for all your work man!

Happy to help, thanks for all your work guys!

