Four multi-processing tests fail in environments with CUDA available when the test suite is run (or when at least one other test initializes CUDA) #14979

Closed
speediedan opened this issue Oct 3, 2022 · 4 comments · Fixed by #14982

@speediedan
Contributor

Bug description

The four multi-processing tests below (two each for PL and Lite) fail in environments with CUDA available when the full test suite is run (or whenever at least one other test initializes CUDA first). I'll submit a PR shortly to fix them:

tests/tests_lite/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_start_method[fork]
tests/tests_lite/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_restore_globals[fork]
tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_start_method[fork]
tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_restore_globals[fork]

How to reproduce the bug

pytest -v \
tests/tests_pytorch/accelerators/test_gpu.py::test_set_cuda_device \
tests/tests_lite/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_start_method[fork] \
tests/tests_lite/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_restore_globals[fork] \
tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_start_method[fork] \
tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_restore_globals[fork]

...
==================================================================================== short test summary info ====================================================================================
FAILED tests/tests_lite/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_start_method[fork] - RuntimeError: Lightning can't create new processes if CUDA is already...
FAILED tests/tests_lite/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_restore_globals[fork] - RuntimeError: Lightning can't create new processes if CUDA is alre...
FAILED tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_start_method[fork] - RuntimeError: Lightning can't create new processes if CUDA is alre...
FAILED tests/tests_pytorch/strategies/launchers/test_multiprocessing.py::test_multiprocessing_launcher_restore_globals[fork] - RuntimeError: Lightning can't create new processes if CUDA is a...
=========================================================================== 4 failed, 1 passed ...
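The order dependence in the reproduction above can be illustrated with a minimal, torch-free sketch. Everything here is hypothetical and only mimics the behaviour described in this issue: `_cuda_initialized` stands in for process-wide CUDA state (in a real environment that information could come from `torch.cuda.is_initialized()`), `initialize_cuda` stands in for any test that touches the GPU, and `check_can_launch` stands in for a launcher guard. It is not Lightning's actual implementation.

```python
# Stand-in for process-wide CUDA state (assumption: real code could query
# torch.cuda.is_initialized() instead of a module-level flag).
_cuda_initialized = False


def initialize_cuda() -> None:
    """Hypothetical stand-in for any earlier test that touches the GPU."""
    global _cuda_initialized
    _cuda_initialized = True


def check_can_launch(start_method: str) -> None:
    """Hypothetical guard: refuse to fork once CUDA state exists in the parent."""
    if start_method == "fork" and _cuda_initialized:
        raise RuntimeError(
            f"cannot launch with start method {start_method!r}: CUDA is already initialized"
        )


def run_in_order(steps):
    """Run callables in order, recording 'passed' or 'RuntimeError' for each."""
    results = []
    for step in steps:
        try:
            step()
            results.append("passed")
        except RuntimeError:
            results.append("RuntimeError")
    return results
```

Because the flag lives for the lifetime of the process, a fork-based launch succeeds when run alone but fails once any earlier step has initialized CUDA, which matches the "4 failed, 1 passed" pattern in the summary.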

Error messages and logs

(error due to fork after CUDA already initialized)

RuntimeError: Lightning can't create new processes if CUDA is already...
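One possible way to make such tests independent of earlier CUDA initialization is to patch the "is CUDA initialized?" check for the duration of the test. The sketch below is an assumption for illustration only: `fake_cuda` and `launch_with_fork` are hypothetical stand-ins, and nothing in this issue states that the linked PR uses this approach.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for torch.cuda in a process where an earlier test
# has already initialized CUDA.
fake_cuda = SimpleNamespace(is_initialized=lambda: True)


def launch_with_fork() -> str:
    """Hypothetical launcher entry point guarded against forking after CUDA init."""
    if fake_cuda.is_initialized():
        raise RuntimeError("CUDA is already initialized; refusing to fork")
    return "launched"


def run_isolated_launch() -> str:
    """Patch the check so the outcome does not depend on earlier tests."""
    with mock.patch.object(fake_cuda, "is_initialized", return_value=False):
        return launch_with_fork()
```

The patch is undone when the `with` block exits, so the guard still fires for any code that runs outside the isolated test.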

Environment

* CUDA:
	- GPU:
		- NVIDIA GeForce RTX 2070 SUPER
		- NVIDIA GeForce RTX 2070
	- available:         True
	- version:           11.6
* Lightning:
	- lightning:         2022.10.1
	- lightning-utilities: 0.3.0
	- pt-lightning-sphinx-theme: 0.0.30
	- pytorch-lightning: 1.8.0rc0
	- torch:             1.12.1
	- torchmetrics:      0.9.3
	- torchvision:       0.13.1
* Packages:
	- absl-py:           1.2.0
	- aiohttp:           3.8.3
	- aiosignal:         1.2.0
	- alabaster:         0.7.12
	- alembic:           1.8.1
	- antlr4-python3-runtime: 4.9.3
	- anyio:             3.6.1
	- argon2-cffi:       21.3.0
	- argon2-cffi-bindings: 21.2.0
	- asttokens:         2.0.8
	- async-generator:   1.10
	- async-timeout:     4.0.2
	- attrs:             22.1.0
	- babel:             2.10.3
	- backcall:          0.2.0
	- base58:            2.1.1
	- beautifulsoup4:    4.11.1
	- black:             22.8.0
	- bleach:            5.0.1
	- boto3:             1.24.84
	- botocore:          1.27.84
	- bracex:            2.3.post1
	- bravado:           11.0.3
	- bravado-core:      5.17.1
	- brotlipy:          0.7.0
	- cachetools:        5.2.0
	- certifi:           2022.9.24
	- cffi:              1.15.1
	- cfgv:              3.3.1
	- charset-normalizer: 2.1.1
	- click:             8.1.3
	- cloudpickle:       2.2.0
	- codecov:           2.1.12
	- coloredlogs:       15.0.1
	- comet-ml:          3.31.14
	- commonmark:        0.9.1
	- configargparse:    1.5.3
	- configobj:         5.0.6
	- contourpy:         1.0.5
	- coverage:          6.5.0
	- cryptography:      35.0.0
	- curio:             1.5
	- cycler:            0.11.0
	- databricks-cli:    0.17.3
	- debugpy:           1.6.3
	- decorator:         5.1.1
	- deepspeed:         0.7.3
	- defusedxml:        0.7.1
	- distlib:           0.3.6
	- docker:            6.0.0
	- docker-pycreds:    0.4.0
	- docstring-parser:  0.15
	- docutils:          0.17.1
	- dulwich:           0.20.46
	- entrypoints:       0.4
	- everett:           3.0.0
	- exceptiongroup:    1.0.0rc9
	- executing:         1.1.0
	- fairscale:         0.4.11
	- fastapi:           0.85.0
	- fastjsonschema:    2.16.2
	- filelock:          3.8.0
	- fire:              0.4.0
	- flask:             2.2.2
	- flatbuffers:       22.9.24
	- fonttools:         4.37.4
	- frozenlist:        1.3.1
	- fsspec:            2022.8.2
	- future:            0.18.2
	- gitdb:             4.0.9
	- gitpython:         3.1.27
	- google-auth:       2.12.0
	- google-auth-oauthlib: 0.4.6
	- greenlet:          1.1.3
	- grpcio:            1.49.1
	- grpcio-tools:      1.48.2
	- gunicorn:          20.1.0
	- gym:               0.26.1
	- gym-notices:       0.0.8
	- h11:               0.14.0
	- hivemind:          1.1.1
	- hjson:             3.1.0
	- horovod:           0.25.0
	- humanfriendly:     10.0
	- hydra-core:        1.2.0
	- identify:          2.5.5
	- idna:              3.4
	- imagesize:         1.4.1
	- importlib-metadata: 4.13.0
	- iniconfig:         1.1.1
	- ipykernel:         6.16.0
	- ipyparallel:       8.4.1
	- ipython:           8.5.0
	- ipython-genutils:  0.2.0
	- ipywidgets:        8.0.2
	- itsdangerous:      2.1.2
	- jedi:              0.18.1
	- jinja2:            3.0.3
	- jmespath:          1.0.1
	- joblib:            1.2.0
	- jsonargparse:      4.15.0
	- jsonpointer:       2.3
	- jsonref:           0.2
	- jsonschema:        3.2.0
	- jupyter-client:    7.3.5
	- jupyter-core:      4.11.1
	- jupyterlab-pygments: 0.2.2
	- jupyterlab-widgets: 3.0.3
	- kiwisolver:        1.4.4
	- lightning:         2022.10.1
	- lightning-utilities: 0.3.0
	- lxml:              4.9.1
	- mako:              1.2.3
	- markdown:          3.4.1
	- markdown-it-py:    2.1.0
	- markupsafe:        2.1.1
	- matplotlib:        3.6.0
	- matplotlib-inline: 0.1.6
	- mdit-py-plugins:   0.3.1
	- mdurl:             0.1.2
	- mistune:           2.0.4
	- mkl-fft:           1.3.1
	- mkl-random:        1.2.2
	- mkl-service:       2.4.0
	- mlflow:            1.29.0
	- monotonic:         1.6
	- mpmath:            1.2.1
	- msgpack:           1.0.4
	- multiaddr:         0.0.9
	- multidict:         6.0.2
	- mypy:              0.971
	- mypy-extensions:   0.4.3
	- myst-parser:       0.16.1
	- nbclient:          0.6.8
	- nbconvert:         7.0.0
	- nbformat:          5.6.1
	- nbsphinx:          0.8.9
	- neptune-client:    0.16.9
	- nest-asyncio:      1.5.6
	- netaddr:           0.8.0
	- ninja:             1.10.2.4
	- nodeenv:           1.7.0
	- notebook:          6.4.12
	- numpy:             1.23.1
	- oauthlib:          3.2.1
	- olefile:           0.46
	- omegaconf:         2.2.3
	- onnxruntime:       1.12.1
	- outcome:           1.2.0
	- packaging:         21.3
	- pandas:            1.5.0
	- pandoc:            2.2
	- pandocfilters:     1.5.0
	- parso:             0.8.3
	- pathspec:          0.10.1
	- pathtools:         0.1.2
	- pexpect:           4.8.0
	- pickleshare:       0.7.5
	- pillow:            7.2.0
	- pip:               22.2.2
	- platformdirs:      2.5.2
	- pluggy:            1.0.0
	- plumbum:           1.7.2
	- ply:               3.11
	- pre-commit:        2.20.0
	- prefetch-generator: 1.0.1
	- prometheus-client: 0.14.1
	- prometheus-flask-exporter: 0.20.3
	- promise:           2.3
	- prompt-toolkit:    3.0.31
	- protobuf:          3.19.6
	- psutil:            5.9.2
	- pt-lightning-sphinx-theme: 0.0.30
	- ptyprocess:        0.7.0
	- pure-eval:         0.2.2
	- py:                1.11.0
	- py-cpuinfo:        8.0.0
	- pyasn1:            0.4.8
	- pyasn1-modules:    0.2.8
	- pycparser:         2.21
	- pydantic:          1.10.2
	- pygame:            2.1.0
	- pygments:          2.13.0
	- pyjwt:             2.5.0
	- pymultihash:       0.8.2
	- pyopenssl:         22.0.0
	- pyparsing:         3.0.9
	- pyrsistent:        0.18.1
	- pysocks:           1.7.1
	- pytest:            7.0.1
	- pytest-asyncio:    0.19.0
	- pytest-cov:        4.0.0
	- pytest-forked:     1.4.0
	- pytest-rerunfailures: 10.2
	- python-dateutil:   2.8.2
	- pytorch-lightning: 1.8.0rc0
	- pytz:              2022.2.1
	- pyyaml:            6.0
	- pyzmq:             24.0.1
	- qtconsole:         5.3.2
	- qtpy:              2.2.0
	- querystring-parser: 1.2.4
	- requests:          2.28.1
	- requests-oauthlib: 1.3.1
	- requests-toolbelt: 0.9.1
	- rfc3987:           1.3.8
	- rich:              12.5.1
	- rsa:               4.9
	- s3transfer:        0.6.0
	- scikit-learn:      1.1.2
	- scipy:             1.9.1
	- semantic-version:  2.10.0
	- send2trash:        1.8.0
	- sentry-sdk:        1.9.9
	- setproctitle:      1.3.2
	- setuptools:        63.4.1
	- shortuuid:         1.0.9
	- simplejson:        3.17.6
	- six:               1.16.0
	- smmap:             5.0.0
	- sniffio:           1.3.0
	- snowballstemmer:   2.2.0
	- sortedcontainers:  2.4.0
	- soupsieve:         2.3.2.post1
	- sphinx:            4.5.0
	- sphinx-autodoc-typehints: 1.14.1
	- sphinx-copybutton: 0.5.0
	- sphinx-multiproject: 1.0.0rc1
	- sphinx-paramlinks: 0.5.4
	- sphinx-togglebutton: 0.3.2
	- sphinxcontrib-applehelp: 1.0.2
	- sphinxcontrib-devhelp: 1.0.2
	- sphinxcontrib-fulltoc: 1.2.0
	- sphinxcontrib-htmlhelp: 2.0.0
	- sphinxcontrib-jsmath: 1.0.1
	- sphinxcontrib-mockautodoc: 0.0.1.dev20130518
	- sphinxcontrib-qthelp: 1.0.3
	- sphinxcontrib-serializinghtml: 1.1.5
	- sqlalchemy:        1.4.41
	- sqlparse:          0.4.3
	- stack-data:        0.5.1
	- starlette:         0.20.4
	- strict-rfc3339:    0.7
	- swagger-spec-validator: 2.7.6
	- sympy:             1.11.1
	- tabulate:          0.8.10
	- tensorboard:       2.10.1
	- tensorboard-data-server: 0.6.1
	- tensorboard-plugin-wit: 1.8.1
	- termcolor:         2.0.1
	- terminado:         0.16.0
	- testpath:          0.6.0
	- threadpoolctl:     3.1.0
	- tinycss2:          1.1.1
	- toml:              0.10.2
	- tomli:             2.0.1
	- torch:             1.12.1
	- torchmetrics:      0.9.3
	- torchvision:       0.13.1
	- tornado:           6.2
	- tqdm:              4.64.1
	- traitlets:         5.4.0
	- trio:              0.22.0
	- types-croniter:    1.3.2
	- types-protobuf:    3.20.4
	- types-python-dateutil: 2.8.19
	- types-pyyaml:      6.0.12
	- types-redis:       4.3.21
	- types-requests:    2.28.11
	- types-setuptools:  65.4.0.0
	- types-six:         1.16.21
	- types-ujson:       5.5.0
	- types-urllib3:     1.26.25
	- typing-extensions: 4.3.0
	- urllib3:           1.26.11
	- uvicorn:           0.18.3
	- uvloop:            0.17.0
	- varint:            1.0.2
	- virtualenv:        20.16.5
	- wandb:             0.13.3
	- wcmatch:           8.4.1
	- wcwidth:           0.2.5
	- webcolors:         1.12
	- webencodings:      0.5.1
	- websocket-client:  1.3.3
	- werkzeug:          2.2.2
	- wheel:             0.37.1
	- widgetsnbextension: 4.0.3
	- wrapt:             1.14.1
	- wurlitzer:         3.0.2
	- yarl:              1.8.1
	- zipp:              3.8.1
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- ELF
	- processor:         x86_64
	- python:            3.9.13
	- version:           #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022

More info

No response

@speediedan speediedan added the needs triage Waiting to be triaged by maintainers label Oct 3, 2022
@carmocca carmocca added tests and removed needs triage Waiting to be triaged by maintainers labels Oct 4, 2022
@awaelchli
Contributor

@speediedan Thanks for noticing and proposing the fix. I noticed it too and wanted to address it for a while.

@carmocca You closed #14982 but I think we should still land it, as it fixes the two remaining tests (on lite side) that were failing and not covered in #14550

@carmocca
Contributor

carmocca commented Oct 4, 2022

My bad!

@speediedan
Contributor Author

> My bad!

No worries, thanks for all your work man!

@speediedan
Contributor Author

> @speediedan Thanks for noticing and proposing the fix. I noticed it too and wanted to address it for a while.
>
> @carmocca You closed #14982 but I think we should still land it, as it fixes the two remaining tests (on lite side) that were failing and not covered in #14550

Happy to help, thanks for all your work guys!
