ImportError: libcupti.so.10.2: cannot open shared object file: No such file or directory #5635

Closed
datumbox opened this issue Mar 17, 2022 · 5 comments · Fixed by #5648

Comments

@datumbox
Contributor

datumbox commented Mar 17, 2022

🐛 Describe the bug

Multiple binary_linux_conda_*_cu* jobs are currently failing on the latest main with the following error:

Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    import torch
  File "/opt/conda/conda-bld/torchvision_1647515141078/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/torch/__init__.py", line 199, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcupti.so.10.2: cannot open shared object file: No such file or directory

Seems like a missing dependency problem. Same applies for libcupti.so.11.3 here.
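One way to confirm the diagnosis without importing torch is to ask the dynamic loader directly. The following is a minimal sketch, not part of the failing CI job: the helper name `can_load` is hypothetical, and the two sonames are simply the ones quoted in this report.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can resolve and open `soname`."""
    try:
        # CDLL goes through the normal loader search (rpath,
        # LD_LIBRARY_PATH, ld.so cache, default directories).
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library (or one of its own dependencies)
        # cannot be opened.
        return False
    return True


if __name__ == "__main__":
    for name in ("libcupti.so.10.2", "libcupti.so.11.3"):
        status = "loadable" if can_load(name) else "NOT loadable"
        print(f"{name}: {status}")
```

If this reports the library as not loadable inside the build environment, the `import torch` failure above is expected regardless of how torchvision itself is built.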

Versions

Latest main branch 39772ec

@malfet
Contributor

malfet commented Mar 20, 2022

Thank you for reporting. Likely caused by the change that makes cupti a dynamic dependency.
[edit] Hmm, it's a bit more nuanced than that: cudatoolkit from the nvidia channel includes libcupti, while the one from the anaconda channel does not. As a short-term workaround, let's see if adding this channel as a dependency will fix things.
cc: @ezyang
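To see which variant of cudatoolkit an environment actually received, one can look for libcupti in the environment's library directory. This is a rough sketch under assumptions: the helper `find_cupti` is hypothetical, and it assumes the package installs shared objects directly under `<prefix>/lib`, which may not hold for every build.

```python
import os
import sys
from pathlib import Path
from typing import List


def find_cupti(prefix: str) -> List[str]:
    """List libcupti shared objects found directly under <prefix>/lib."""
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("libcupti.so*"))


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; otherwise fall back to
    # the prefix of the running interpreter.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    hits = find_cupti(prefix)
    if hits:
        print("found:", ", ".join(hits))
    else:
        print(f"no libcupti.so* under {prefix}/lib")
```

An empty result would be consistent with a cudatoolkit build that does not ship libcupti.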

@ezyang
Contributor

ezyang commented Mar 20, 2022

We can switch back to statically linking cupti as well; I just need to do what malfet suggested when I changed the defaults anyway!

@malfet
Contributor

malfet commented Mar 21, 2022

@ezyang imo we should just bundle cupti into the conda package

@ezyang
Contributor

ezyang commented Mar 21, 2022

Works too. Do you need me to try to cook this up?

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Jul 18, 2022
A new version of `pytorch` has been released, `1.12.0`. This version's packages don't statically link to `libcupti` (more explanation [here](pytorch/vision#5635)). Until that is patched, we are going to run into the `libcupti` not found error: pytorch/pytorch#74473 (comment)

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #11289
@pickles-bread-and-butter

pickles-bread-and-butter commented Aug 8, 2022

Hi @malfet, I'm facing the same issue with my build system; error below:

/home/ubuntu/.cache/bazel/_bazel_ubuntu/8de0a1069de8d166c668173ca21c04ae/execroot/com_lyft_avsoftware/bazel-out/k8-fastbuild/bin/src/chronos/tasks/feed_task_cuda_test.runfiles/com_lyft_avsoftware/src/chronos/tasks/feed_task_cuda_test: error while loading shared libraries: libcupti.so.11.0: cannot open shared object file: No such file or directory

To clarify, I build my own pytorch and vision distributions against our internal CUDA 11.0. Since upgrading torch 1.9 -> 1.12 and vision 0.10 -> 0.13, this same error has been coming up. We cannot extend LD_LIBRARY_PATH in parts of our build system because we rely on bazel, and it's tedious to add these files as a dependency of every target that needs them. Do you not plan on supporting static linking again?
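For anyone debugging a similar setup, a quick check is whether any directory on the search path actually contains the library. This is only a sketch: `find_on_path` is a hypothetical helper, and it inspects LD_LIBRARY_PATH alone, whereas the real loader also consults rpath/runpath entries, the ld.so cache, and the default system directories.

```python
import os
from typing import List, Optional


def find_on_path(soname: str, search_path: Optional[str] = None) -> List[str]:
    """Return the directories in `search_path` that contain `soname`.

    `search_path` uses the same separator-delimited format as
    LD_LIBRARY_PATH; when omitted, the environment variable is used.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return [d for d in dirs if os.path.isfile(os.path.join(d, soname))]


if __name__ == "__main__":
    hits = find_on_path("libcupti.so.11.0")
    print(hits if hits else "libcupti.so.11.0 not found on LD_LIBRARY_PATH")
```

An empty result here does not prove the library is missing from the machine; it only shows that LD_LIBRARY_PATH is not what would make it visible.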
