PyTorch 2.4.0 Package Not Installable w/ CUDA 12 on Python 3.12 Linux x86_64 #254
Comments
I think it might be because our builds stalled...
I expect it to take like 13 hours. Please check and report! Thanks!
No problem, thanks for the quick response! Will test tomorrow.
Thanks Mark! 🙏 Looks like one failed. Unfortunately this appears to be after the build, during the conda-build DSO checking phase. Are these kinds of CI issues common here? If so, what would you recommend (say, to a provider) to address the reliability issues?
Not sure if that is true; the other seemed to have failed during the building phase. I had to restart the aarch64 jobs.
That was the last part of the log that I could see in GitHub last night. Perhaps they had trouble loading? The log files are quite long. Looking today, using the raw log to get them to load fully (attached in compressed form below to meet size limitations), I am seeing the following in those jobs.

From the CUDA 12 Linux ARM job (attached compressed log):

```
+ python -c 'import torch; torch.tensor(1).to('\''cpu'\'').numpy(); print('\''numpy support enabled!!!'\'')'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/conda/feedstock_root/build_artifacts/libtorch_1724888760332/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.8/site-packages/torch/__init__.py", line 290, in <module>
    from torch._C import *  # noqa: F403
ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/conda/feedstock_root/build_artifacts/libtorch_1724888760332/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.8/site-packages/torch/../../.././libcurand.so.10)
```

Unfortunately some CUDA libraries are moving to EL8: conda-forge/cuda-feedstock#28. So to run this test we likely need to use the AlmaLinux 8 image. An example would be PR conda-forge/faiss-split-feedstock#75. Alternatively we could just skip this test on CUDA ARM. Presumably if the CPU one passes, that is a pretty good indication of whether this one will pass.

From the CPU-only Linux ARM job (attached compressed log):

```
+ python -c 'import torch; torch.tensor(1).to('\''cpu'\'').numpy(); print('\''numpy support enabled!!!'\'')'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: PyTorch was compiled without NumPy support
```

Though it looks like the CPU ARM test doesn't pass at the moment. Think you understand this better than I do. Guessing we need to broaden this workaround to cover ARM: #252?
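The `GLIBC_2.27` failure above can be diagnosed on any Linux host by comparing the glibc the image ships against the symbol versions the offending library requires. A minimal sketch (the `libcurand.so.10` path is a placeholder; point `objdump` at the library named in your own traceback):

```shell
# Print the glibc version the host provides. An EL7 image ships glibc
# 2.17, which is older than the GLIBC_2.27 symbols the traceback needs.
ldd --version | head -n 1

# List the GLIBC version tags a shared library requires. The path is
# illustrative, not the real test-env path from the log above:
# objdump -T /path/to/libcurand.so.10 | grep -o 'GLIBC_[0-9.]*' | sort -u
```

If the library's highest required tag exceeds the host's glibc version, the import fails exactly as in the log, which is why moving the test to an AlmaLinux 8 (glibc 2.28) image fixes it.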
I'm glad my test worked...
Possibly relevant: We encountered a "PyTorch was compiled without NumPy support" error when running on Linux aarch64 + CUDA (on NVIDIA GH200) using the conda-forge build of PyTorch 2.4.0. Relevant output from
Rolling back to 2.3.0 remedied this issue. Looking at the build number, it seems that the build we installed preceded the merge of PR #252.
Thanks James! Yep, this is expected.

In PR #252, Mark worked around a bug in CMake to ensure PyTorch builds with NumPy, and tested it in the recipe. These packages would show up with a …

As noted above (#254 (comment)), this test appears to be working correctly. However, it shows that the Linux ARM builds are failing. So no packages are available with …

Am guessing fixing this would be taking this code (pytorch-cpu-feedstock/recipe/meta.yaml, lines 100 to 101 in 6dd85b3)...

...and changing it like so:

```diff
-    - cmake !=3.30.0,!=3.30.1,!=3.30.2  # [osx and blas_impl == "mkl"]
-    - cmake                             # [not (osx and blas_impl == "mkl")]
+    - cmake !=3.30.0,!=3.30.1,!=3.30.2  # [unix]
+    - cmake                             # [not unix]
```

@jcwomack is this something you would be willing to try in a new PR? 🙂
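As a sanity check of the pin above, here is a small, hypothetical helper (not part of the recipe) that flags the CMake releases the selector excludes:

```python
# Sketch: flag the CMake releases (3.30.0 through 3.30.2) that the recipe
# pin above excludes, because those versions carry the bug that broke
# PyTorch's NumPy detection (worked around in PR #252).
BUGGY_CMAKE = {(3, 30, 0), (3, 30, 1), (3, 30, 2)}

def cmake_is_safe(version: str) -> bool:
    """Return True if `version` is outside the known-buggy range."""
    parts = tuple(int(p) for p in version.split("."))
    return parts not in BUGGY_CMAKE

print(cmake_is_safe("3.29.6"))  # True
print(cmake_is_safe("3.30.1"))  # False
```

The proposed change simply extends the exclusion from `osx and blas_impl == "mkl"` to all `unix` platforms, so Linux ARM builds pick up the same safe CMake range.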
Hi @jakirkham, thanks for the quick response! Apologies, but I've got quite limited availability for the next week or so, so would not be able to work on a PR myself at this time.
The original issue is resolved. I opened #266 to track the aarch64 + NumPy issue.
As the issue around PyTorch being built without NumPy was fixed in conda-forge, we can now relax these upper bounds to allow PyTorch 2.4.

xref: conda-forge/pytorch-cpu-feedstock#254
xref: conda-forge/pytorch-cpu-feedstock#266
xref: rapidsai/cugraph#4615
xref: rapidsai/cugraph#4703
xref: #59

Authors:
- https://github.com/jakirkham

Approvers:
- Jake Awe (https://github.com/AyodeAwe)
- Tingyu Wang (https://github.com/tingyu66)

URL: #75

As the issue around PyTorch being built without NumPy was fixed in conda-forge, we can now relax these upper bounds to allow PyTorch 2.4.

xref: conda-forge/pytorch-cpu-feedstock#254
xref: conda-forge/pytorch-cpu-feedstock#266
xref: #4615

Authors:
- https://github.com/jakirkham
- Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
- Alex Barghi (https://github.com/alexbarghi-nv)
- James Lamb (https://github.com/jameslamb)

URL: #4703
Solution to issue cannot be found in the documentation.
Issue
On a Linux x86_64 machine:
Interestingly, the CUDA 11.8 variant is picked when using this solve. I ran this using the `libmamba` solver, but it's also an issue with the `classic` solver (which ends up ignoring `CONDA_OVERRIDE_CUDA` and picks the `cpu_generic_py312` variant).

2.3.1 does not have this issue. That is, if I run

```
CONDA_OVERRIDE_CUDA=12 conda install "pytorch<2.4.0"
```

I get a CUDA 12 version of PyTorch in the solve.

Installed packages
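The rollback the reporter describes can be reproduced with the command quoted above; pinning the variant by build string is another option. Note that the `cuda120*` build-string pattern is an assumption about conda-forge's naming, so verify it with `conda search` first:

```shell
# Spoof the detected CUDA version so the solver considers CUDA builds,
# and stay below 2.4.0, where the CUDA 12 variant still solves.
# (Requires conda; shown here as a sketch, not run.)
echo 'CONDA_OVERRIDE_CUDA=12 conda install "pytorch<2.4.0"'

# Alternatively, pin the variant explicitly (build-string pattern assumed):
# conda install "pytorch=2.3.1=cuda120*"
```

`CONDA_OVERRIDE_CUDA` works by overriding the `__cuda` virtual package the solver consults, which is why the classic solver ignoring it (as noted above) forces the `cpu_generic` fallback.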
Environment info