CUDA_Runtime_Discovery Did not find cupti on Arm system with nvhpc #2433
Comments
Please post the full debug output. From the partial output, it looks like CUDA_Runtime_Discovery.jl has discovered …
@maleadt thanks for the quick response and guidance. We can confirm that cupti is found when setting the proper CUDA_HOME path. As you pointed out, this is non-standard, since different CUDA versions need to coexist on this platform. I will go ahead and close this issue. Thanks a lot.
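For anyone else hitting this, a minimal sketch of that workaround, assuming a typical NVHPC 24.5 install path (the path below is illustrative, not taken from this system):

```julia
# Illustrative NVHPC 24.5 layout; adjust the path to the actual install on the cluster.
# Equivalently, `export CUDA_HOME=...` in the shell before launching Julia.
ENV["CUDA_HOME"] = "/opt/nvidia/hpc_sdk/Linux_aarch64/24.5/cuda/12.4"

using CUDA
CUDA.versioninfo()  # should now report the local toolkit, including CUPTI
```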
FWIW, I ran into the same thing on DelftBlue (not an ARM system). Fixed by setting …
Does …
Looks like it: tree_nvhpc_cuda_121.txt
Good question. Maybe CUDA_Runtime_Discovery.jl can also look for …
Since I don't have access to a system with NVHPC set up like that, could you maybe create a PR to CUDA_Runtime_Discovery.jl that works on your cluster?
In my case I had used …
I'm busy with other stuff in the next couple of days, but I'll try to find some time for it.
For future readers, we now try to deduce the CUDA path from the …
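As a rough illustration of that kind of deduction (not the actual CUDA_Runtime_Discovery code), one can resolve an already-loadable CUDA library, walk up from its directory, and check for the NVHPC-style extras/CUPTI layout:

```julia
using Libdl

# Rough sketch only: guess a toolkit root from a resolvable libcudart and
# verify that the NVHPC-style extras/CUPTI directory exists underneath it.
function guess_toolkit_root()
    handle = Libdl.dlopen("libcudart"; throw_error=false)
    handle === nothing && return nothing
    libdir = dirname(Libdl.dlpath(handle))   # e.g. <root>/lib64
    root = dirname(libdir)                   # strip the lib directory
    cupti = joinpath(root, "extras", "CUPTI", "lib64")
    return isdir(cupti) ? root : nothing
end
```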
Describe the bug
CUDA.jl can't find cupti even though the NVHPC 24.5 search location
extras/CUPTI
is supported by CUDA_Runtime_Discovery, as in this line. This is happening on an Arm cluster at OLCF - Wombat. Any help would be appreciated.
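For context, the layout in question looks roughly like the following check (the root path is a hypothetical NVHPC install, not taken from this system):

```julia
# Hypothetical NVHPC-bundled toolkit root; CUPTI ships under extras/CUPTI.
root = "/opt/nvidia/hpc_sdk/Linux_aarch64/24.5/cuda/12.4"
cupti = joinpath(root, "extras", "CUPTI", "lib64", "libcupti.so")
@show isfile(cupti)
```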
To reproduce
$ JULIA_DEBUG=CUDA_Runtime_Discovery julia
This is reproducible with the CUDA.jl master branch, and the libraries exist:
Expected behavior
CUDA_Runtime_Discovery should be able to find the existing NVHPC libraries.
Version info
Details on Julia: v1.10.4 for Arm
Details on CUDA:
Additional context
Tried also using
> add CUDA_Runtime_Discovery#master
but with the same outcome as above. Also setting
CUDA.set_runtime_version!(v"12.4"; local_toolkit=true)
did not help. CC @cwinogrodzki
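For completeness, a sketch of the local-toolkit configuration that was attempted; note that the preference only takes effect after restarting Julia:

```julia
using CUDA

# Writes a LocalPreferences.toml entry pinning CUDA.jl to the locally installed
# 12.4 toolkit; restart Julia afterwards for the preference to take effect.
CUDA.set_runtime_version!(v"12.4"; local_toolkit=true)

# After restarting, launching with JULIA_DEBUG=CUDA_Runtime_Discovery shows
# which directories are searched for CUPTI.
```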
nvidia-smi can see the GPUs: