Compile CUDA libraries for aarch64-linux #10223

Open
imciner2 opened this issue Jan 9, 2025 · 6 comments

Comments

@imciner2
Member

imciner2 commented Jan 9, 2025

That's correct, we currently can only generate CUDA binaries for targets that match the host system (i.e., x86_64). In fact, I recently investigated exactly this for libxc, but didn't manage to get it working. I'll copy my conclusions here, before Slack swallows them:

I took a brief look at libxc for aarch64, and there's a bunch of issues preventing us from moving forward:

  • CUDA_SDK_jll is currently installed as a BuildDependency, so it can't be executed on the host architecture. We could switch this to a HostBuildDependency; however, the host environment is musl, while the CUDA SDK is glibc. Often that works out OK-ish, but Pkg refuses to download the glibc artifact when instantiating the musl environment.
  • I tried switching the compiler to Clang, which is easy enough by passing -DCMAKE_CUDA_COMPILER=clang; however, that exposes a couple of issues. First, somehow --target= leaks into the command-line flags when CMake identifies the compiler, breaking all sorts of stuff. After fixing that to say --target=aarch64-..., a header isn't found (__config_site). This seems to be caused by an LLVM bug where it looks in the wrong locations, as noted here: https://github.com/JuliaPackaging/BinaryBuilderBase.jl/blob/ac6831078a4241d85ff891e6067a06a9e6dc1052/src/Runner.jl#L431-L442. Apparently that workaround needs to be generalized to all Clang-based platforms, which I verified works by jury-rigging the invocation to include -nostdinc++. The header location added in the linked change doesn't seem to exist on aarch64, which may be problematic later down the line, but I didn't get that far because:
  • Using Clang as the CUDA compiler still wants to execute ptxas, which brings us back to the initial issue of CUDA_SDK_jll not being executable. So we would probably need to fix that anyway, i.e., either support overriding the platform to allow using glibc binaries on musl so that HostBuildDependency works, or make sure foreign binaries are executable.
  • I decided to try the latter using qemu-user-static; however, our Qemu_static_jll isn't built for musl either, meaning it can't be installed as a HostBuildDependency. I started fixing that by attempting to rebuild Qemu for musl; however, we're using musl 1.2.2 in the musl rootfs, which doesn't yet have MAP_FIXED_NOREPLACE as used by Qemu.
  • It should also be said that even with qemu-user-static as a HostBuildDependency in the container, not everything is fixed, because the current version of the sandbox doesn't grant you access to proc/binfmt. That means you can't register qemu-user-static as an interpreter for foreign binaries, but would need to replace tools like nvcc and ptxas with wrappers that invoke them under qemu-user-static. But I didn't get to that part because I didn't manage to upgrade Qemu.

With BinaryBuilder2.jl, the qemu/binfmt solution will be integrated, and we should be able to automatically execute foreign binaries and depend on the target-specific CUDA SDK. Given the amount of work it would require to get it working right now, I decided to wait for BinaryBuilder2.jl.

Originally posted by @maleadt in #10217 (comment)
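
The wrapper approach from the last bullet above could look roughly like the following sketch. It assumes the target-specific (aarch64) CUDA SDK has been unpacked somewhere in the build environment and that a qemu-aarch64-static binary is on the PATH; the paths, tool names, and the helpers qemu_wrapper_script/install_wrappers are hypothetical. Each wrapper is a small sh script that execs the real tool under qemu-user, using qemu's -L option to point the emulated binary at an aarch64 glibc sysroot.

```python
import shlex
import stat
from pathlib import Path
from typing import Iterable, List, Optional


def qemu_wrapper_script(real_tool: str,
                        qemu: str = "qemu-aarch64-static",
                        ld_prefix: Optional[str] = None) -> str:
    """Return a POSIX sh script that runs `real_tool` under qemu-user."""
    cmd = [qemu]
    if ld_prefix is not None:
        # -L sets qemu-user's ELF interpreter prefix, so the foreign binary
        # resolves its dynamic loader and libraries under `ld_prefix`.
        cmd += ["-L", ld_prefix]
    cmd.append(real_tool)
    quoted = " ".join(shlex.quote(part) for part in cmd)
    # "$@" forwards all arguments unchanged, including ones with spaces.
    return f'#!/bin/sh\nexec {quoted} "$@"\n'


def install_wrappers(tools: Iterable[str], real_bindir: str, wrapper_dir: str,
                     **kwargs) -> List[Path]:
    """Write one executable wrapper per tool name into `wrapper_dir`."""
    out_dir = Path(wrapper_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in tools:
        target = str(Path(real_bindir) / name)
        path = out_dir / name
        path.write_text(qemu_wrapper_script(target, **kwargs))
        # Mark the script executable for user, group and others.
        path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical locations of the aarch64 CUDA SDK and an aarch64 sysroot.
    print(qemu_wrapper_script("/opt/cuda-aarch64/bin/ptxas",
                              ld_prefix="/path/to/aarch64-sysroot"))
```

Prepending the wrapper directory to PATH would make the build pick up the wrappers instead of the real tools. This still needs a Qemu build that runs in the musl host environment, which is exactly the blocker described above.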

@imciner2
Member Author

imciner2 commented Jan 9, 2025

CUDA_SDK_jll is currently installed as a BuildDependency, so can't be executed by the host arch. We could switch this to a HostBuildDependency, however the host environment is musl, while the CUDA SDK is glibc. Often that works out OK-ish, but Pkg refuses to download the glibc artifact when instantiating the musl env

We are already running the glibc-based programs in the current compilation flow, so I think the main problem is the tag matching in Pkg. Perhaps we could make a "fake" musl version of CUDA_SDK_jll that just includes all the glibc files again. Then the HostBuildDependency would have a package matching its specification.

Originally posted by @imciner2 in #10217 (comment)
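
To make the tag-matching problem concrete, here is a deliberately simplified model of artifact selection. It is not Pkg's actual matching logic; the Platform class, select_artifact function, and artifact names are hypothetical. It mirrors the behavior described above (a glibc artifact is not picked for a musl environment) and shows how a musl-tagged entry that re-packages the same glibc files would be selected.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Platform:
    """Simplified platform triplet: only the tags relevant here."""
    arch: str
    os: str
    libc: str


def select_artifact(entries: List[Tuple[Platform, str]],
                    host: Platform) -> Optional[str]:
    """Return the artifact whose tags all equal `host`'s, or None.

    The libc tag must match exactly, so a glibc build is never chosen for a
    musl host, even if its binaries would happen to run there.
    """
    for platform, artifact in entries:
        if platform == host:
            return artifact
    return None


HOST = Platform("x86_64", "linux", "musl")  # the build host environment

# Today: only a glibc build of the SDK is available.
glibc_only = [(Platform("x86_64", "linux", "glibc"), "cuda_sdk_glibc")]

# Proposed workaround: an extra musl-tagged entry shipping the same glibc files.
with_fake_musl = glibc_only + [
    (Platform("x86_64", "linux", "musl"), "cuda_sdk_glibc_repack"),
]

if __name__ == "__main__":
    print(select_artifact(glibc_only, HOST))      # None
    print(select_artifact(with_fake_musl, HOST))  # cuda_sdk_glibc_repack
```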

@imciner2
Member Author

imciner2 commented Jan 9, 2025

Yes, but even then I'm not sure that the x86_64 version of the CUDA SDK will know how to target ARM, because NVIDIA is pretty clear that it's not a cross compiler. I think it's best to wait until we can execute the ARM version under qemu (and likewise for Windows using Wine).

Originally posted by @maleadt in #10217 (comment)

@giordano
Member

  • I tried switching the compiler to Clang, which is easy enough by passing -DCMAKE_CUDA_COMPILER=clang; however, that exposes a couple of issues. First, somehow --target= leaks into the command-line flags when CMake identifies the compiler, breaking all sorts of stuff. After fixing that to say --target=aarch64-..., a header isn't found (__config_site). This seems to be caused by an LLVM bug where it looks in the wrong locations, as noted here: https://github.com/JuliaPackaging/BinaryBuilderBase.jl/blob/ac6831078a4241d85ff891e6067a06a9e6dc1052/src/Runner.jl#L431-L442. Apparently that workaround needs to be generalized to all Clang-based platforms, which I verified works by jury-rigging the invocation to include -nostdinc++. The header location added in the linked change doesn't seem to exist on aarch64, which may be problematic later down the line

That's addressed by #10322.

@giordano
Member

Actually, libc++ may not even be needed (although we now have it anyway): the order of the headers in the search paths was messed up, see JuliaPackaging/BinaryBuilderBase.jl#405

@giordano
Member

giordano commented Jan 24, 2025

Some updates from what I understand:

  • JuliaPackaging/BinaryBuilderBase.jl#405 ("[Runner] Adjust clang flags to make it work slightly better on Linux") seems to give us a usable Clang for compiling CUDA code; I haven't had problems with that specifically since that PR.
  • ptxas/fatbinary are indeed still called, as predicted. We need to be able to use the x86_64 ones (they seem to run fine once you manage to pull them in).
  • This still needs to be understood better, but it seems we need the aarch64 header files of nvcc; otherwise, if we use the x86_64 distribution of nvcc, we also get the x86_64 headers, which conflict with the C++ standard library header files.

Edit: the error I was facing related to the last point was:

In file included from external/xla/xla/stream_executor/cuda/delay_kernel_cuda.cu.cc:18:
In file included from external/xla/xla/stream_executor/cuda/delay_kernel.h:19:
In file included from external/com_google_absl/absl/status/statusor.h:49:
In file included from external/com_google_absl/absl/status/internal/statusor_internal.h:22:
In file included from external/com_google_absl/absl/status/status.h:58:
In file included from external/com_google_absl/absl/functional/function_ref.h:54:
In file included from external/com_google_absl/absl/functional/internal/function_ref.h:23:
In file included from external/com_google_absl/absl/functional/any_invocable.h:42:
In file included from external/com_google_absl/absl/functional/internal/any_invocable.h:62:
In file included from /opt/aarch64-linux-gnu/aarch64-linux-gnu/include/c++/13.2.0/memory:80:
In file included from /opt/aarch64-linux-gnu/aarch64-linux-gnu/include/c++/13.2.0/bits/shared_ptr.h:53:
/opt/aarch64-linux-gnu/aarch64-linux-gnu/include/c++/13.2.0/bits/shared_ptr_base.h:196:22: error: expected expression
external/cuda_nvcc/include/crt/host_defines.h:91:33: note: expanded from macro '__noinline__'
   91 |         __attribute__((noinline))
      |                                 ^
9 errors generated when compiling for sm_50.

But that's not actually related to the header files for aarch64 vs x86_64 (they're identical; compare, for example, https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-12.6.77-archive.tar.xz and https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-sbsa/cuda_nvcc-linux-sbsa-12.6.77-archive.tar.xz). Instead, it's NVIDIA/thrust#1703, caused by the fact that I'm using libstdc++ 13, which conflicts with CUDA 12.6. Sounds like there's hope.
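
Putting the observations above together (the nvcc headers are the same for x86_64 and sbsa, and the x86_64 ptxas/fatbinary run fine on the host), a Clang CUDA compile for aarch64 could point --cuda-path at the x86_64 CUDA tree. The sketch below only assembles such a command line; the helper name, paths, and the sm_50 default are placeholders, and it does nothing to avoid the libstdc++ 13 __noinline__ conflict shown in the log.

```python
import shlex
from typing import List, Optional, Sequence


def clang_cuda_compile_cmd(source: str, output: str, *,
                           cuda_path: str,
                           target: str = "aarch64-linux-gnu",
                           gpu_archs: Sequence[str] = ("sm_50",),
                           sysroot: Optional[str] = None,
                           clang: str = "clang++") -> List[str]:
    """Build a clang++ argv that compiles one CUDA source for `target`."""
    cmd = [clang, "-x", "cuda", f"--target={target}"]
    if sysroot is not None:
        # Host-side (CPU) code is compiled against the target's sysroot.
        cmd.append(f"--sysroot={sysroot}")
    # Clang takes the CUDA headers from this tree and runs ptxas/fatbinary
    # from its bin/ directory, so those tools must run on the build host.
    cmd.append(f"--cuda-path={cuda_path}")
    # One device-side compilation per GPU architecture.
    cmd += [f"--cuda-gpu-arch={arch}" for arch in gpu_archs]
    cmd += ["-c", source, "-o", output]
    return cmd


if __name__ == "__main__":
    argv = clang_cuda_compile_cmd("kernel.cu", "kernel.o",
                                  cuda_path="/path/to/cuda-x86_64",
                                  sysroot="/path/to/aarch64-sysroot")
    print(shlex.join(argv))
```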

@giordano
Member

Update: using nvcc as the GPU compiler (to avoid the issue in the last point) and Clang as the CPU compiler, I can successfully compile and link a CUDA package. Next, I need to make sure it actually works 😄
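
For a CMake-based project (such as libxc from the first comment), this split roughly corresponds to keeping nvcc as CMAKE_CUDA_COMPILER while setting Clang as the C/C++ compiler and as nvcc's host compiler, instead of the -DCMAKE_CUDA_COMPILER=clang attempt. The helper below is a hypothetical sketch that only assembles the -D cache arguments; the package in this comment may use a different build system, and the paths are placeholders.

```python
from typing import Dict, List


def cmake_cuda_split_args(nvcc: str, host_cc: str, host_cxx: str,
                          cuda_archs: str = "50") -> List[str]:
    """Return -D arguments selecting nvcc for .cu files and Clang for host code."""
    cache: Dict[str, str] = {
        # CUDA sources (.cu) are compiled by nvcc...
        "CMAKE_CUDA_COMPILER": nvcc,
        # ...which hands the host part of each .cu file to this compiler.
        "CMAKE_CUDA_HOST_COMPILER": host_cxx,
        # Plain C/C++ sources go straight to Clang.
        "CMAKE_C_COMPILER": host_cc,
        "CMAKE_CXX_COMPILER": host_cxx,
        # GPU architectures to build device code for (sm_50 here).
        "CMAKE_CUDA_ARCHITECTURES": cuda_archs,
    }
    return [f"-D{key}={value}" for key, value in cache.items()]


if __name__ == "__main__":
    for arg in cmake_cuda_split_args("/path/to/cuda/bin/nvcc", "clang", "clang++"):
        print(arg)
```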
