Add support for CUDA 12.2 conda packages #8

Closed
2 tasks done
Tracked by #6
vyasr opened this issue Dec 15, 2023 · 8 comments
@vyasr
Contributor

vyasr commented Dec 15, 2023

We would like to start publishing conda packages that support CUDA versions newer than 12.0. At the moment, this is blocked on getting the CUDA Toolkit (CTK) on conda-forge updated to a sufficiently new version. As of this writing, the conda-forge CTK is being updated to 12.1.1. Our plan is to continue the conda-forge update process and to build RAPIDS 24.02 packages with whatever CTK version is the latest available on conda-forge on Jan 8, 2024.

Assuming that #7 is completed before this, the main tasks will be to:

  • Modify the conda shared-workflows to use the new images when building conda packages. These jobs should be set up to continue on error.
  • Follow up with the different RAPIDS developer teams to address any issues that arise in CUDA 12.x builds. This job will mostly involve coordinating a response from the teams; the assignee of this issue is not responsible for actually fixing said builds.

Step 2 above will likely involve making updates to dependency files in various RAPIDS repos.

This issue will be filled out more and updated once the conda-forge updates are completed and the version finalized.

@jameslamb
Member

conda-forge/cuda-feedstock#13 is tracking the rollout of CUDA 12.2 to conda-forge.

@bdice
Contributor

bdice commented Jan 2, 2024

We will need to resolve a blocking issue, since RAPIDS requires gcc 11 and that conflicts with cuda-nvcc which requires gcc 12. I have filed a PR with a fix.

@jakirkham
Member

Both of those issues are resolved

@jakirkham
Member

I should add that the gcc constraint only affected cuda-nvcc (usually used in dev environments).

cuda-nvcc_{{ target_platform }} is what {{ compiler("cuda") }} uses, and it did not have this issue.

@jameslamb jameslamb self-assigned this Jan 11, 2024
@jakirkham
Member

We merged pynvjitlink's conda package builds yesterday (rapidsai/pynvjitlink#33).

I have started a PR to release pynvjitlink 0.1.7 (rapidsai/pynvjitlink#42); that release is needed to build and upload the conda packages.

@jakirkham
Member

Also, James has submitted PRs adding conda and wheel builds to RAPIDS projects.

A full listing of the PRs with their current status is in this comment ( #7 (comment) ).

@bdice
Contributor

bdice commented Jan 30, 2024

We will need to modify conda recipes to enforce CUDA Minor Version Compatibility (MVC). This stems from a discussion I started here: rapidsai/raft#2092 (comment)

There are three basic issues, though the third may not require action.

1. Ignore run-exports from compiler('cuda')

rmm example commit: rapidsai/rmm@ff8ea2d

The compiler('cuda') package has a strong run-export on cuda-version. We discussed this and decided that it is good, intentional behavior for the cuda-nvcc compiler package, because not all CUDA software obeys the rules for Minor Version Compatibility. RAPIDS does, however. We need to ignore this run-export, because it would otherwise prevent packages built with CUDA 12.2 from being installed with cuda-version=12.0 or similar.

Concretely, this means updating the existing sections that are ignoring run exports from the CUDA 11 compiler package, and adding the CUDA 12 compiler package as shown in the rmm example.

2. Fix host/run dependencies so we do not inherit incorrect run-exports

rmm example commit: rapidsai/rmm@135c259

Initially, when we created CUDA 12 conda packages for RAPIDS, we relied on putting -dev packages in the host environment so that they would create run_exports and add the non-dev package to the run dependencies. For example, we added cuda-cudart-dev to rmm's host section, and it created a run dependency on cuda-cudart. This was fine for CUDA 12.0 but this strategy is not compatible with CUDA Minor Version Compatibility. If we build with CUDA 12.2, the cuda-cudart-dev package will export the subpackage cuda-cudart from the same recipe with a max pin. This means that recipes built with CUDA 12.2's cuda-cudart-dev cannot be installed with cuda-version=12.0.

This rule applies to all -dev libraries, including cuda-cudart-dev and the math libraries. Some of these libraries are not needed in host to build the RAPIDS package. That is true for rmm's usage of cuda-cudart-dev in host (the compiler itself also adds cuda-cudart-dev to the build dependencies, but the runtime library is not a strong run-export, so it doesn't cross from build to run). In other cases, like libcuml's usage of the math libraries, we will need to add ignore_run_exports_from and list those dev libraries in libcuml's recipe to ensure MVC works as intended.
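To illustrate why an inherited pin blocks the install, here is a rough shell sketch. It assumes the run-export from a CUDA 12.2 -dev package yields a lower bound of 12.2 on the runtime library (the exact pin string is not shown in this thread, so the bound is illustrative), and it uses sort -V from GNU coreutils as a stand-in for the solver's version comparison. version_ge is a hypothetical helper, not part of conda.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed lower bound inherited from a -dev package built with CUDA 12.2.
exported_lower_bound="12.2"

# Candidate runtime library versions a user environment might request.
for runtime in 12.0 12.2 12.3; do
    if version_ge "$runtime" "$exported_lower_bound"; then
        echo "runtime $runtime: satisfies >=$exported_lower_bound"
    else
        echo "runtime $runtime: rejected by >=$exported_lower_bound"
    fi
done
```

Under these assumptions only the 12.0 runtime is rejected, which mirrors the cuda-version=12.0 install failure described above. Dropping the inherited bound, either by removing the -dev package from host or via ignore_run_exports_from, removes that rejection.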

3. Force cuda-version==${RAPIDS_CUDA_VERSION%.*} in all conda commands?

This one is more questionable, and may require no action. We saw a problem in cucim's CI where cupy 13 caused cucim to need an explicit cuda-version specification while installing cucim/libcucim. My hope was that adding cuda-version==${RAPIDS_CUDA_VERSION%.*} to the installation commands in CI would cause an error during the solve if either of the problems described in (1) and (2) were encountered. I tried this while playing with rmm and found that pinning cuda-version didn't actually cause a solve error; it just forced a fallback to the latest CUDA packages from the nvidia channel (CI logs). I attempted this before applying the changes for (1) and (2), so I wanted it to fail in CI. Instead it succeeded, but with the wrong channels and packages:

  Package             Version  Build                       Channel               Size
───────────────────────────────────────────────────────────────────────────────────────
  Install:
───────────────────────────────────────────────────────────────────────────────────────

  + cuda-cudart      12.3.101  0                           nvidia               214kB
  + gtest              1.14.0  h2a328a1_1                  conda-forge          394kB
  + fmt                10.2.1  h2a328a1_0                  conda-forge          190kB
  + gmock              1.14.0  h8af1aa0_1                  conda-forge            7kB
  + spdlog             1.12.0  h6b8df57_2                  conda-forge          183kB
  + librmm        24.04.00a20  cuda12_240130_gd4f8aa23_20  /tmp/cpp_channel       2MB
  + librmm-tests  24.04.00a20  cuda12_240130_gd4f8aa23_20  /tmp/cpp_channel       4MB

This isn't desirable, so I don't think pinning cuda-version will do anything to help us enforce that CI test jobs are using the versions we intend. Therefore, I propose that we act on items (1) and (2), and skip (3) for now.
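As a side note on item (3), the ${RAPIDS_CUDA_VERSION%.*} expression is ordinary POSIX shell parameter expansion that strips the last dot-separated component. A minimal sketch, assuming RAPIDS_CUDA_VERSION holds a three-part version (the value 12.2.2 below is an assumed example, not taken from the CI logs):

```shell
#!/bin/sh
# Assumed example value; in CI this variable would come from the job environment.
RAPIDS_CUDA_VERSION="12.2.2"

# "%.*" removes the shortest suffix matching ".*", i.e. the patch component.
cuda_major_minor="${RAPIDS_CUDA_VERSION%.*}"

# Build the package spec that item (3) proposed adding to install commands.
cuda_version_spec="cuda-version==${cuda_major_minor}"
echo "$cuda_version_spec"   # prints: cuda-version==12.2
```

Note that a two-part value such as 12.2 would be reduced to 12, so the expansion only gives a major.minor pin when the input has three components.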

@bdice
Contributor

bdice commented Feb 21, 2024

I think this can be closed, since conda support is complete. Final tasks are being tracked here: #7 (comment)

@bdice bdice closed this as completed Feb 21, 2024
rapids-bot bot pushed a commit to rapidsai/docker that referenced this issue Feb 26, 2024