Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CUDA 12.2 #672

Merged
merged 23 commits into from
Feb 9, 2024
Merged

Conversation

jameslamb
Copy link
Member

@jameslamb jameslamb commented Jan 11, 2024

Description

  • switches to CUDA 12.2.2 for building conda packages and wheels
  • adds new tests running against CUDA 12.2.2
  • removes some unnecessary cuda-version={major}.{minor} stuff in dependencies.yaml that was missed in refactor CUDA versions in dependencies.yaml #671

Notes for Reviewers

This is part of ongoing work to build and test packages against CUDA 12.2.2 across all of RAPIDS.

For more details see:

Planning a second round of PRs to revert these references back to a proper branch-24.{nn} release branch of shared-workflows once rapidsai/shared-workflows#166 is merged.

(created with rapids-reviser)

@jameslamb jameslamb marked this pull request as ready for review January 11, 2024 17:45
@jameslamb jameslamb requested a review from a team as a code owner January 11, 2024 17:45
@jameslamb jameslamb changed the title add CUDA 12.2 support for conda packages and wheels WIP: add CUDA 12.2 support for conda packages and wheels Jan 11, 2024
@jameslamb jameslamb marked this pull request as draft January 11, 2024 19:21
@jameslamb jameslamb changed the title WIP: add CUDA 12.2 support for conda packages and wheels WIP: use CUDA 12.2 for building and testing wheels Jan 11, 2024
@jameslamb jameslamb changed the title WIP: use CUDA 12.2 for building and testing wheels WIP: add CUDA 12.2 support for conda packages and wheels Jan 11, 2024
@jameslamb jameslamb changed the title WIP: add CUDA 12.2 support for conda packages and wheels WIP: (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels Jan 11, 2024
@jameslamb jameslamb changed the title WIP: (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels Jan 11, 2024
@jameslamb jameslamb marked this pull request as ready for review January 11, 2024 22:51
@jakirkham jakirkham added feature request New feature or request non-breaking Introduces a non-breaking change do not merge labels Jan 13, 2024
@jakirkham jakirkham mentioned this pull request Jan 18, 2024
@jameslamb jameslamb changed the base branch from branch-24.02 to branch-24.04 January 22, 2024 15:38
@jakirkham
Copy link
Member

The error seen here doesn't appear to be CUDA 12.2 specific

Reproduced here: #675 (comment)

Discussing offline on how to resolve

@jakirkham
Copy link
Member

Updating branch to pull in recent CI fixes ( #680 )

Maybe that helps clear things up

@jakirkham
Copy link
Member

jakirkham commented Jan 24, 2024

The good news is CUDA 12.2 passes! 🎉

The bad news is it looks like the CUDA 11.8 Conda test is running into a bunch of test failures. Unfortunately the job dies around 12% of the way through the test suite. So we don't learn any more about what happened

Noticing that there are some CUDA 12 packages getting installed in the CUDA 11.8 build on CI. Looking at the PR, notice we are making some changes to the CUDA 11.8 environment. Maybe this is related?

Edit: Adding snippet of CTK packages below

  + libcufile         1.8.1.2  0                                nvidia                  1MB
  + libnvjpeg       12.3.0.81  0                                nvidia                  3MB
  + libcublas        12.3.4.1  0                                nvidia                368MB
  + libcufft        11.0.12.1  0                                nvidia                 87MB
  + libcurand      10.3.4.107  0                                nvidia                 54MB
  + libcusolver    11.5.4.101  0                                nvidia                117MB
  + libcusparse    12.2.0.103  0                                nvidia                179MB

Comment on lines 35 to 40
rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
--channel "${PYTHON_CHANNEL}" \
"cuda-version=${RAPIDS_CUDA_VERSION%.*}" \
"libcucim=${RAPIDS_VERSION_NUMBER}" \
"cucim=${RAPIDS_VERSION_NUMBER}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion offline, we determine the CUDA 11.8 build was failing as the packages were being upgraded in this step to CUDA 12.3, which was unexpected

To try and fix this, have pinned cuda-version while installing libcucim & cucim. It appears that resolves the upgrade issue and allows the tests to pass

That said, we didn't expect to need a cuda-version pinning here. That may deserve some additional investigation on its own (with possible follow up here and in other RAPIDS projects)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be the root cause of what we see here? conda-forge/cupy-feedstock#247 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With cuda-version added to cupy in PR ( conda-forge/cupy-feedstock#249 ), think we can now try dropping cuda-version

Suggested change
rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
--channel "${PYTHON_CHANNEL}" \
"cuda-version=${RAPIDS_CUDA_VERSION%.*}" \
"libcucim=${RAPIDS_VERSION_NUMBER}" \
"cucim=${RAPIDS_VERSION_NUMBER}"
rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
--channel "${PYTHON_CHANNEL}" \
"libcucim=${RAPIDS_VERSION_NUMBER}" \
"cucim=${RAPIDS_VERSION_NUMBER}"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! Thanks @jakirkham

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity, would note that when we saw the issue previously (before adding the cuda-version workaround above), we do see cuda-version=11.8 in the specs from the environment update on CI

Transaction

  Prefix: /opt/conda/envs/test

  Updating specs:

   - gputil[version='>=1.4.0']
   - cuda-version=11.8
   - imagecodecs[version='>=2021.6.8']
   - matplotlib-base
   - openslide-python[version='>=1.3.0']
   - pip
   - pooch[version='>=1.6.0']
   - psutil[version='>=5.8.0']
   - pytest-cov[version='>=2.12.1']
   - pytest-lazy-fixture[version='>=0.6.3']
   - pytest-xdist
   - pytest[version='>=6.2.4']
   - python=3.10
   - tifffile[version='>=2022.7.28']

IOW the solver recognizes we've explicitly requested cuda-version with a specific version constraint

Despite this the solver later ignores this constraint and updates cuda-version anyways later in the same CI log:

  - cuda-version         11.8  h70ddcb2_2                       conda-forge          Cached
  + cuda-version         12.3  h32bc705_2                       conda-forge            21kB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we still have this issue. However it is now with CUDA 12.0. Here is a relevant snippet below (also when cupy is installed with the PR build of cucim) taken from CI:

  - cuda-version         12.0  hffde075_2                       conda-forge             Cached
  + cuda-version         12.3  h32bc705_2                       conda-forge               21kB

Copy link
Contributor

@bdice bdice Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA 12 problems should be resolved by the fixes discussed here: rapidsai/build-planning#8 (comment)

@jameslamb jameslamb changed the title (DO NOT MERGE) add CUDA 12.2 support for conda packages and wheels Support CUDA 12.2 Jan 25, 2024
dependencies.yaml Outdated Show resolved Hide resolved
ci/test_python.sh Outdated Show resolved Hide resolved
@bdice bdice removed the do not merge label Feb 9, 2024
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI logs look fine. I will file a follow-up PR to make libcufile dependencies included on only x86_64 (this was a pre-existing problem so I don't want to put it in-scope for this PR).

@bdice
Copy link
Contributor

bdice commented Feb 9, 2024

/merge

@jakirkham
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit b0cdce1 into rapidsai:branch-24.04 Feb 9, 2024
39 checks passed
@bdice
Copy link
Contributor

bdice commented Feb 9, 2024

The promised follow-up PR is here: #699

rapids-bot bot pushed a commit that referenced this pull request Feb 10, 2024
Follow-up from #672. This fixes an issue where libcufile-dev could be included in aarch64 environments (this path was never called in CI so it wasn't a huge problem).

I also fixed some duplication in dependencies.yaml. The CUDA compilers (for 11 and 12) are now included in the `build` dependency list, and all CUDA libraries are included in the `cuda` dependency list. As before, the CUDA version is constrained by the `cuda_version` dependency list. This is more aligned with how cudf's dependency list is structured.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)
  - https://github.com/jakirkham

URL: #699
rapids-bot bot pushed a commit that referenced this pull request Feb 20, 2024
Follow-up to #672

For all GitHub Actions configs, replaces uses of the `test-cuda-12.2` branch on `shared-workflows`
with `branch-24.04`, now that rapidsai/shared-workflows#166 has been merged.

### Notes for Reviewers

This is part of ongoing work to build and test packages against CUDA 12.2 across all of RAPIDS.

For more details see:

* rapidsai/build-planning#7

*(created with `rapids-reviser`)*

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #702
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants