Support CUDA 12.2 #672
The error seen here doesn't appear to be CUDA 12.2 specific. Reproduced here: #675 (comment). Discussing offline how to resolve it.
Updating branch to pull in recent CI fixes ( #680 ). Maybe that helps clear things up.
The good news is CUDA 12.2 passes! 🎉 The bad news is that the CUDA 11.8 Conda test appears to be running into a bunch of test failures. Unfortunately the job dies around 12% of the way through the test suite, so we don't learn any more about what happened. Noticing that some CUDA 12 packages are getting installed in the CUDA 11.8 build on CI. Looking at the PR, we are making some changes to the CUDA 11.8 environment. Maybe this is related? Edit: Adding snippet of CTK packages below
ci/test_python.sh
rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  "cuda-version=${RAPIDS_CUDA_VERSION%.*}" \
  "libcucim=${RAPIDS_VERSION_NUMBER}" \
  "cucim=${RAPIDS_VERSION_NUMBER}"
After discussion offline, we determined the CUDA 11.8 build was failing because the packages were being upgraded to CUDA 12.3 in this step, which was unexpected.
To try and fix this, have pinned `cuda-version` while installing `libcucim` & `cucim`. That appears to resolve the upgrade issue and allows the tests to pass.
That said, we didn't expect to need a `cuda-version` pin here. That may deserve some additional investigation on its own (with possible follow-up here and in other RAPIDS projects).
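As a minimal sketch of what the pin above expands to: `${RAPIDS_CUDA_VERSION%.*}` strips the shortest trailing `.something`, so a full version becomes a major.minor spec. The value `11.8.0` below is an assumed example, not taken from the CI logs.

```shell
#!/usr/bin/env bash
# Assumed example value; in CI this variable is expected to come from the job environment.
RAPIDS_CUDA_VERSION="11.8.0"

# "%.*" removes the shortest suffix matching ".*", i.e. the patch component.
cuda_major_minor="${RAPIDS_CUDA_VERSION%.*}"

# The resulting spec is what gets passed to the install command as the pin.
cuda_spec="cuda-version=${cuda_major_minor}"
echo "${cuda_spec}"
```

With the assumed value this prints `cuda-version=11.8`. Note that a value with only two components (for example `11.8`) would be reduced to `11`, which is a looser pin.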
Can this be the root cause of what we see here? conda-forge/cupy-feedstock#247 (comment)
With `cuda-version` added to `cupy` in PR ( conda-forge/cupy-feedstock#249 ), think we can now try dropping `cuda-version`
Suggested change:
- rapids-mamba-retry install \
-   --channel "${CPP_CHANNEL}" \
-   --channel "${PYTHON_CHANNEL}" \
-   "cuda-version=${RAPIDS_CUDA_VERSION%.*}" \
-   "libcucim=${RAPIDS_VERSION_NUMBER}" \
-   "cucim=${RAPIDS_VERSION_NUMBER}"
+ rapids-mamba-retry install \
+   --channel "${CPP_CHANNEL}" \
+   --channel "${PYTHON_CHANNEL}" \
+   "libcucim=${RAPIDS_VERSION_NUMBER}" \
+   "cucim=${RAPIDS_VERSION_NUMBER}"
I agree! Thanks @jakirkham
For posterity, would note that when we saw the issue previously (before adding the `cuda-version` workaround above), we did see `cuda-version=11.8` in the specs from the environment update on CI:
Transaction
Prefix: /opt/conda/envs/test
Updating specs:
- gputil[version='>=1.4.0']
- cuda-version=11.8
- imagecodecs[version='>=2021.6.8']
- matplotlib-base
- openslide-python[version='>=1.3.0']
- pip
- pooch[version='>=1.6.0']
- psutil[version='>=5.8.0']
- pytest-cov[version='>=2.12.1']
- pytest-lazy-fixture[version='>=0.6.3']
- pytest-xdist
- pytest[version='>=6.2.4']
- python=3.10
- tifffile[version='>=2022.7.28']
IOW the solver recognizes we've explicitly requested `cuda-version` with a specific version constraint.
Despite this, the solver ignores the constraint and updates `cuda-version` anyway later in the same CI log:
- cuda-version 11.8 h70ddcb2_2 conda-forge Cached
+ cuda-version 12.3 h32bc705_2 conda-forge 21kB
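One way to catch this kind of silent swap is to scan the solver's transaction output for a `cuda-version` removal paired with an addition at a different version. The sketch below assumes log lines shaped like the two above (`-`/`+`, package name, version, ...); `check_cuda_version_unchanged` is a hypothetical helper, not part of any RAPIDS tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a solver transaction on stdin and returns non-zero
# if cuda-version is removed at one version and added at another.
check_cuda_version_unchanged() {
  awk '
    $1 == "-" && $2 == "cuda-version" { old_v = $3 }
    $1 == "+" && $2 == "cuda-version" { new_v = $3 }
    END {
      if (old_v != "" && new_v != "" && old_v != new_v) {
        printf "cuda-version changed: %s -> %s\n", old_v, new_v
        exit 1
      }
    }
  '
}

# Usage with the two lines from the CI log above.
printf '%s\n' \
  '- cuda-version 11.8 h70ddcb2_2 conda-forge Cached' \
  '+ cuda-version 12.3 h32bc705_2 conda-forge 21kB' \
  | check_cuda_version_unchanged || echo "unexpected cuda-version change"
```

A check like this only flags the symptom; it does not explain why the solver dropped the requested constraint.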
It looks like we still have this issue. However it is now with CUDA 12.0. Here is a relevant snippet below (also when `cupy` is installed with the PR build of `cucim`) taken from CI:
- cuda-version 12.0 hffde075_2 conda-forge Cached
+ cuda-version 12.3 h32bc705_2 conda-forge 21kB
The CUDA 12 problems should be resolved by the fixes discussed here: rapidsai/build-planning#8 (comment)
Co-authored-by: Bradley Dice <[email protected]>
CI logs look fine. I will file a follow-up PR so that libcufile dependencies are included only on x86_64 (this was a pre-existing problem so I don't want to put it in-scope for this PR).
/merge
/merge
The promised follow-up PR is here: #699
Follow-up from #672. This fixes an issue where libcufile-dev could be included in aarch64 environments (this path was never called in CI so it wasn't a huge problem). I also fixed some duplication in dependencies.yaml. The CUDA compilers (for 11 and 12) are now included in the `build` dependency list, and all CUDA libraries are included in the `cuda` dependency list. As before, the CUDA version is constrained by the `cuda_version` dependency list. This is more aligned with how cudf's dependency list is structured.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Jake Awe (https://github.com/AyodeAwe)
- https://github.com/jakirkham

URL: #699
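To illustrate the intent of the libcufile fix only (the actual change lives in dependencies.yaml, whose syntax is not shown in this thread), the architecture condition can be sketched in shell. `cufile_packages_for_arch` is a hypothetical helper, and the package list is limited to the names mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the libcufile packages to request for an architecture.
# Defaults to the current machine's architecture when no argument is given.
cufile_packages_for_arch() {
  local arch="${1:-$(uname -m)}"
  case "${arch}" in
    # Per the discussion, libcufile packages are wanted on x86_64 only.
    x86_64) echo "libcufile libcufile-dev" ;;
    # Any other architecture (e.g. aarch64) gets no libcufile packages.
    *) : ;;
  esac
}

echo "x86_64:  $(cufile_packages_for_arch x86_64)"
echo "aarch64: $(cufile_packages_for_arch aarch64)"
```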
Follow-up to #672

For all GitHub Actions configs, replaces uses of the `test-cuda-12.2` branch on `shared-workflows` with `branch-24.04`, now that rapidsai/shared-workflows#166 has been merged.

### Notes for Reviewers

This is part of ongoing work to build and test packages against CUDA 12.2 across all of RAPIDS. For more details see:

* rapidsai/build-planning#7

*(created with `rapids-reviser`)*

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Ray Douglass (https://github.com/raydouglass)

URL: #702
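A rough sketch of the kind of rewrite described above, assuming workflow files reference the shared workflows as `rapidsai/shared-workflows/<path>@<branch>` under a single directory. `retarget_shared_workflows` is a hypothetical helper and is not how `rapids-reviser` is known to work.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrites the branch in shared-workflows references
# for every YAML file directly inside the given directory.
retarget_shared_workflows() {
  local dir="$1" old_ref="$2" new_ref="$3"
  # Escape dots so "12.2" is matched literally by sed.
  local old_ref_re="${old_ref//./\\.}"
  local f
  for f in "${dir}"/*.yaml "${dir}"/*.yml; do
    [ -e "${f}" ] || continue
    # Write to a temp file and move it into place (avoids non-portable "sed -i").
    sed "s|\(rapidsai/shared-workflows/[^@]*\)@${old_ref_re}|\1@${new_ref}|g" "${f}" > "${f}.tmp" \
      && mv "${f}.tmp" "${f}"
  done
}

# Example invocation (path is an assumption about the repository layout):
# retarget_shared_workflows .github/workflows test-cuda-12.2 branch-24.04
```

Only references to `rapidsai/shared-workflows` are touched, so other actions pinned to a similarly named ref are left alone.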
Description
`cuda-version={major}.{minor}` stuff in `dependencies.yaml` that was missed in refactor CUDA versions in dependencies.yaml #671

Notes for Reviewers
This is part of ongoing work to build and test packages against CUDA 12.2.2 across all of RAPIDS.
For more details see:
Planning a second round of PRs to revert these references back to a proper `branch-24.{nn}` release branch of `shared-workflows` once rapidsai/shared-workflows#166 is merged.
(created with `rapids-reviser`)