Fix issues with nvrtc compilation #3666
Disable use of builtin `__remove_reference_t` for nvrtc below 12.4. NVRTC does not support it properly, so remove it to unblock cuPy.
🟩 CI finished in 1h 35m: Pass: 100%/151 | Total: 1d 06h | Avg: 12m 15s | Max: 1h 16m | Hits: 396%/23525
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 151)
# | Runner |
---|---|
108 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
8 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
1 | linux-amd64-gpu-h100-latest-1 |
(discussing offline)
Confirmed this fixes NVRTC 12.3. Reproducers with latest CuPy main:

```python
# for __remove_cvref
import cupy as cp

code = r"""
#include <cub/thread/thread_reduce.cuh>
extern "C" __global__ void my_kernel() {
}
"""
ker = cp.RawKernel(code, "my_kernel", options=("-std=c++17",))
ker.compile()
```

```python
# for __remove_reference_t
import cupy as cp

code = r"""
#include <cuda/std/__memory/builtin_new_allocator.h>
extern "C" __global__ void my_kernel() {
cuda::std::__builtin_new_allocator();
}
"""
ker = cp.RawKernel(code, "my_kernel", options=("-std=c++17",))
ker.compile()
```

Discussed with @miscco offline: he wants to apply this patch to 12.0-12.3 instead of just 12.3, because using the built-in functions is only a micro-optimization, so giving them up costs little.

Remark: I really want to see a thorough test against NVRTC in the CCCL CI, instead of one-off, manual tests...
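To illustrate the kind of guard discussed here, the following is a minimal sketch of falling back to a plain template implementation when compiling with NVRTC below 12.4. The `SKETCH_*` macros and the `sketch` namespace are hypothetical names chosen for this example and are not the ones used in libcu++; only `__CUDACC_RTC__`, `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` are real compiler-provided macros.

```cpp
#include <type_traits>

// Hypothetical macro: opt out of the builtin for NVRTC older than 12.4.
#if defined(__CUDACC_RTC__) \
  && (__CUDACC_VER_MAJOR__ < 12 || (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ < 4))
#  define SKETCH_NO_BUILTIN_REMOVE_REFERENCE 1
#endif

// Use the builtin only if it was not opted out and the compiler reports it.
#if !defined(SKETCH_NO_BUILTIN_REMOVE_REFERENCE) && defined(__has_builtin)
#  if __has_builtin(__remove_reference_t)
#    define SKETCH_USE_BUILTIN_REMOVE_REFERENCE 1
#  endif
#endif

namespace sketch {
#if defined(SKETCH_USE_BUILTIN_REMOVE_REFERENCE)
// Fast path: let the compiler strip the reference directly.
template <class T>
using remove_reference_t = __remove_reference_t(T);
#else
// Portable fallback: classic partial specializations.
template <class T> struct remove_reference      { using type = T; };
template <class T> struct remove_reference<T&>  { using type = T; };
template <class T> struct remove_reference<T&&> { using type = T; };

template <class T>
using remove_reference_t = typename remove_reference<T>::type;
#endif
} // namespace sketch

// Both paths must agree on the result.
static_assert(std::is_same<sketch::remove_reference_t<int&>, int>::value, "");
static_assert(std::is_same<sketch::remove_reference_t<const int&&>, const int>::value, "");
```

Both branches produce the same type, which is why widening the guard from 12.3 to all of 12.0-12.3 only gives up a small compile-time optimization rather than changing behavior.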
Regarding the warning pragma:

```python
import cupy as cp
import sys

code = r"""
extern "C" __global__ void my_kernel() {
#pragma message("abcde")
#pragma GCC warning "okok"
}
"""
ker = cp.RawKernel(code, "my_kernel", options=("-std=c++17",))
ker.compile(sys.stdout)
```

Output:
🟩 CI finished in 1h 48m: Pass: 100%/151 | Total: 1d 10h | Avg: 13m 33s | Max: 1h 19m | Hits: 239%/23525
* Disable use of builtin `__remove_reference_t` for nvrtc below 12.4

  NVRTC does not support it properly, so remove it to unblock cuPy
* Use correct warning pragma for nvrtc
* Also suppress `__remove_cvref`

(cherry picked from commit 7229e0b)
Successfully created backport PR for
* Disable use of builtin `__remove_reference_t` for nvrtc below 12.4

  NVRTC does not support it properly, so remove it to unblock cuPy
* Use correct warning pragma for nvrtc
* Also suppress `__remove_cvref`

(cherry picked from commit 7229e0b)

Co-authored-by: Michael Schellenberger Costa
Disable use of builtin `__remove_reference_t` for nvrtc below 12.4. NVRTC does not support it properly, so remove it to unblock cuPy.

Also, nvrtc uses the MSVC-like `#pragma message` rather than `#pragma GCC warning`.
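The pragma selection described above could be sketched roughly as follows. The `SKETCH_USES_PRAGMA_MESSAGE` and `SKETCH_EMIT_NOTICE` macros are hypothetical names for this example, not CCCL's; the notice is gated behind an opt-in so the snippet stays silent by default.

```cpp
// Pick the diagnostic style: NVRTC and MSVC take `#pragma message`,
// GCC-compatible compilers take `#pragma GCC warning`.
#if defined(__CUDACC_RTC__) || defined(_MSC_VER)
#  define SKETCH_USES_PRAGMA_MESSAGE 1
#else
#  define SKETCH_USES_PRAGMA_MESSAGE 0
#endif

// Hypothetical opt-in: define SKETCH_EMIT_NOTICE to actually print the notice.
#if defined(SKETCH_EMIT_NOTICE)
#  if SKETCH_USES_PRAGMA_MESSAGE
#    pragma message("sketch: this configuration is deprecated")
#  else
#    pragma GCC warning "sketch: this configuration is deprecated"
#  endif
#endif
```

Keeping the compiler check in one macro means every later diagnostic can reuse the same decision instead of repeating the `__CUDACC_RTC__` test.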