Fix issues with nvrtc compilation #3666

Merged (3 commits), Feb 4, 2025

miscco
Collaborator

@miscco miscco commented Feb 4, 2025

Disable use of the builtin __remove_reference_t for NVRTC below 12.4.
NVRTC does not support it properly, so disable it to unblock CuPy.

Also, NVRTC uses the MSVC-like #pragma message rather than #pragma GCC warning.
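
The version gate described here can be sketched in Python. This only illustrates the decision, assuming the NVRTC version is available as a (major, minor) tuple. The name `nvrtc_may_use_builtin_remove_reference` is hypothetical; the actual fix lives in libcu++'s preprocessor configuration, not in Python.

```python
from typing import Tuple

# Threshold taken from the PR description: the builtin is disabled for NVRTC below 12.4.
FIRST_NVRTC_WITH_USABLE_BUILTIN = (12, 4)


def nvrtc_may_use_builtin_remove_reference(version: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if NVRTC `version` is 12.4 or newer."""
    return tuple(version[:2]) >= FIRST_NVRTC_WITH_USABLE_BUILTIN


if __name__ == "__main__":
    for v in [(12, 0), (12, 3), (12, 4), (12, 8)]:
        print(v, nvrtc_may_use_builtin_remove_reference(v))
```

When the helper returns False, the library would fall back to an ordinary template implementation of the trait instead of the compiler builtin.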

@miscco miscco requested a review from a team as a code owner February 4, 2025 08:24
@miscco miscco requested a review from wmaxey February 4, 2025 08:24
@miscco miscco requested review from a team as code owners February 4, 2025 08:45
@miscco miscco requested a review from elstehle February 4, 2025 08:46
@miscco miscco changed the title from "Disable use of builtin __remove_reference_t for nvrtc below 12.4" to "Fix issues with nvrtc compilation" Feb 4, 2025
Contributor

github-actions bot commented Feb 4, 2025

🟩 CI finished in 1h 35m: Pass: 100%/151 | Total: 1d 06h | Avg: 12m 15s | Max: 1h 16m | Hits: 396%/23525
  • 🟩 cub: Pass: 100%/44 | Total: 11h 46m | Avg: 16m 02s | Max: 1h 16m | Hits: 386%/3500

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total: 11h 36m | Avg: 16m 34s | Max:  1h 16m | Hits: 386%/3500  
      🟩 arm64              Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 08s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 22m | Avg: 16m 33s | Max:  1h 01m | Hits: 399%/875   
      🟩 12.5               Pass: 100%/2   | Total:  1h 55m | Avg: 57m 59s | Max: 58m 31s
      🟩 12.8               Pass: 100%/37  | Total:  8h 27m | Avg: 13m 42s | Max:  1h 16m | Hits: 382%/2625  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 04s | Avg:  4m 32s | Max:  4m 41s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 22m | Avg: 16m 33s | Max:  1h 01m | Hits: 399%/875   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 55m | Avg: 57m 59s | Max: 58m 31s
      🟩 nvcc12.8           Pass: 100%/35  | Total:  8h 18m | Avg: 14m 14s | Max:  1h 16m | Hits: 382%/2625  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 04s | Avg:  4m 32s | Max:  4m 41s
      🟩 nvcc               Pass: 100%/42  | Total: 11h 37m | Avg: 16m 35s | Max:  1h 16m | Hits: 386%/3500  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 39s | Avg:  5m 24s | Max:  5m 44s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 15s | Avg:  5m 37s | Max:  5m 39s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 26s | Avg:  5m 43s | Max:  5m 56s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 10s | Avg:  5m 35s | Max:  5m 44s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 09m | Avg:  9m 56s | Max: 24m 22s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 51s | Avg:  5m 25s | Max:  5m 26s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 58s | Avg:  5m 58s | Max:  5m 58s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 30s | Avg:  5m 45s | Max:  6m 12s
      🟩 GCC10              Pass: 100%/2   | Total: 12m 11s | Avg:  6m 05s | Max:  6m 06s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 50s | Avg:  5m 55s | Max:  6m 00s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 12s | Avg:  6m 06s | Max:  6m 20s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 13m | Avg: 13m 18s | Max: 24m 24s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 02m | Hits: 391%/1750  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 16m | Hits: 382%/1750  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 55m | Avg: 57m 59s | Max: 58m 31s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 05m | Avg:  7m 21s | Max: 24m 22s
      🟩 GCC                Pass: 100%/21  | Total:  3h 17m | Avg:  9m 24s | Max: 24m 24s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 27m | Avg:  1h 06m | Max:  1h 16m | Hits: 386%/3500  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 55m | Avg: 57m 59s | Max: 58m 31s
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 53s | Avg: 14m 26s | Max: 24m 24s
      🟩 rtx2080            Pass: 100%/34  | Total:  9h 00m | Avg: 15m 54s | Max:  1h 16m | Hits: 386%/3500  
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 16m | Avg: 17m 04s | Max: 24m 22s
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  9h 17m | Avg: 15m 03s | Max:  1h 16m | Hits: 386%/3500  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 21m 45s | Avg: 21m 45s | Max: 21m 45s
      🟩 GraphCapture       Pass: 100%/1   | Total: 16m 48s | Avg: 16m 48s | Max: 16m 48s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 10m | Avg: 23m 37s | Max: 24m 24s
      🟩 TestGPU            Pass: 100%/2   | Total: 39m 26s | Avg: 19m 43s | Max: 20m 03s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 53s | Avg: 14m 26s | Max: 24m 24s
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 16s | Avg:  6m 16s | Max:  6m 16s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 40m | Avg: 17m 00s | Max:  1h 06m | Hits: 388%/2625  
      🟩 20                 Pass: 100%/24  | Total:  6h 05m | Avg: 15m 14s | Max:  1h 16m | Hits: 382%/875   
    
  • 🟩 thrust: Pass: 100%/43 | Total: 10h 06m | Avg: 14m 05s | Max: 1h 10m | Hits: 245%/9230

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 09s | Avg:  8m 34s | Max: 11m 03s
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total:  9h 56m | Avg: 14m 32s | Max:  1h 10m | Hits: 245%/9230  
      🟩 arm64              Pass: 100%/2   | Total: 10m 05s | Avg:  5m 02s | Max:  5m 15s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 10m | Avg: 14m 04s | Max: 49m 23s | Hits: 262%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
      🟩 12.8               Pass: 100%/36  | Total:  6h 39m | Avg: 11m 05s | Max: 55m 28s | Hits: 241%/7384  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 39s | Avg:  5m 19s | Max:  5m 32s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 10m | Avg: 14m 04s | Max: 49m 23s | Hits: 262%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
      🟩 nvcc12.8           Pass: 100%/34  | Total:  6h 28m | Avg: 11m 25s | Max: 55m 28s | Hits: 241%/7384  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 39s | Avg:  5m 19s | Max:  5m 32s
      🟩 nvcc               Pass: 100%/41  | Total:  9h 55m | Avg: 14m 31s | Max:  1h 10m | Hits: 245%/9230  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 35s | Avg:  5m 23s | Max:  5m 57s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 41s | Avg:  5m 50s | Max:  6m 09s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 46s | Avg:  5m 53s | Max:  5m 57s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 29s | Avg:  5m 44s | Max:  5m 56s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 52s | Avg:  6m 24s | Max: 10m 46s
      🟩 GCC7               Pass: 100%/2   | Total: 11m 12s | Avg:  5m 36s | Max:  5m 48s
      🟩 GCC8               Pass: 100%/1   | Total:  6m 01s | Avg:  6m 01s | Max:  6m 01s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 34s | Avg:  5m 47s | Max:  5m 59s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 52s | Avg:  5m 56s | Max:  6m 09s
      🟩 GCC11              Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 20s
      🟩 GCC12              Pass: 100%/2   | Total: 12m 06s | Avg:  6m 03s | Max:  6m 13s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 37s | Max: 11m 20s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 38m | Avg: 49m 12s | Max: 49m 23s | Hits: 252%/3692  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 23m | Avg: 47m 58s | Max: 55m 28s | Hits: 240%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 41m | Avg:  5m 57s | Max: 10m 46s
      🟩 GCC                Pass: 100%/19  | Total:  2h 05m | Avg:  6m 37s | Max: 11m 20s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 02m | Avg: 48m 28s | Max: 55m 28s | Hits: 245%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 10m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total:  7h 30m | Avg: 13m 39s | Max:  1h 10m | Hits: 233%/5538  
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 35m | Avg: 15m 33s | Max: 54m 28s | Hits: 263%/3692  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  8h 43m | Avg: 14m 09s | Max:  1h 10m | Hits: 215%/7384  
      🟩 TestCPU            Pass: 100%/3   | Total: 49m 26s | Avg: 16m 28s | Max: 34m 00s | Hits: 365%/1846  
      🟩 TestGPU            Pass: 100%/3   | Total: 33m 09s | Avg: 11m 03s | Max: 11m 20s
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 08s | Avg:  6m 08s | Max:  6m 08s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 12m | Avg: 15m 38s | Max:  1h 06m | Hits: 233%/5538  
      🟩 20                 Pass: 100%/21  | Total:  4h 36m | Avg: 13m 09s | Max:  1h 10m | Hits: 263%/3692  
    
  • 🟩 libcudacxx: Pass: 100%/41 | Total: 6h 49m | Avg: 9m 59s | Max: 35m 36s | Hits: 535%/10273

    🟩 cpu
      🟩 amd64              Pass: 100%/39  | Total:  6h 42m | Avg: 10m 18s | Max: 35m 36s | Hits: 535%/10273 
      🟩 arm64              Pass: 100%/2   | Total:  7m 15s | Avg:  3m 37s | Max:  3m 52s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 39m 40s | Avg:  7m 56s | Max: 25m 23s | Hits: 555%/2523  
      🟩 12.5               Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 42s
      🟩 12.8               Pass: 100%/34  | Total:  5h 05m | Avg:  8m 59s | Max: 35m 36s | Hits: 529%/7750  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 41m 42s | Avg: 20m 51s | Max: 22m 51s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 39m 40s | Avg:  7m 56s | Max: 25m 23s | Hits: 555%/2523  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 42s
      🟩 nvcc12.8           Pass: 100%/32  | Total:  4h 23m | Avg:  8m 14s | Max: 35m 36s | Hits: 529%/7750  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 41m 42s | Avg: 20m 51s | Max: 22m 51s
      🟩 nvcc               Pass: 100%/39  | Total:  6h 07m | Avg:  9m 25s | Max: 35m 36s | Hits: 535%/10273 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 16m 05s | Avg:  4m 01s | Max:  4m 24s
      🟩 Clang15            Pass: 100%/2   | Total:  8m 41s | Avg:  4m 20s | Max:  4m 27s
      🟩 Clang16            Pass: 100%/2   | Total:  9m 06s | Avg:  4m 33s | Max:  4m 47s
      🟩 Clang17            Pass: 100%/2   | Total:  8m 10s | Avg:  4m 05s | Max:  4m 11s
      🟩 Clang18            Pass: 100%/6   | Total:  1h 03m | Avg: 10m 33s | Max: 22m 51s
      🟩 GCC7               Pass: 100%/2   | Total:  7m 02s | Avg:  3m 31s | Max:  3m 48s
      🟩 GCC8               Pass: 100%/1   | Total: 20m 54s | Avg: 20m 54s | Max: 20m 54s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 14s | Avg:  3m 37s | Max:  3m 48s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  4m 04s
      🟩 GCC11              Pass: 100%/2   | Total:  8m 11s | Avg:  4m 05s | Max:  4m 08s
      🟩 GCC12              Pass: 100%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  4m 10s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 42s | Max: 18m 58s
      🟩 MSVC14.29          Pass: 100%/2   | Total: 53m 37s | Avg: 26m 48s | Max: 28m 14s | Hits: 550%/5056  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 24s | Max: 35m 36s | Hits: 521%/5217  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 42s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  1h 45m | Avg:  6m 35s | Max: 22m 51s
      🟩 GCC                Pass: 100%/19  | Total:  2h 01m | Avg:  6m 23s | Max: 20m 54s
      🟩 MSVC               Pass: 100%/4   | Total:  1h 58m | Avg: 29m 36s | Max: 35m 36s | Hits: 535%/10273 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 04m | Avg: 32m 04s | Max: 32m 42s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/41  | Total:  6h 49m | Avg:  9m 59s | Max: 35m 36s | Hits: 535%/10273 
    🟩 jobs
      🟩 Build              Pass: 100%/36  | Total:  5h 54m | Avg:  9m 50s | Max: 35m 36s | Hits: 535%/10273 
      🟩 NVRTC              Pass: 100%/2   | Total: 34m 52s | Avg: 17m 26s | Max: 18m 58s
      🟩 Test               Pass: 100%/2   | Total: 17m 45s | Avg:  8m 52s | Max:  8m 58s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 34m 52s | Avg: 17m 26s | Max: 18m 58s
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 25s | Avg:  4m 25s | Max:  4m 25s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 49m | Avg: 10m 55s | Max: 32m 42s | Hits: 541%/7589  
      🟩 20                 Pass: 100%/19  | Total:  2h 57m | Avg:  9m 21s | Max: 35m 36s | Hits: 521%/2684  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 37m | Avg: 4m 51s | Max: 12m 55s | Hits: 388%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 26m | Avg:  5m 23s | Max: 12m 55s | Hits: 388%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 45s | Avg:  2m 41s | Max:  2m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total:  8m 35s | Avg:  8m 35s | Max:  8m 35s | Hits: 388%/261   
      🟩 12.5               Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 37s
      🟩 12.8               Pass: 100%/17  | Total:  1h 17m | Avg:  4m 33s | Max: 12m 55s | Hits: 388%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total:  8m 35s | Avg:  8m 35s | Max:  8m 35s | Hits: 388%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 37s
      🟩 nvcc12.8           Pass: 100%/17  | Total:  1h 17m | Avg:  4m 33s | Max: 12m 55s | Hits: 388%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 37m | Avg:  4m 51s | Max: 12m 55s | Hits: 388%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 31s | Avg:  3m 31s | Max:  3m 31s
      🟩 Clang18            Pass: 100%/4   | Total: 21m 26s | Avg:  5m 21s | Max: 12m 38s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
      🟩 GCC11              Pass: 100%/1   | Total:  2m 59s | Avg:  2m 59s | Max:  2m 59s
      🟩 GCC12              Pass: 100%/2   | Total: 16m 28s | Avg:  8m 14s | Max: 12m 55s
      🟩 GCC13              Pass: 100%/4   | Total: 11m 07s | Avg:  2m 46s | Max:  2m 59s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 35s | Avg:  8m 35s | Max:  8m 35s | Hits: 388%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 03s | Avg:  9m 03s | Max:  9m 03s | Hits: 388%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 37s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 34m 43s | Avg:  4m 20s | Max: 12m 38s
      🟩 GCC                Pass: 100%/8   | Total: 33m 39s | Avg:  4m 12s | Max: 12m 55s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 38s | Avg:  8m 49s | Max:  9m 03s | Hits: 388%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 37s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 37m | Avg:  4m 51s | Max: 12m 55s | Hits: 388%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 11m | Avg:  3m 58s | Max:  9m 03s | Hits: 388%/522   
      🟩 Test               Pass: 100%/2   | Total: 25m 33s | Avg: 12m 46s | Max: 12m 55s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 59s | Avg:  2m 59s | Max:  2m 59s
      🟩 90a                Pass: 100%/1   | Total:  2m 47s | Avg:  2m 47s | Max:  2m 47s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 13m 50s | Avg:  3m 27s | Max:  5m 25s
      🟩 20                 Pass: 100%/16  | Total:  1h 23m | Avg:  5m 12s | Max: 12m 55s | Hits: 388%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 7m 43s | Avg: 3m 51s | Max: 5m 21s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  5m 21s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 22s | Avg:  2m 22s | Max:  2m 22s
      🟩 Test               Pass: 100%/1   | Total:  5m 21s | Avg:  5m 21s | Max:  5m 21s
    
  • 🟩 python: Pass: 100%/1 | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 24m 58s | Avg: 24m 58s | Max: 24m 58s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 151)

# Runner
108 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
1 linux-amd64-gpu-h100-latest-1

Member

@leofang leofang left a comment

(discussing offline)

@leofang
Member

leofang commented Feb 4, 2025

Confirmed this fixes NVRTC 12.3. Reproducers with latest CuPy main:

# for __remove_cvref
import cupy as cp

code = r"""
#include <cub/thread/thread_reduce.cuh>

extern "C" __global__ void my_kernel() {

}
"""

ker = cp.RawKernel(code, "my_kernel", options=("-std=c++17",))
ker.compile()

# for __remove_reference_t
import cupy as cp

code = r"""
#include <cuda/std/__memory/builtin_new_allocator.h>

extern "C" __global__ void my_kernel() {
  cuda::std::__builtin_new_allocator();
}
"""

ker = cp.RawKernel(code, "my_kernel", options=("-std=c++17",))
ker.compile()

Discussed with @miscco offline: he wants to apply this patch to 12.0-12.3 instead of just 12.3, because using the built-in functions is only a micro-optimization, so disabling them costs little.

Remark: I really want to see a thorough test against NVRTC in the CCCL CI, instead of one-off, manual tests...
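
In the spirit of this remark, the two reproducers could be generalized into a small header probe. This is a minimal sketch, assuming the compile step is passed in as a callable so the loop itself can run without a GPU. `make_probe_source` and `probe_headers` are hypothetical names, not part of CuPy or CCCL.

```python
from typing import Callable, Dict, Iterable, Optional


def make_probe_source(header: str) -> str:
    """Build a kernel source that only includes `header`, as in the reproducers."""
    return (
        f"#include <{header}>\n"
        "\n"
        'extern "C" __global__ void my_kernel() {\n'
        "}\n"
    )


def probe_headers(
    headers: Iterable[str],
    compile_fn: Callable[[str], None],
) -> Dict[str, Optional[str]]:
    """Compile one probe per header; map header -> error text (None on success)."""
    results: Dict[str, Optional[str]] = {}
    for header in headers:
        try:
            compile_fn(make_probe_source(header))
            results[header] = None
        except Exception as exc:  # e.g. cupy.cuda.compiler.CompileException
            results[header] = str(exc)
    return results
```

With CuPy installed, `compile_fn` could be `lambda src: cp.RawKernel(src, "my_kernel", options=("-std=c++17",)).compile()`, which is the same call the reproducers use.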

@leofang
Member

leofang commented Feb 4, 2025

Regarding #pragma, confirmed NVRTC seems to follow MSVC behavior even on Linux:

import cupy as cp
import sys

code = r"""
extern "C" __global__ void my_kernel() {
  #pragma message("abcde")
  #prgama GCC warning "okok"
}
"""

ker = cp.RawKernel(code, "my_kernel", options=("-std=c++17",))
ker.compile(sys.stdout)

Output:

(...)
cupy.cuda.compiler.CompileException: /tmp/tmpjicvwlc1/567a57167cda49bbddd93b666d02c35fd60c1cca.cubin.cu(3): remark #20200-D: #pragma message: "abcde"
    #pragma message("abcde")
                            ^

/tmp/tmpjicvwlc1/567a57167cda49bbddd93b666d02c35fd60c1cca.cubin.cu(4): error: unrecognized preprocessing directive
    #prgama GCC warning "okok"
     ^
(...)
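
To check such output automatically rather than by eye, the diagnostics can be pulled out of the log with a regular expression. This is a rough sketch: the line shape `<file>(<line>): <kind>[ #<code>]: <message>` is inferred only from the two diagnostics shown above, the `warning` kind is an assumption, and `parse_diagnostics` is a hypothetical helper.

```python
import re
from typing import List, NamedTuple, Optional


class Diagnostic(NamedTuple):
    file: str
    line: int
    kind: str            # "remark" or "error" as seen above; "warning" is assumed
    code: Optional[str]  # e.g. "20200-D"; None when the log shows no number
    message: str


# Assumed shape: <file>(<line>): <kind>[ #<code>]: <message>
_DIAG = re.compile(
    r"(?P<file>\S+)\((?P<line>\d+)\): "
    r"(?P<kind>remark|warning|error)(?: #(?P<code>[\w-]+))?: "
    r"(?P<message>.*)"
)

# The two diagnostics from the output above, copied verbatim.
LOG = r"""cupy.cuda.compiler.CompileException: /tmp/tmpjicvwlc1/567a57167cda49bbddd93b666d02c35fd60c1cca.cubin.cu(3): remark #20200-D: #pragma message: "abcde"
    #pragma message("abcde")
                            ^

/tmp/tmpjicvwlc1/567a57167cda49bbddd93b666d02c35fd60c1cca.cubin.cu(4): error: unrecognized preprocessing directive
    #prgama GCC warning "okok"
     ^
"""


def parse_diagnostics(log: str) -> List[Diagnostic]:
    """Return every line of `log` that matches the assumed diagnostic shape."""
    found = []
    for text in log.splitlines():
        m = _DIAG.search(text)
        if m:
            found.append(
                Diagnostic(m["file"], int(m["line"]), m["kind"], m["code"], m["message"])
            )
    return found


if __name__ == "__main__":
    for d in parse_diagnostics(LOG):
        print(d.line, d.kind, d.code, d.message)
```

Source echo lines and caret markers do not match the pattern, so only the two diagnostic headers are returned.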

@leofang leofang added 2.8.0 target for 2.8.0 release backport branch/2.8.x and removed 2.8.0 target for 2.8.0 release labels Feb 4, 2025
@miscco miscco enabled auto-merge (squash) February 4, 2025 17:22
Contributor

github-actions bot commented Feb 4, 2025

🟩 CI finished in 1h 48m: Pass: 100%/151 | Total: 1d 10h | Avg: 13m 33s | Max: 1h 19m | Hits: 239%/23525
  • 🟩 cub: Pass: 100%/44 | Total: 14h 01m | Avg: 19m 07s | Max: 1h 13m | Hits: 38%/3500

    🟩 cpu
      🟩 amd64              Pass: 100%/42  | Total: 13h 51m | Avg: 19m 47s | Max:  1h 13m | Hits:  38%/3500  
      🟩 arm64              Pass: 100%/2   | Total:  9m 52s | Avg:  4m 56s | Max:  5m 08s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 20m | Avg: 16m 08s | Max: 59m 52s | Hits:  38%/875   
      🟩 12.5               Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
      🟩 12.8               Pass: 100%/37  | Total: 10h 22m | Avg: 16m 48s | Max:  1h 13m | Hits:  38%/2625  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  9m 16s | Avg:  4m 38s | Max:  4m 42s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 20m | Avg: 16m 08s | Max: 59m 52s | Hits:  38%/875   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
      🟩 nvcc12.8           Pass: 100%/35  | Total: 10h 12m | Avg: 17m 30s | Max:  1h 13m | Hits:  38%/2625  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  9m 16s | Avg:  4m 38s | Max:  4m 42s
      🟩 nvcc               Pass: 100%/42  | Total: 13h 52m | Avg: 19m 48s | Max:  1h 13m | Hits:  38%/3500  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 30s | Avg:  5m 22s | Max:  5m 42s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 43s | Avg:  5m 51s | Max:  6m 00s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  5m 52s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 24s | Avg:  5m 42s | Max:  5m 48s
      🟩 Clang18            Pass: 100%/7   | Total:  1h 08m | Avg:  9m 45s | Max: 23m 13s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 54s | Avg:  5m 27s | Max:  5m 42s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 41s | Avg:  5m 41s | Max:  5m 41s
      🟩 GCC9               Pass: 100%/2   | Total: 59m 42s | Avg: 29m 51s | Max: 54m 16s
      🟩 GCC10              Pass: 100%/2   | Total:  1h 07m | Avg: 33m 55s | Max:  1h 01m
      🟩 GCC11              Pass: 100%/2   | Total: 12m 11s | Avg:  6m 05s | Max:  6m 11s
      🟩 GCC12              Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 10s
      🟩 GCC13              Pass: 100%/10  | Total:  2h 13m | Avg: 13m 19s | Max: 25m 11s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 11m | Hits:  38%/1750  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 13m | Hits:  38%/1750  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  2h 04m | Avg:  7m 19s | Max: 23m 13s
      🟩 GCC                Pass: 100%/21  | Total:  5h 01m | Avg: 14m 21s | Max:  1h 01m
      🟩 MSVC               Pass: 100%/4   | Total:  4h 36m | Avg:  1h 09m | Max:  1h 13m | Hits:  38%/3500  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 48s | Avg: 14m 54s | Max: 25m 11s
      🟩 rtx2080            Pass: 100%/34  | Total: 11h 16m | Avg: 19m 54s | Max:  1h 13m | Hits:  38%/3500  
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 14m | Avg: 16m 49s | Max: 23m 52s
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total: 11h 33m | Avg: 18m 43s | Max:  1h 13m | Hits:  38%/3500  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 20m 05s | Avg: 20m 05s | Max: 20m 05s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 08s | Avg: 15m 08s | Max: 15m 08s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 05s | Max: 25m 11s
      🟩 TestGPU            Pass: 100%/2   | Total: 40m 41s | Avg: 20m 20s | Max: 21m 00s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 29m 48s | Avg: 14m 54s | Max: 25m 11s
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 12s | Avg:  6m 12s | Max:  6m 12s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  7h 51m | Avg: 23m 34s | Max:  1h 13m | Hits:  38%/2625  
      🟩 20                 Pass: 100%/24  | Total:  6h 09m | Avg: 15m 24s | Max:  1h 11m | Hits:  38%/875   
    
  • 🟩 thrust: Pass: 100%/43 | Total: 10h 45m | Avg: 15m 00s | Max: 1h 19m | Hits: 141%/9230

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 31s | Avg:  8m 45s | Max: 11m 04s
    🟩 cpu
      🟩 amd64              Pass: 100%/41  | Total: 10h 35m | Avg: 15m 29s | Max:  1h 19m | Hits: 141%/9230  
      🟩 arm64              Pass: 100%/2   | Total:  9m 59s | Avg:  4m 59s | Max:  5m 14s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 13m | Avg: 14m 39s | Max: 52m 12s | Hits: 114%/1846  
      🟩 12.5               Pass: 100%/2   | Total:  2h 36m | Avg:  1h 18m | Max:  1h 19m
      🟩 12.8               Pass: 100%/36  | Total:  6h 54m | Avg: 11m 31s | Max:  1h 05m | Hits: 147%/7384  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 10m 41s | Avg:  5m 20s | Max:  5m 30s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 13m | Avg: 14m 39s | Max: 52m 12s | Hits: 114%/1846  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 36m | Avg:  1h 18m | Max:  1h 19m
      🟩 nvcc12.8           Pass: 100%/34  | Total:  6h 44m | Avg: 11m 53s | Max:  1h 05m | Hits: 147%/7384  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 41s | Avg:  5m 20s | Max:  5m 30s
      🟩 nvcc               Pass: 100%/41  | Total: 10h 34m | Avg: 15m 28s | Max:  1h 19m | Hits: 141%/9230  
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 21m 19s | Avg:  5m 19s | Max:  5m 32s
      🟩 Clang15            Pass: 100%/2   | Total: 11m 54s | Avg:  5m 57s | Max:  6m 09s
      🟩 Clang16            Pass: 100%/2   | Total: 11m 29s | Avg:  5m 44s | Max:  5m 59s
      🟩 Clang17            Pass: 100%/2   | Total: 11m 07s | Avg:  5m 33s | Max:  5m 36s
      🟩 Clang18            Pass: 100%/7   | Total: 44m 48s | Avg:  6m 24s | Max: 10m 24s
      🟩 GCC7               Pass: 100%/2   | Total: 10m 56s | Avg:  5m 28s | Max:  5m 34s
      🟩 GCC8               Pass: 100%/1   | Total:  5m 36s | Avg:  5m 36s | Max:  5m 36s
      🟩 GCC9               Pass: 100%/2   | Total: 11m 37s | Avg:  5m 48s | Max:  6m 13s
      🟩 GCC10              Pass: 100%/2   | Total: 11m 49s | Avg:  5m 54s | Max:  6m 12s
      🟩 GCC11              Pass: 100%/2   | Total: 11m 56s | Avg:  5m 58s | Max:  6m 13s
      🟩 GCC12              Pass: 100%/2   | Total: 13m 01s | Avg:  6m 30s | Max:  6m 41s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 38s | Max: 11m 28s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 44m | Avg: 52m 09s | Max: 52m 12s | Hits:  94%/3692  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 37m | Avg: 52m 24s | Max:  1h 05m | Hits: 171%/5538  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 36m | Avg:  1h 18m | Max:  1h 19m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  1h 40m | Avg:  5m 55s | Max: 10m 24s
      🟩 GCC                Pass: 100%/19  | Total:  2h 06m | Avg:  6m 38s | Max: 11m 28s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 21m | Avg: 52m 18s | Max:  1h 05m | Hits: 141%/9230  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 36m | Avg:  1h 18m | Max:  1h 19m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/33  | Total:  7h 56m | Avg: 14m 25s | Max:  1h 19m | Hits:  88%/5538  
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 49m | Avg: 16m 54s | Max:  1h 05m | Hits: 220%/3692  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  9h 19m | Avg: 15m 06s | Max:  1h 19m | Hits:  84%/7384  
      🟩 TestCPU            Pass: 100%/3   | Total: 53m 00s | Avg: 17m 40s | Max: 37m 31s | Hits: 365%/1846  
      🟩 TestGPU            Pass: 100%/3   | Total: 32m 56s | Avg: 10m 58s | Max: 11m 28s
    🟩 sm
      🟩 90;90a;100         Pass: 100%/1   | Total:  6m 41s | Avg:  6m 41s | Max:  6m 41s
    🟩 std
      🟩 17                 Pass: 100%/20  | Total:  5h 30m | Avg: 16m 31s | Max:  1h 19m | Hits:  88%/5538  
      🟩 20                 Pass: 100%/21  | Total:  4h 57m | Avg: 14m 08s | Max:  1h 17m | Hits: 220%/3692  
    
  • 🟩 libcudacxx: Pass: 100%/41 | Total: 6h 58m | Avg: 10m 12s | Max: 40m 36s | Hits: 404%/10273

    🟩 cpu
      🟩 amd64              Pass: 100%/39  | Total:  6h 51m | Avg: 10m 33s | Max: 40m 36s | Hits: 404%/10273 
      🟩 arm64              Pass: 100%/2   | Total:  7m 04s | Avg:  3m 32s | Max:  3m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 47m 00s | Avg:  9m 24s | Max: 32m 03s | Hits: 404%/2523  
      🟩 12.5               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 39m 58s
      🟩 12.8               Pass: 100%/34  | Total:  4h 58m | Avg:  8m 46s | Max: 40m 36s | Hits: 403%/7750  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 39m 38s | Avg: 19m 49s | Max: 21m 26s
      🟩 nvcc12.0           Pass: 100%/5   | Total: 47m 00s | Avg:  9m 24s | Max: 32m 03s | Hits: 404%/2523  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 39m 58s
      🟩 nvcc12.8           Pass: 100%/32  | Total:  4h 18m | Avg:  8m 05s | Max: 40m 36s | Hits: 403%/7750  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 39m 38s | Avg: 19m 49s | Max: 21m 26s
      🟩 nvcc               Pass: 100%/39  | Total:  6h 19m | Avg:  9m 43s | Max: 40m 36s | Hits: 404%/10273 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 16m 36s | Avg:  4m 09s | Max:  4m 25s
      🟩 Clang15            Pass: 100%/2   | Total:  8m 53s | Avg:  4m 26s | Max:  4m 41s
      🟩 Clang16            Pass: 100%/2   | Total:  9m 13s | Avg:  4m 36s | Max:  4m 44s
      🟩 Clang17            Pass: 100%/2   | Total:  8m 31s | Avg:  4m 15s | Max:  4m 16s
      🟩 Clang18            Pass: 100%/6   | Total:  1h 01m | Avg: 10m 17s | Max: 21m 26s
      🟩 GCC7               Pass: 100%/2   | Total:  7m 08s | Avg:  3m 34s | Max:  3m 35s
      🟩 GCC8               Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 14s | Avg:  3m 37s | Max:  3m 47s
      🟩 GCC10              Pass: 100%/2   | Total:  8m 04s | Avg:  4m 02s | Max:  4m 06s
      🟩 GCC11              Pass: 100%/2   | Total:  7m 48s | Avg:  3m 54s | Max:  4m 06s
      🟩 GCC12              Pass: 100%/2   | Total:  8m 11s | Avg:  4m 05s | Max:  4m 19s
      🟩 GCC13              Pass: 100%/8   | Total:  1h 01m | Avg:  7m 42s | Max: 18m 34s
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 03m | Avg: 31m 44s | Max: 32m 03s | Hits: 404%/5056  
      🟩 MSVC14.39          Pass: 100%/2   | Total:  1h 13m | Avg: 36m 31s | Max: 40m 36s | Hits: 403%/5217  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 39m 58s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/16  | Total:  1h 45m | Avg:  6m 33s | Max: 21m 26s
      🟩 GCC                Pass: 100%/19  | Total:  1h 43m | Avg:  5m 28s | Max: 18m 34s
      🟩 MSVC               Pass: 100%/4   | Total:  2h 16m | Avg: 34m 07s | Max: 40m 36s | Hits: 404%/10273 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 13m | Avg: 36m 35s | Max: 39m 58s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/41  | Total:  6h 58m | Avg: 10m 12s | Max: 40m 36s | Hits: 404%/10273 
    🟩 jobs
      🟩 Build              Pass: 100%/36  | Total:  6h 03m | Avg: 10m 06s | Max: 40m 36s | Hits: 404%/10273 
      🟩 NVRTC              Pass: 100%/2   | Total: 34m 45s | Avg: 17m 22s | Max: 18m 34s
      🟩 Test               Pass: 100%/2   | Total: 17m 55s | Avg:  8m 57s | Max:  9m 04s
      🟩 VerifyCodegen      Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
    🟩 sm
      🟩 75                 Pass: 100%/2   | Total: 34m 45s | Avg: 17m 22s | Max: 18m 34s
      🟩 90;90a;100         Pass: 100%/1   | Total:  4m 29s | Avg:  4m 29s | Max:  4m 29s
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 46m | Avg: 10m 46s | Max: 33m 13s | Hits: 404%/7589  
      🟩 20                 Pass: 100%/19  | Total:  3h 10m | Avg: 10m 01s | Max: 40m 36s | Hits: 403%/2684  
    
  • 🟩 cudax: Pass: 100%/20 | Total: 1h 49m | Avg: 5m 27s | Max: 13m 03s | Hits: 81%/522

    🟩 cpu
      🟩 amd64              Pass: 100%/16  | Total:  1h 38m | Avg:  6m 09s | Max: 13m 03s | Hits:  81%/522   
      🟩 arm64              Pass: 100%/4   | Total: 10m 39s | Avg:  2m 39s | Max:  2m 43s
    🟩 ctk
      🟩 12.0               Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s | Hits:  81%/261   
      🟩 12.5               Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 26s
      🟩 12.8               Pass: 100%/17  | Total:  1h 19m | Avg:  4m 41s | Max: 13m 03s | Hits:  81%/261   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s | Hits:  81%/261   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 26s
      🟩 nvcc12.8           Pass: 100%/17  | Total:  1h 19m | Avg:  4m 41s | Max: 13m 03s | Hits:  81%/261   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/20  | Total:  1h 49m | Avg:  5m 27s | Max: 13m 03s | Hits:  81%/522   
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 50s | Avg:  3m 50s | Max:  3m 50s
      🟩 Clang15            Pass: 100%/1   | Total:  3m 27s | Avg:  3m 27s | Max:  3m 27s
      🟩 Clang16            Pass: 100%/1   | Total:  3m 19s | Avg:  3m 19s | Max:  3m 19s
      🟩 Clang17            Pass: 100%/1   | Total:  3m 07s | Avg:  3m 07s | Max:  3m 07s
      🟩 Clang18            Pass: 100%/4   | Total: 20m 41s | Avg:  5m 10s | Max: 12m 02s
      🟩 GCC10              Pass: 100%/1   | Total:  3m 08s | Avg:  3m 08s | Max:  3m 08s
      🟩 GCC11              Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
      🟩 GCC12              Pass: 100%/2   | Total: 16m 36s | Avg:  8m 18s | Max: 13m 03s
      🟩 GCC13              Pass: 100%/4   | Total: 11m 04s | Avg:  2m 46s | Max:  2m 53s
      🟩 MSVC14.36          Pass: 100%/1   | Total: 11m 17s | Avg: 11m 17s | Max: 11m 17s | Hits:  81%/261   
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 26s | Avg: 11m 26s | Max: 11m 26s | Hits:  81%/261   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 26s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/8   | Total: 34m 24s | Avg:  4m 18s | Max: 12m 02s
      🟩 GCC                Pass: 100%/8   | Total: 33m 53s | Avg:  4m 14s | Max: 13m 03s
      🟩 MSVC               Pass: 100%/2   | Total: 22m 43s | Avg: 11m 21s | Max: 11m 26s | Hits:  81%/522   
      🟩 NVHPC              Pass: 100%/2   | Total: 18m 13s | Avg:  9m 06s | Max:  9m 26s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/20  | Total:  1h 49m | Avg:  5m 27s | Max: 13m 03s | Hits:  81%/522   
    🟩 jobs
      🟩 Build              Pass: 100%/18  | Total:  1h 24m | Avg:  4m 40s | Max: 11m 26s | Hits:  81%/522   
      🟩 Test               Pass: 100%/2   | Total: 25m 05s | Avg: 12m 32s | Max: 13m 03s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  2m 51s | Avg:  2m 51s | Max:  2m 51s
      🟩 90a                Pass: 100%/1   | Total:  2m 53s | Avg:  2m 53s | Max:  2m 53s
    🟩 std
      🟩 17                 Pass: 100%/4   | Total: 17m 02s | Avg:  4m 15s | Max:  8m 47s
      🟩 20                 Pass: 100%/16  | Total:  1h 32m | Avg:  5m 45s | Max: 13m 03s | Hits:  81%/522   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 7m 34s | Avg: 3m 47s | Max: 5m 11s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total:  7m 34s | Avg:  3m 47s | Max:  5m 11s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 23s | Avg:  2m 23s | Max:  2m 23s
      🟩 Test               Pass: 100%/1   | Total:  5m 11s | Avg:  5m 11s | Max:  5m 11s
    
  • 🟩 python: Pass: 100%/1 | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 25m 03s | Avg: 25m 03s | Max: 25m 03s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 151)

# Runner
108 linux-amd64-cpu16
15 windows-amd64-cpu16
10 linux-arm64-cpu16
8 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
1 linux-amd64-gpu-h100-latest-1

@miscco miscco merged commit 7229e0b into NVIDIA:main Feb 4, 2025
163 of 166 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 4, 2025
* Disable use of builtin `__remove_reference_t` for nvrtc below 12.4

NVRTC does not support it properly, so remove it to unblock cuPy

* Use correct warning pragma for nvrtc

* Also suppress `__remove_cvref`

(cherry picked from commit 7229e0b)
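
For readers unfamiliar with this kind of fix, the first bullet of the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `EXAMPLE_*` macros and the `example` namespace are hypothetical names, and the sketch assumes NVRTC defines `__CUDACC_RTC__`, `__CUDACC_VER_MAJOR__`, and `__CUDACC_VER_MINOR__`. The builtin is used only when the compiler reports it and the compiler is not NVRTC older than 12.4; otherwise a plain partial-specialization fallback is used.

```cpp
// Hypothetical macro names (EXAMPLE_*); these are NOT the identifiers used in CCCL.

// Assumption: NVRTC defines __CUDACC_RTC__ plus the usual CUDA version macros.
#if defined(__CUDACC_RTC__) && defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
#  if (__CUDACC_VER_MAJOR__ < 12) || (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ < 4)
#    define EXAMPLE_NVRTC_BELOW_12_4
#  endif
#endif

// Use the builtin only if the compiler advertises it and we are not on an old NVRTC.
#if defined(__has_builtin) && !defined(EXAMPLE_NVRTC_BELOW_12_4)
#  if __has_builtin(__remove_reference_t)
#    define EXAMPLE_USE_BUILTIN_REMOVE_REFERENCE
#  endif
#endif

namespace example {

#if defined(EXAMPLE_USE_BUILTIN_REMOVE_REFERENCE)
// Fast path: let the compiler strip the reference.
template <class T>
struct remove_reference { using type = __remove_reference_t(T); };
#else
// Portable fallback: partial specializations for lvalue and rvalue references.
template <class T> struct remove_reference      { using type = T; };
template <class T> struct remove_reference<T&>  { using type = T; };
template <class T> struct remove_reference<T&&> { using type = T; };
#endif

template <class T>
using remove_reference_t = typename remove_reference<T>::type;

// Tiny local is_same so the sketch needs no standard headers (NVRTC ships none by default).
template <class T, class U> struct is_same       { static constexpr bool value = false; };
template <class T>          struct is_same<T, T> { static constexpr bool value = true; };

} // namespace example

// Both code paths must produce the same results.
static_assert(example::is_same<example::remove_reference_t<int&>, int>::value, "lvalue ref");
static_assert(example::is_same<example::remove_reference_t<int&&>, int>::value, "rvalue ref");
static_assert(example::is_same<example::remove_reference_t<const int&>, const int>::value, "cv kept");
```

Because both branches yield the same type, disabling the builtin for a given compiler changes only how the trait is computed, not what it returns.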
Contributor

github-actions bot commented Feb 4, 2025

Successfully created backport PR for branch/2.8.x:

miscco added a commit that referenced this pull request Feb 4, 2025
* Disable use of builtin `__remove_reference_t` for nvrtc below 12.4

NVRTC does not support it properly, so remove it to unblock cuPy

* Use correct warning pragma for nvrtc

* Also suppress `__remove_cvref`

(cherry picked from commit 7229e0b)

Co-authored-by: Michael Schellenberger Costa <[email protected]>
@miscco miscco deleted the fix_misscompilation_nvrtc_12_3 branch February 26, 2025 11:12