Refactor Thrust/CUB dispatch mechanisms to not rely on `__CUDA_ARCH__` #278

brycelelbach · 2021-03-29T19:35:12Z

No description provided.

seanbaxter · 2021-04-15T09:34:54Z

This is relevant to my interests. I been blocked by this since 2019. What's the new mechanism?

alliepiper · 2021-04-15T17:05:40Z

@seanbaxter There's a work-in-progress in #276.

cub/device/dispatch/dispatch_reduce.cuh is an example of how the new mechanism is used, and the implementation is in cub/util_ptx_dispatch.cuh (For now...I'm planning to move it to cub/detail/ptx_dispatch.cuh today).

Basically:

Tag each policy class with a constexpr static int ptx_arch = XXX member.
Add all policies to a cub::detail::type_list.
Use cub::detail::ptx_dispatch to invoke a functor with the best matching policy.

I'm curious, though -- what are you doing with this? Are you using the CUB dispatch in another project?

seanbaxter · 2021-04-15T18:39:12Z

I was compiling CUDA Toolkit samples without modification with a single translation pass in 2019:

I ended up removing this functionality because CUDA_ARCH-based dispatch was too hard to treat robustly as the projects got more complicated. I made some requests to change this dispatch mechanism, and now it looks like it's happening. I'm looking to re-enable CUDA targets in my compiler. I want to compile CUB/Thrust with a single pass, as we all do.

For background, I had an if-codegen statement which did operand substitution at codegen time rather than template substitution, and that appears to be exactly the same as nvc++'s if-target. There was an nvvm_arch_t enum with SM versions, which is like your sm_selector.
https://www.circle-lang.org/saxpy.html#how-to-dispatch

I looked over <__target_macros> closely.. Three small questions:

Are __NV_PROVIDES_SM_XX and __NV_IS_EXACTLY_SM_XX not actually macros but rather implicit bool declarations that get plugged true/false at codegen depending on the target? I'm cool if they are. If not, I have no idea what's going on.
Is nv::target::device also an implicit integer declaration holding the current SM target?
Does [[nv::__target_bitset]] implicitly declare a target_description member function along the lines of:

explicit operator bool () const noexcept { returns nv::target::device & targets; }

?

Just a bit confused by the macro name style in 1 and the C++ name style in 2. Maybe they're unrelated things.

alliepiper · 2021-04-15T19:41:23Z

Ah ok - Everything in cub/detail/nv/ is a temporary snapshot from NVIDIA/libcudacxx#144, this PR just updates Thrust/CUB to use the libcu++ NV_IF_TARGET macros. @wmaxey, @dkolsen-pgi, and @brycelelbach wrote and designed those macros and would be better able to answer your questions about the if-target design.

griwes · 2021-04-15T20:02:42Z

nv::target::device is just a set of targets. The actual codegen-known value does not manifest in user code at any point.

seanbaxter · 2021-04-15T20:07:27Z

@griwes That doesn't clarify. Is nv::target::device is the bit-field of targets specified at the command line? Where is the active target manifested? It has to be somewhere.

griwes · 2021-04-15T20:24:10Z

I'm not sure how much I can say at this point, but the short answer is that that value doesn't actually ever manifest directly in a variable, and the backend understands the bitset.

seanbaxter · 2021-04-15T20:50:16Z

I'm just trying to compile CUB/Thrust with this new dispatch mechanic. What are the implicit declarations needed to do that?

dkolsen-pgi · 2021-04-15T21:01:53Z

The value of an if target expression is a bitset that encodes all the desired targets. There is compiler magic that interprets the bitset and generates the correct code. If you want Circle to use the new if target mechanism rather than __CUDA_ARCH__, the Circle compiler will have to have the same compiler magic. This is not the right place to explain all the details of how it works; we would need to set up a meeting.

wmaxey · 2021-04-16T00:31:31Z

Are __NV_PROVIDES_SM_XX and __NV_IS_EXACTLY_SM_XX not actually macros but rather implicit bool declarations that get plugged true/false at codegen depending on the target? I'm cool if they are. If not, I have no idea what's going on.

They are used for assembling the right token when preprocessing with __CUDA_ARCH__. The whole mechanism should be uninteresting as there is nothing really different or new happening with NVCC. This is just an abstraction to allow writing compiler independent code.

In the back-end, these tokens are concatenated to obtain predefined boolean values on NVCC. It was done this way, maybe naively, because NVC++ and NVCC create the dispatch at different stages in compilation.

jrhemstad · 2023-04-26T20:12:40Z

Closing in favor of NVIDIA/cccl#65

brycelelbach added type: enhancement New feature or request. compiler: nvc++ Specific to the NVC++ compiler. P0: must have Absolutely necessary. Critical issue, major blocker, etc. labels Mar 29, 2021

brycelelbach added this to the 1.13.0 milestone Mar 29, 2021

alliepiper self-assigned this Mar 29, 2021

alliepiper mentioned this issue May 11, 2021

Fix build error for RandomBits #299

Closed

alliepiper modified the milestones: 1.13.0, 1.14.0 Jun 9, 2021

alliepiper linked a pull request Aug 17, 2021 that will close this issue

Initial port to NV_IF_TARGET. #276

Closed

alliepiper modified the milestones: 1.14.0, 1.15.0 Aug 17, 2021

alliepiper modified the milestones: 1.15.0, 1.16.0 Oct 15, 2021

alliepiper modified the milestones: 1.16.0, 1.17.0 Feb 7, 2022

alliepiper modified the milestones: 1.17.0, 2.0.0 Apr 25, 2022

alliepiper modified the milestones: 2.0.0, 2.1.0 Aug 4, 2022

jrhemstad added this to CCCL Aug 11, 2022

jrhemstad moved this to Needs Triage in CCCL Aug 14, 2022

jrhemstad removed the status in CCCL Aug 14, 2022

jrhemstad moved this to In Progress in CCCL Aug 22, 2022

jrhemstad added the cub label Feb 22, 2023

jarmak-nv assigned griwes and unassigned alliepiper Feb 23, 2023

jrhemstad mentioned this issue Apr 26, 2023

[EPIC] Update Thrust/CUB to universally use nv/target instead of CUDA_ARCH NVIDIA/cccl#65

Open

jrhemstad closed this as not planned Won't fix, can't repro, duplicate, stale Apr 26, 2023

github-project-automation bot moved this from In Progress to Done in CCCL Apr 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor Thrust/CUB dispatch mechanisms to not rely on `__CUDA_ARCH__` #278

Refactor Thrust/CUB dispatch mechanisms to not rely on `__CUDA_ARCH__` #278

brycelelbach commented Mar 29, 2021

seanbaxter commented Apr 15, 2021

alliepiper commented Apr 15, 2021

seanbaxter commented Apr 15, 2021 •

edited

Loading

alliepiper commented Apr 15, 2021

griwes commented Apr 15, 2021

seanbaxter commented Apr 15, 2021

griwes commented Apr 15, 2021

seanbaxter commented Apr 15, 2021

dkolsen-pgi commented Apr 15, 2021

wmaxey commented Apr 16, 2021

jrhemstad commented Apr 26, 2023

Refactor Thrust/CUB dispatch mechanisms to not rely on __CUDA_ARCH__ #278

Refactor Thrust/CUB dispatch mechanisms to not rely on __CUDA_ARCH__ #278

Comments

brycelelbach commented Mar 29, 2021

seanbaxter commented Apr 15, 2021

alliepiper commented Apr 15, 2021

seanbaxter commented Apr 15, 2021 • edited Loading

alliepiper commented Apr 15, 2021

griwes commented Apr 15, 2021

seanbaxter commented Apr 15, 2021

griwes commented Apr 15, 2021

seanbaxter commented Apr 15, 2021

dkolsen-pgi commented Apr 15, 2021

wmaxey commented Apr 16, 2021

jrhemstad commented Apr 26, 2023

Refactor Thrust/CUB dispatch mechanisms to not rely on `__CUDA_ARCH__` #278

Refactor Thrust/CUB dispatch mechanisms to not rely on `__CUDA_ARCH__` #278

seanbaxter commented Apr 15, 2021 •

edited

Loading