Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Refactor Thrust/CUB dispatch mechanisms to not rely on __CUDA_ARCH__ #278

Closed
brycelelbach opened this issue Mar 29, 2021 · 11 comments
Closed
Assignees
Labels
compiler: nvc++ Specific to the NVC++ compiler. cub P0: must have Absolutely necessary. Critical issue, major blocker, etc. type: enhancement New feature or request.
Milestone

Comments

@brycelelbach
Copy link
Collaborator

No description provided.

@brycelelbach brycelelbach added type: enhancement New feature or request. compiler: nvc++ Specific to the NVC++ compiler. P0: must have Absolutely necessary. Critical issue, major blocker, etc. labels Mar 29, 2021
@brycelelbach brycelelbach added this to the 1.13.0 milestone Mar 29, 2021
@alliepiper alliepiper self-assigned this Mar 29, 2021
@seanbaxter
Copy link

This is relevant to my interests. I been blocked by this since 2019. What's the new mechanism?

@alliepiper
Copy link
Collaborator

@seanbaxter There's a work-in-progress in #276.

cub/device/dispatch/dispatch_reduce.cuh is an example of how the new mechanism is used, and the implementation is in cub/util_ptx_dispatch.cuh (For now...I'm planning to move it to cub/detail/ptx_dispatch.cuh today).

Basically:

  • Tag each policy class with a constexpr static int ptx_arch = XXX member.
  • Add all policies to a cub::detail::type_list.
  • Use cub::detail::ptx_dispatch to invoke a functor with the best matching policy.

I'm curious, though -- what are you doing with this? Are you using the CUB dispatch in another project?

@seanbaxter
Copy link

seanbaxter commented Apr 15, 2021

I was compiling CUDA Toolkit samples without modification with a single translation pass in 2019:

I ended up removing this functionality because CUDA_ARCH-based dispatch was too hard to treat robustly as the projects got more complicated. I made some requests to change this dispatch mechanism, and now it looks like it's happening. I'm looking to re-enable CUDA targets in my compiler. I want to compile CUB/Thrust with a single pass, as we all do.

For background, I had an if-codegen statement which did operand substitution at codegen time rather than template substitution, and that appears to be exactly the same as nvc++'s if-target. There was an nvvm_arch_t enum with SM versions, which is like your sm_selector.
https://www.circle-lang.org/saxpy.html#how-to-dispatch

I looked over <__target_macros> closely.. Three small questions:

  1. Are __NV_PROVIDES_SM_XX and __NV_IS_EXACTLY_SM_XX not actually macros but rather implicit bool declarations that get plugged true/false at codegen depending on the target? I'm cool if they are. If not, I have no idea what's going on.
  2. Is nv::target::device also an implicit integer declaration holding the current SM target?
  3. Does [[nv::__target_bitset]] implicitly declare a target_description member function along the lines of:
explicit operator bool () const noexcept { returns nv::target::device & targets; }

?

Just a bit confused by the macro name style in 1 and the C++ name style in 2. Maybe they're unrelated things.

@alliepiper
Copy link
Collaborator

Ah ok - Everything in cub/detail/nv/ is a temporary snapshot from NVIDIA/libcudacxx#144, this PR just updates Thrust/CUB to use the libcu++ NV_IF_TARGET macros. @wmaxey, @dkolsen-pgi, and @brycelelbach wrote and designed those macros and would be better able to answer your questions about the if-target design.

@griwes
Copy link
Collaborator

griwes commented Apr 15, 2021

nv::target::device is just a set of targets. The actual codegen-known value does not manifest in user code at any point.

@seanbaxter
Copy link

@griwes That doesn't clarify. Is nv::target::device is the bit-field of targets specified at the command line? Where is the active target manifested? It has to be somewhere.

@griwes
Copy link
Collaborator

griwes commented Apr 15, 2021

I'm not sure how much I can say at this point, but the short answer is that that value doesn't actually ever manifest directly in a variable, and the backend understands the bitset.

@seanbaxter
Copy link

I'm just trying to compile CUB/Thrust with this new dispatch mechanic. What are the implicit declarations needed to do that?

@dkolsen-pgi
Copy link
Collaborator

The value of an if target expression is a bitset that encodes all the desired targets. There is compiler magic that interprets the bitset and generates the correct code. If you want Circle to use the new if target mechanism rather than __CUDA_ARCH__, the Circle compiler will have to have the same compiler magic. This is not the right place to explain all the details of how it works; we would need to set up a meeting.

@wmaxey
Copy link
Member

wmaxey commented Apr 16, 2021

  1. Are __NV_PROVIDES_SM_XX and __NV_IS_EXACTLY_SM_XX not actually macros but rather implicit bool declarations that get plugged true/false at codegen depending on the target? I'm cool if they are. If not, I have no idea what's going on.

They are used for assembling the right token when preprocessing with __CUDA_ARCH__. The whole mechanism should be uninteresting as there is nothing really different or new happening with NVCC. This is just an abstraction to allow writing compiler independent code.

In the back-end, these tokens are concatenated to obtain predefined boolean values on NVCC. It was done this way, maybe naively, because NVC++ and NVCC create the dispatch at different stages in compilation.

@alliepiper alliepiper modified the milestones: 1.13.0, 1.14.0 Jun 9, 2021
@alliepiper alliepiper linked a pull request Aug 17, 2021 that will close this issue
@alliepiper alliepiper modified the milestones: 1.14.0, 1.15.0 Aug 17, 2021
@alliepiper alliepiper modified the milestones: 1.15.0, 1.16.0 Oct 15, 2021
@alliepiper alliepiper modified the milestones: 1.16.0, 1.17.0 Feb 7, 2022
@alliepiper alliepiper modified the milestones: 1.17.0, 2.0.0 Apr 25, 2022
@alliepiper alliepiper modified the milestones: 2.0.0, 2.1.0 Aug 4, 2022
@jrhemstad jrhemstad added this to CCCL Aug 11, 2022
@jrhemstad
Copy link
Collaborator

Closing in favor of NVIDIA/cccl#65

@jrhemstad jrhemstad closed this as not planned Won't fix, can't repro, duplicate, stale Apr 26, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in CCCL Apr 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler: nvc++ Specific to the NVC++ compiler. cub P0: must have Absolutely necessary. Critical issue, major blocker, etc. type: enhancement New feature or request.
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants