[RFE] Allow `init_value` to be a device pointer in `cub::DeviceScan::ExclusiveScan` #294

zasdfgbnm · 2021-05-03T20:17:02Z

per title

alliepiper · 2021-05-04T17:47:33Z

I assume this is to support a usecase where init_value isn't yet computed when launching the scan kernel? That would definitely be nice to have.

zasdfgbnm · 2021-05-04T17:54:29Z

@allisonvacanti Yes, correct.

alliepiper · 2021-05-04T18:11:17Z

Gotcha. I'm not sure when something like this would happen, it would need to go into a new overload and require some invasive changes to the implementation, but I'll keep it in mind when as we start updating algorithm implementations in the coming months.

zasdfgbnm · 2021-05-04T19:33:03Z

The use case in PyTorch is to manually split tensors larger than 2^31 elements into small tensors. In this use case, we will run the exclusive scan on the first tensor, and use the last element in the first tensor as the init_value of the second tensor.

ngimel · 2021-05-07T17:25:12Z

Turns out we can use custom type for InitValueT to do what we want

template <typename T>
struct init_val {
    __host__ __device__ init_val(T* init_ptr){
        init_p = init_ptr;
    }
    __host__ __device__ operator T() const {return *init_p;}
    T * init_p;
};

As far as I could tell inside the kernel InitValueT is implicitly converted to OutputValueT so the above struct makes it possible.

alliepiper · 2021-05-10T18:55:49Z

Thanks for pointing this out @ngimel! I think that's a much cleaner solution that works well within the design of the library.

Closing this issue. The suggestion above is preferable to adding a new overload for this usecase.

ngimel · 2021-05-10T19:09:36Z

@allisonvacanti this solution relies on cub internal implementation, i.e., that the first and last thing it does with init_value is implicitly or explicitly convert it to OutputValueT. It seems to work now, but cub could choose to do something internally that would break the above code, so it would be good to explicitly list the requirements that you have for InitValueT.
And yes, thanks for making InitValueT and OutputValueT different template types, that was a good design decision that makes cub much more flexible!

alliepiper · 2021-05-11T15:48:26Z

Now that you mention it...this usecase may have actually been broken in a CUB 1.11.0 / CTK 11.3 by #201. That PR changes the behavior of the scan to match the suggestion by https://wg21.link/P0571, which uses InitTypeT to accumulate the intermediate scan results instead of OutputValueT. Are you using an earlier version of CUB?

I'll reopen this, we'll need to think about this some more.

ngimel · 2021-05-11T16:04:35Z

Yeah, I was using 11.0 toolkit and cub version that comes with it.

alliepiper · 2021-05-11T16:15:10Z

Oops, yeah 😅 Looks like that change may have broken your usecase...

Pinging @brycelelbach for visibility.

ngimel · 2021-05-11T16:55:26Z

I agree that accumulating in InitValueT is the standard-compliant thing to do, but that's back to original question for us - what can we do to use device ptr to get initial value?

jrhemstad · 2021-05-11T17:58:15Z

I agree that accumulating in InitValueT is the standard-compliant thing to do, but that's back to original question for us - what can we do to use device ptr to get initial value?

This doesn't quite solve the immediate problem, but for similar problems I've used a combination of a thrust::permutation_iterator and thrust::constant_iterator like:

int * p = // device pointer to some to-be-computed value
auto iter = thrust::make_permutation_iterator(p, thrust::make_constant_iterator(0));

// Dereferencing `iter` will return p[0], aka `*p`

Problem is that CUB scan expects the initial value to be passed by value and not as a iterator/pointer-like thing.

zasdfgbnm · 2021-05-12T01:15:23Z

Can we add a special class to cub cub::InitValueFromDevicePointer, something like:

template<typename InitValueT>
struct InitValueFromDevicePointer {
    InitValueT *ptr;
    InitValueFromDevicePointer(InitValueT *ptr): ptr(ptr) {}
    __device__ operator InitValueT() {
        return *ptr;
    }
};

And internally when InitValueT is an InitValueFromDevicePointer, we extract the real type for computation? Something like:

template<typename InitValueT>
struct IsDevicePointer {
    static constexpr bool VALUE = false;
    using TYPE = InitValueT;
};

template<typename InitValueT>
struct IsDevicePointer<InitValueFromDevicePointer<InitValueT>> {
    static constexpr bool VALUE = true;
    using TYPE = InitValueT;
};

and

    using RealInitValueT = typename IsDevicePointer<InitValueT>::TYPE;

    // The output value type -- used as the intermediate accumulator
    // Per https://wg21.link/P0571, use InitValueT if provided, otherwise the
    // input iterator's value type.
    using OutputT =
      typename If<Equals<InitValueT, NullType>::VALUE, InputT, RealInitValueT>::Type;

See the prototype PR: #305

alliepiper · 2021-05-12T16:23:57Z

@zasdfgbnm I like this idea! I think we should generalize the implementation a bit, since this would probably be useful in other contexts.

I'll start a discussion on the PR.

alliepiper · 2022-05-09T20:55:28Z

This was fixed in #305.

alliepiper added type: enhancement New feature or request. helps: pytorch Helps or needed by PyTorch. labels May 4, 2021

alliepiper closed this as completed May 10, 2021

alliepiper reopened this May 11, 2021

alliepiper mentioned this issue May 12, 2021

Support future value for initial value for device scan #305

Merged

zasdfgbnm mentioned this issue Sep 23, 2021

Add scan by key algorithms to cub #376

Merged

alliepiper mentioned this issue Nov 8, 2023

Add cub::FutureValue support for cub::DeviceReduce::Reduce initial values NVIDIA/cccl#884

Open

alliepiper closed this as completed May 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFE] Allow `init_value` to be a device pointer in `cub::DeviceScan::ExclusiveScan` #294

[RFE] Allow `init_value` to be a device pointer in `cub::DeviceScan::ExclusiveScan` #294

zasdfgbnm commented May 3, 2021 •

edited

Loading

alliepiper commented May 4, 2021

zasdfgbnm commented May 4, 2021

alliepiper commented May 4, 2021

zasdfgbnm commented May 4, 2021

ngimel commented May 7, 2021

alliepiper commented May 10, 2021

ngimel commented May 10, 2021

alliepiper commented May 11, 2021

ngimel commented May 11, 2021

alliepiper commented May 11, 2021

ngimel commented May 11, 2021

jrhemstad commented May 11, 2021

zasdfgbnm commented May 12, 2021 •

edited

Loading

alliepiper commented May 12, 2021

alliepiper commented May 9, 2022

[RFE] Allow init_value to be a device pointer in cub::DeviceScan::ExclusiveScan #294

[RFE] Allow init_value to be a device pointer in cub::DeviceScan::ExclusiveScan #294

Comments

zasdfgbnm commented May 3, 2021 • edited Loading

alliepiper commented May 4, 2021

zasdfgbnm commented May 4, 2021

alliepiper commented May 4, 2021

zasdfgbnm commented May 4, 2021

ngimel commented May 7, 2021

alliepiper commented May 10, 2021

ngimel commented May 10, 2021

alliepiper commented May 11, 2021

ngimel commented May 11, 2021

alliepiper commented May 11, 2021

ngimel commented May 11, 2021

jrhemstad commented May 11, 2021

zasdfgbnm commented May 12, 2021 • edited Loading

alliepiper commented May 12, 2021

alliepiper commented May 9, 2022

[RFE] Allow `init_value` to be a device pointer in `cub::DeviceScan::ExclusiveScan` #294

[RFE] Allow `init_value` to be a device pointer in `cub::DeviceScan::ExclusiveScan` #294

zasdfgbnm commented May 3, 2021 •

edited

Loading

zasdfgbnm commented May 12, 2021 •

edited

Loading