Added warp::shfl functionality. #1273
Conversation
Thanks for the PR!
Looking at the HIP and CUDA documentation, they also have a last parameter `width`, which defaults to `warpSize`. What is the reason for omitting it from your API? If you need access to this value, there is `getSize()` in Traits.hpp.
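For context, a minimal CUDA sketch of what `width` does on the backend side (illustrative usage only, not the PR's code):

// CUDA's native shuffle takes width as a final parameter (default warpSize).
// It partitions the warp into independent segments of `width` lanes; each
// segment reads srcLane relative to its own start.
__device__ int broadcastWithinQuad(int value)
{
    // Every group of 4 consecutive lanes receives the value held by
    // lane 0 of its own group.
    return __shfl_sync(0xffffffffu, value, 0 /* srcLane */, 4 /* width */);
}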
One thing that we discussed quite a bit when bringing in the support for warp functions #1003 (and also intrinsics #1004, #1018) was whether to go for fixed-width types vs. just `int`. Here it's always a bit awkward, as alpaka tries to use fixed-width types everywhere, but the backend APIs like CUDA operate with `int`s. @psychocoderHPC what do you think?
These can be set to `int32_t`, since the CUDA documentation specifies "Template type T" with 4- or 8-byte values, while HIP says `int` or `float`. Templates seem to give awful errors for inputs other than `int` or `float`, however, since the casting is not unique. With regard to warp size, I had not thought about using that feature, but it is possible to use groups of consecutive threads smaller than the warp size as "mini-warps", so I'll add it.
Yes, please always expose fixed-width types to the user. If needed, please cast the input values to whatever the native API function call requires. Fixed-width types are required to guarantee portability between different compilers and platforms.
Yes, CUDA allows shuffling between fewer than warp-size threads. The problem in alpaka would be that our CPU backends have a warp size of one. One case where it is useful to shuffle between 4 lanes is if you run some kind of emulated SSE4 instructions on the GPU, e.g. for cryptographic algorithms. This would not be portable to the alpaka CPU backends, therefore I would maybe restrict alpaka to supporting shuffling in a full warp only.
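A minimal sketch of the casting approach requested above, assuming a hypothetical CUDA-backend helper (the name `shflImpl` is illustrative, not alpaka's actual internals):

#include <cstdint>

// The public API takes std::int32_t for portability; the native intrinsic
// takes int, so we cast at the call boundary.
__device__ auto shflImpl(std::int32_t value, std::int32_t srcLane) -> std::int32_t
{
    // 0xffffffff: all lanes participate; default width = full warp,
    // matching the full-warp-only restriction suggested above.
    return static_cast<std::int32_t>(
        __shfl_sync(0xffffffffu, static_cast<int>(value), static_cast<int>(srcLane)));
}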
//-----------------------------------------------------------------------------
//! Broadcasts data from one thread to all members of the warp.
//! Similar to MPI_Bcast, but using srcLane instead of root.
//!
IMO we need to add to the documentation that this function `shfl` is collective, which means all threads need to call the function, and from the same code branch. The reason is that for CUDA the implementation uses `activemask`, and for HIP all threads in a warp need to call the function. Using `activemask` means that if threads from the if and else branches call the function, they will not see each other.
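A sketch of the pitfall being described (`laneId` and `laneValue` are illustrative names):

// BAD: lanes call shfl from divergent branches. With an activemask-based
// implementation, each call only sees the lanes that took the same branch,
// so the two halves never exchange data.
if(laneId % 2 == 0)
    laneValue = alpaka::warp::shfl(acc, laneValue, 0); // even lanes only
else
    laneValue = alpaka::warp::shfl(acc, laneValue, 0); // odd lanes only

// GOOD: every lane of the warp reaches the same call site.
laneValue = alpaka::warp::shfl(acc, laneValue, 0);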
I updated these docs to include this warning.
I think I forgot to add a similar warning to the previously existing warp collectives. Your comment also applies to those, right @psychocoderHPC?
@sbastrakov Yes, this should be added to the other warp functions too. Currently, only CUDA allows calling warp functions from different branches. It is fine if all threads of the warp are in the same branch, but as soon as the threads diverge, the behavior is undefined (for HIP, and for CUDA devices before sm_70).
test/unit/warp/src/Shfl.cpp
float ans = alpaka::warp::shfl(acc, 3.3f, 0);
float expect = 3.3;
ALPAKA_CHECK(*success, CAST(ans) == CAST(expect));
While that may work on most compilers, it is certainly undefined behavior, because we are reading a float value as an int32. At that point a `memcmp` is probably the more correct solution.
However, I think I prefer disabling the warning here. We know what we are doing here and we expect the exact same float to be shuffled around, so the comparison should not need an epsilon.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"
ALPAKA_CHECK(*success, ans == expect);
#pragma GCC diagnostic pop
I don't know if we need a similar escape hatch for the other compilers.
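For reference, the `memcmp` alternative mentioned above could look like this (a sketch; whether `memcmp` is available in device code depends on the backend):

// Comparing the object representations bit for bit avoids both
// -Wfloat-equal and the type-punning UB of CAST.
bool const sameBits = memcmp(&ans, &expect, sizeof(float)) == 0;
ALPAKA_CHECK(*success, sameBits);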
Compiler-specific pragmas should be guarded; see alpaka/include/alpaka/atomic/Op.hpp, lines 31 to 34 (at a78a70a):
#if BOOST_COMP_GNUC
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wconversion"
#endif
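Applying that guard to the snippet above would give (a sketch combining the two fragments in this thread):

#if BOOST_COMP_GNUC
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wfloat-equal"
#endif
ALPAKA_CHECK(*success, ans == expect);
#if BOOST_COMP_GNUC
# pragma GCC diagnostic pop
#endif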
(Branch force-pushed from 722fd44 to 3bd10d3.)
It seems one of the tests spuriously failed on a network error.
We mostly need to ignore these tests. GitHub is not providing us with a feature to restart only a single job :-(
So should we merge this one then?
@frobnitzem Could you please address #1273 (comment)? IMO it should be a separate function, else it looks like a line with many magic numbers.
I would like to block this PR until #1273 (comment) is implemented.
Remove the magic number which was epsilon, replacing it with `std::numeric_limits` epsilon.
These changes implement calls to `__shfl` only for now, but there are also `__shfl_up`, etc., which could be added in a similar way. Apparently these were also needed by #18.
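For example, kernel-side usage might look like this (a sketch based on the unit test above; the exact overload set is whatever the PR defines):

// Broadcast lane 0's value to all lanes of the warp, for both
// supported element types.
std::int32_t const i = alpaka::warp::shfl(acc, std::int32_t{42}, 0);
float const f = alpaka::warp::shfl(acc, 3.3f, 0);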