Cuda11.2 #362

liruipeng · 2021-05-13T06:03:32Z

This PR (by @pbauman) addresses Thrust 1.10 breaking changes. Thrust 1.10 landed in CUDA 11.2 and will land in ROCm 4.2.

As discussed in NVIDIA/thrust#1379 (that internally references NVIDIA/thrust#1176), the behavior of exclusive_scan and inclusive_scan changed in the case where the input types and output types were not the same. There is no deprecation warning or error thrown by the compiler. Indeed, with Thrust 1.10, before this PR, the exclusive_scan calls that had make_transform_iterator used in the input types (silently!) generated incorrect results. That means HYPRE is broken on GPUs today with CUDA 11.2 without this PR. There are a couple of way to fix, but what I did for the fix for exclusive_scan was to just use the API where one specifies the initial value and that was enough to fix the issue. It did not appear to me that there were any inclusive_scan calls affected in HYPRE. And my tests with this PR with a ROCm 4.2 release candidate pass.

In addition, Thrust 1.10 deprecated the use of C++ before C++14 so I've added -std=c++14 to the HIPCXXFLAGS argument in the configure.in (and bootstrapped).

Thrust 1.12 introduces similar breakages for the scan_by_key cousins, see NVIDIA/thrust#1376. The fixes are similar and I dropped in explicit casts to HYPRE_BigInt in the (already existing) initial value for exclusive_scan_by_key (commit 9a3bb66). I've not tried to address inclusive_scan_by_key cases, but I do believe they will be broken. I strongly recommend adding unit tests for those calls and adding them to the test suite. Thrust 1.12 is supposedly going to land in CUDA 11.4 (at least according to that thrust release page). I do not know when it will land in a ROCm release.

Thank you.

Thrust 1.10 deprecates the use of C++ < C+14.

Thrust 1.10 Fixes

liruipeng · 2021-05-13T06:35:39Z

src/IJ_mv/IJMatrix_parcsr_device.c

@@ -228,7 +228,7 @@ hypre_IJMatrixAssembleSortAndReduce1(HYPRE_Int  N0, HYPRE_BigInt  *I0, HYPRE_Big
         make_reverse_iterator(thrust::make_zip_iterator(thrust::make_tuple(I0,    J0))),
         make_reverse_iterator(thrust::device_pointer_cast<char>(X0)+N0),
         make_reverse_iterator(thrust::device_pointer_cast<char>(X) +N0),
-         0,


@pbauman I am not sure if we want InitialValueType == BigInt, since both the input and output are of type char * (the keys are BigInt). Maybe char(0) or actually just 0?

Tested with char(0), which seemed fine.

Oops. I believe what you have is correct @liruipeng. However, I think this is worth capturing in a unit test. It appears the scan_by_key API behavior changes will land in Thrust 1.12, which neither CUDA nor ROCm are using yet.

I should add, my fear of just leaving the raw 0 is that the compiler takes the type as int instead of the correct output type.

@pbauman Agree, we should use char(0).

@pbauman About unit tests, should we add tests for scan, scan_by_key by some simple example as in NVIDIA/thrust#1379, or unit tests affected hypre routines somehow? Is it enough to just run hypre's whole regression suite after module load cuda/11.2? Thanks!

If it were me, I would drop in a unit test (and a unit testing framework if you don't have one). The ij test failed for me, but very differently for CUDA and HIP. But, the downside, is then you have to go fishing for where the breakage actually occurs, which can take many hours. I suggested a unit test since we know a behavior change is coming. Just my two cents. But if there's not time/resources, the ij should show the failure.

src/IJ_mv/IJVector_parcsr_device.c

rfalgout

Looks good. Thanks!

pbauman

The last commit was the correct set of changes from my PR. Thanks!

pbauman and others added 5 commits May 3, 2021 07:37

Fix thrust::exclusive_scan calls

f313c88

explicity cast to HYPRE_Bigint in exclusive_scan_by_key

9a3bb66

Add -std=c++14 to HIP compiliation flags

9bc85bd

Thrust 1.10 deprecates the use of C++ < C+14.

Run bootstrap

f3fb655

Merge pull request #352 from pbauman/thrust-1.10-fixes

827c684

Thrust 1.10 Fixes

liruipeng commented May 13, 2021

View reviewed changes

src/IJ_mv/IJVector_parcsr_device.c Show resolved Hide resolved

ij_assembly driver w/o UVM

9c8fd92

liruipeng marked this pull request as ready for review May 13, 2021 07:16

exclusive_scan_by_key init type

d18aa88

liruipeng requested review from rfalgout, ulrikeyang and pbauman May 13, 2021 19:11

rfalgout approved these changes May 13, 2021

View reviewed changes

ulrikeyang approved these changes May 13, 2021

View reviewed changes

pbauman approved these changes May 14, 2021

View reviewed changes

liruipeng mentioned this pull request May 14, 2021

unit testing #367

Closed

liruipeng merged commit c7175a9 into master May 14, 2021

liruipeng deleted the cuda11.2 branch May 14, 2021 03:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cuda11.2 #362

Cuda11.2 #362

liruipeng commented May 13, 2021

liruipeng May 13, 2021 •

edited

Loading

liruipeng May 13, 2021

pbauman May 13, 2021

pbauman May 13, 2021

liruipeng May 13, 2021

liruipeng May 13, 2021

pbauman May 14, 2021

rfalgout left a comment

pbauman left a comment

Cuda11.2 #362

Cuda11.2 #362

Conversation

liruipeng commented May 13, 2021

liruipeng May 13, 2021 • edited Loading

Choose a reason for hiding this comment

liruipeng May 13, 2021

Choose a reason for hiding this comment

pbauman May 13, 2021

Choose a reason for hiding this comment

pbauman May 13, 2021

Choose a reason for hiding this comment

liruipeng May 13, 2021

Choose a reason for hiding this comment

liruipeng May 13, 2021

Choose a reason for hiding this comment

pbauman May 14, 2021

Choose a reason for hiding this comment

rfalgout left a comment

Choose a reason for hiding this comment

pbauman left a comment

Choose a reason for hiding this comment

liruipeng May 13, 2021 •

edited

Loading