Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cuda11.2 #362

Merged
merged 7 commits into from
May 14, 2021
Merged

Cuda11.2 #362

merged 7 commits into from
May 14, 2021

Conversation

liruipeng
Copy link
Contributor

This PR (by @pbauman) addresses Thrust 1.10 breaking changes. Thrust 1.10 landed in CUDA 11.2 and will land in ROCm 4.2.

As discussed in NVIDIA/thrust#1379 (that internally references NVIDIA/thrust#1176), the behavior of exclusive_scan and inclusive_scan changed in the case where the input types and output types were not the same. There is no deprecation warning or error thrown by the compiler. Indeed, with Thrust 1.10, before this PR, the exclusive_scan calls that had make_transform_iterator used in the input types (silently!) generated incorrect results. That means HYPRE is broken on GPUs today with CUDA 11.2 without this PR. There are a couple of way to fix, but what I did for the fix for exclusive_scan was to just use the API where one specifies the initial value and that was enough to fix the issue. It did not appear to me that there were any inclusive_scan calls affected in HYPRE. And my tests with this PR with a ROCm 4.2 release candidate pass.

In addition, Thrust 1.10 deprecated the use of C++ before C++14 so I've added -std=c++14 to the HIPCXXFLAGS argument in the configure.in (and bootstrapped).

Thrust 1.12 introduces similar breakages for the scan_by_key cousins, see NVIDIA/thrust#1376. The fixes are similar and I dropped in explicit casts to HYPRE_BigInt in the (already existing) initial value for exclusive_scan_by_key (commit 9a3bb66). I've not tried to address inclusive_scan_by_key cases, but I do believe they will be broken. I strongly recommend adding unit tests for those calls and adding them to the test suite. Thrust 1.12 is supposedly going to land in CUDA 11.4 (at least according to that thrust release page). I do not know when it will land in a ROCm release.

Thank you.

@@ -228,7 +228,7 @@ hypre_IJMatrixAssembleSortAndReduce1(HYPRE_Int N0, HYPRE_BigInt *I0, HYPRE_Big
make_reverse_iterator(thrust::make_zip_iterator(thrust::make_tuple(I0, J0))),
make_reverse_iterator(thrust::device_pointer_cast<char>(X0)+N0),
make_reverse_iterator(thrust::device_pointer_cast<char>(X) +N0),
0,
Copy link
Contributor Author

@liruipeng liruipeng May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbauman I am not sure if we want InitialValueType == BigInt, since both the input and output are of type char * (the keys are BigInt). Maybe char(0) or actually just 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with char(0), which seemed fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I believe what you have is correct @liruipeng. However, I think this is worth capturing in a unit test. It appears the scan_by_key API behavior changes will land in Thrust 1.12, which neither CUDA nor ROCm are using yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add, my fear of just leaving the raw 0 is that the compiler takes the type as int instead of the correct output type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbauman Agree, we should use char(0).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbauman About unit tests, should we add tests for scan, scan_by_key by some simple example as in NVIDIA/thrust#1379, or unit tests affected hypre routines somehow? Is it enough to just run hypre's whole regression suite after module load cuda/11.2? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it were me, I would drop in a unit test (and a unit testing framework if you don't have one). The ij test failed for me, but very differently for CUDA and HIP. But, the downside, is then you have to go fishing for where the breakage actually occurs, which can take many hours. I suggested a unit test since we know a behavior change is coming. Just my two cents. But if there's not time/resources, the ij should show the failure.

@liruipeng liruipeng marked this pull request as ready for review May 13, 2021 07:16
Copy link
Contributor

@rfalgout rfalgout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

Copy link
Contributor

@pbauman pbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit was the correct set of changes from my PR. Thanks!

@liruipeng liruipeng mentioned this pull request May 14, 2021
@liruipeng liruipeng merged commit c7175a9 into master May 14, 2021
@liruipeng liruipeng deleted the cuda11.2 branch May 14, 2021 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants