-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cuda11.2 #362
Conversation
Thrust 1.10 deprecates the use of C++ < C+14.
Thrust 1.10 Fixes
@@ -228,7 +228,7 @@ hypre_IJMatrixAssembleSortAndReduce1(HYPRE_Int N0, HYPRE_BigInt *I0, HYPRE_Big | |||
make_reverse_iterator(thrust::make_zip_iterator(thrust::make_tuple(I0, J0))), | |||
make_reverse_iterator(thrust::device_pointer_cast<char>(X0)+N0), | |||
make_reverse_iterator(thrust::device_pointer_cast<char>(X) +N0), | |||
0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbauman I am not sure if we want InitialValueType == BigInt
, since both the input and output are of type char *
(the keys are BigInt
). Maybe char(0)
or actually just 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with char(0)
, which seemed fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I believe what you have is correct @liruipeng. However, I think this is worth capturing in a unit test. It appears the scan_by_key
API behavior changes will land in Thrust 1.12, which neither CUDA nor ROCm are using yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add, my fear of just leaving the raw 0
is that the compiler takes the type as int
instead of the correct output type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbauman Agree, we should use char(0)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbauman About unit tests, should we add tests for scan
, scan_by_key
by some simple example as in NVIDIA/thrust#1379, or unit tests affected hypre routines somehow? Is it enough to just run hypre's whole regression suite after module load cuda/11.2
? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it were me, I would drop in a unit test (and a unit testing framework if you don't have one). The ij
test failed for me, but very differently for CUDA and HIP. But, the downside, is then you have to go fishing for where the breakage actually occurs, which can take many hours. I suggested a unit test since we know a behavior change is coming. Just my two cents. But if there's not time/resources, the ij
should show the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit was the correct set of changes from my PR. Thanks!
This PR (by @pbauman) addresses Thrust 1.10 breaking changes. Thrust 1.10 landed in CUDA 11.2 and will land in ROCm 4.2.
As discussed in NVIDIA/thrust#1379 (that internally references NVIDIA/thrust#1176), the behavior of exclusive_scan and inclusive_scan changed in the case where the input types and output types were not the same. There is no deprecation warning or error thrown by the compiler. Indeed, with Thrust 1.10, before this PR, the exclusive_scan calls that had make_transform_iterator used in the input types (silently!) generated incorrect results. That means HYPRE is broken on GPUs today with CUDA 11.2 without this PR. There are a couple of way to fix, but what I did for the fix for exclusive_scan was to just use the API where one specifies the initial value and that was enough to fix the issue. It did not appear to me that there were any inclusive_scan calls affected in HYPRE. And my tests with this PR with a ROCm 4.2 release candidate pass.
In addition, Thrust 1.10 deprecated the use of C++ before C++14 so I've added -std=c++14 to the HIPCXXFLAGS argument in the configure.in (and bootstrapped).
Thrust 1.12 introduces similar breakages for the scan_by_key cousins, see NVIDIA/thrust#1376. The fixes are similar and I dropped in explicit casts to HYPRE_BigInt in the (already existing) initial value for exclusive_scan_by_key (commit 9a3bb66). I've not tried to address inclusive_scan_by_key cases, but I do believe they will be broken. I strongly recommend adding unit tests for those calls and adding them to the test suite. Thrust 1.12 is supposedly going to land in CUDA 11.4 (at least according to that thrust release page). I do not know when it will land in a ROCm release.
Thank you.