-
Notifications
You must be signed in to change notification settings - Fork 447
Enable more warning flags. #249
Enable more warning flags. #249
Conversation
c065b3a
to
94d78b3
Compare
06366b2
to
8983abe
Compare
8983abe
to
d588658
Compare
append_option_if_available("-Wgnu" cxx_compile_options) | ||
# Calling a variadic macro with zero args is a GNU extension until C++20, | ||
# but the THRUST_PP_ARITY macro is used with zero args. Need to see if this | ||
# is a real problem worth fixing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be a problem as every compiler under the sun supports the extension, including NVRTC I believe.
You should re-run this with CUB in |
The whole |
cub/grid/grid_even_share.cuh
Outdated
this->block_end = num_items_; // Initialize past-the-end | ||
this->num_items = num_items_; | ||
this->total_tiles = (num_items_ + tile_items - 1) / tile_items; | ||
this->grid_size = CUB_MIN(static_cast<int>(total_tiles), max_grid_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could overflow, because total_tiles
is OffsetT
which could be 64bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be related to #221.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer us to either make sure all the static casts don't overflow, or move them to a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this and similar issues in the thrust/cub kernel dispatch code with a new cub::DivideAndRoundUp
method that avoids these (n + d - 1) / d
overflows.
cub/grid/grid_even_share.cuh
Outdated
OffsetT avg_tiles_per_block = total_tiles / grid_size; | ||
this->big_shares = total_tiles - (avg_tiles_per_block * grid_size); // leftover grains go to big blocks | ||
// leftover grains go to big blocks | ||
this->big_shares = static_cast<int>(total_tiles - (avg_tiles_per_block * grid_size)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason big_share
can't be an OffsetT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a closer look and if anything, total_tiles
should be changed to a 32-bit int since it's used as a block dimension and those can't exceed INT32_MAX. The only usages of the variable assume it is a 32-bit int.
I'll change this so that total_tiles
is appropriately sized, which will eliminate the need for this (and a few other) static_cast
s.
#pragma GCC diagnostic push | ||
#pragma GCC diagnostic ignored "-Wdeprecated-declarations" | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would deprecate or remove this entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -114,7 +114,7 @@ struct half_t | |||
{ /* normal */ | |||
ir |= ia >> (24 - 11); | |||
ia = ia << (32 - (24 - 11)); | |||
ir = ir + ((14 + shift) << 10); | |||
ir = static_cast<uint16_t>(ir + ((14 + shift) << 10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably okay because it's just in the test code, but I'm pretty sure this is actually a bug due to potential overflow.
TestDispatch<T, OffsetT, LengthT>(num_items); | ||
unsigned int num; | ||
RandomBits(num); | ||
num = (unsigned int) ((double(num) * double(10000000)) / double(max_int)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're changing this already, maybe static_cast
here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/test_device_scan.cu
Outdated
TestOp<InputT>(num_items, identity, initial_value); | ||
unsigned int num; | ||
RandomBits(num); | ||
num = (unsigned int) ((double(num) * double(10000000)) / double(max_int)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe static_cast
here as a drive-by fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine, I'd prefer to split the static_cast
into a separate PR, or make sure they don't overflow, maybe add a comment to each explaining why they're fine.
Updated #1175 with this recommendation. |
After fixing #221, I reviewed the remaining usages and all are safe. Added comments to any non-obviously-safe ones. |
Anonymous structs are C features. In C++, they're non-portable compiler extensions. These only seemed to pop up in CUB-style `TempStorage` objects, I just picked some reasonable sounding names for them.
We need to deprecate this class since the underlying CUDA APIs are deprecated. This suppression is a temporary workaround. Tracked by NVIDIA#191.
Changing to a `static_cast` fixes this warning.
The `cuda_std_17` compile feature is broken for MSVC when CMake < 3.18.3.
d588658
to
91e9ac7
Compare
91e9ac7
to
0fb8228
Compare
Users have been reporting that device algorithms return invalid `temp_storage_bytes` values when `num_items` is close to -- but not over -- INT32_MAX. This is caused by an overflow in the numerator of the pattern `num_tiles = (num_items + items_per_tile - 1) / items_per_tile`. The new function implements the same calculation but protects against overflow. Fixes NVIDIA#221. Bug 3075796
This value will always be representable with an int, and all usages of it treat it as a 32-bit int. Changing the type avoids some casts at usage sites.
0fb8228
to
e0a6736
Compare
Fixes #228.