-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nested struct binop comparison #9452
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #9452 +/- ##
================================================
- Coverage 86.40% 86.31% -0.10%
================================================
Files 143 144 +1
Lines 22448 22648 +200
================================================
+ Hits 19396 19548 +152
- Misses 3052 3100 +48
Continue to review full report at Codecov.
|
This PR adds some struct utility functions. This change is needed for the eventual support of structs in binary operations. See also: PR #9452. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Jake Hemstad (https://github.com/jrhemstad) URL: #10776
…#10780) This PR changes the internal APIs used for binary operations to use `column_view` objects instead of `column_device_view` objects. This change is needed for the eventual support of structs in binary operations. See also: PR #9452. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ryan Lee (https://github.com/rwlee) - Nghia Truong (https://github.com/ttnghia) - Jake Hemstad (https://github.com/jrhemstad) URL: #10780
Co-authored-by: Bradley Dice <[email protected]>
template <typename Comparator, weak_ordering... values> | ||
struct weak_ordering_comparator_impl { | ||
__device__ bool operator()(size_type const& lhs, size_type const& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can actually static_assert
this requirement to be sure it doesn't happen.
Double check that I'm using the right names here.
template <typename Comparator, weak_ordering... values> | |
struct weak_ordering_comparator_impl { | |
__device__ bool operator()(size_type const& lhs, size_type const& rhs) | |
template <typename Comparator, weak_ordering... values> | |
struct weak_ordering_comparator_impl { | |
static_assert( not (sizeof...(values) == 1 and values == weak_ordering::equivalent), "weak_ordering_comparator should not be used for pure equality comparisons. The `row_equality_comparator` should be used instead"); | |
__device__ bool operator()(size_type const& lhs, size_type const& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, the compiler complains about values == weak_ordering::equivalent
in the context of the static assert. This forces an expansion --> ((weak_ordering::EQUIVALENT == values) && ...)
, which subsequently makes the sizeof...(values) == 1
check irrelevant.
binary_op_device_dispatcher<ops::NotEqual>{ | ||
*common_dtype, *outd, *lhsd, *rhsd, is_lhs_scalar, is_rhs_scalar}); | ||
} | ||
if (is_struct(lhs.type()) && is_struct(rhs.type())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is about 65 lines long. Could we separate out the struct logic into another function and call it here / return early, to reduce the long if/else branches in this function body?
void dispatch_equality_op(...) {
CUDF_EXPECTS(...);
if (is_struct(lhs.type()) && is_struct(rhs.type())) {
// Handle struct data
dispatch_equality_op_structs(...);
return;
}
// existing function body.
}
@@ -190,6 +192,26 @@ std::optional<data_type> get_common_type(data_type out, data_type lhs, data_type | |||
|
|||
bool is_supported_operation(data_type out, data_type lhs, data_type rhs, binary_operator op) | |||
{ | |||
return double_type_dispatcher(lhs, rhs, is_supported_operation_functor{}, out, op); | |||
return double_type_dispatcher(lhs, rhs, is_supported_operation_functor{}, out, op) || | |||
(is_struct(lhs) && is_struct(rhs) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic should be baked into the is_supported_operation_functor
rather than treated as a special case here.
bool is_supported_operation(data_type out, | ||
column_view const& lhs, | ||
column_view const& rhs, | ||
binary_operator op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep only one API for is_supported_operation
, rather than one for data types and one for column views. If the above function can be refactored to remove the special case for struct types, then we can replace the call to is_supported_operation(data_type out, data_type lhs, data_type rhs, binary_operator op)
with that double type dispatcher call to the impl
.
@@ -284,27 +290,71 @@ void apply_binary_op(mutable_column_view& out, | |||
bool is_rhs_scalar, | |||
rmm::cuda_stream_view stream) | |||
{ | |||
auto common_dtype = get_common_type(out.type(), lhs.type(), rhs.type()); | |||
if (is_struct(lhs.type()) && is_struct(rhs.type())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as below: This function is about 65 lines long. Could we separate out the struct logic into another function and call it here / return early, to reduce the long if/else branches in this function body?
*/ | ||
template <typename Nullate> | ||
template <typename Nullate, bool NanConfig = false> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to pull out the NanConfig
and related changes into a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR changes the experimental `device_row_comparator` to return `weak_ordering` instead of `bool`. Originally part of PR #9452. Aids PR #10730, which builds strongly-typed two table comparators and should return a `weak_ordering`. Authors: - Ryan Lee (https://github.com/rwlee) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #10793
Further splitting up #9452 -- split off at the suggestion of @bdice Related to #10781 and #4760 -- issues and discussions related to NaN comparison behavior. Authors: - Ryan Lee (https://github.com/rwlee) - Bradley Dice (https://github.com/bdice) Approvers: - Jake Hemstad (https://github.com/jrhemstad) - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #10870
Closing in favor of #11153 |
Replaces #9452 Takes the core cudf work from 9452 and integrates it into the JNI while removing struct support integration into cudf's binary operator API. Includes scalar comparison support in addition to vector to vector comparison support. Follow on needed Add null_equal operator support Authors: - Ryan Lee (https://github.com/rwlee) - Bradley Dice (https://github.com/bdice) Approvers: - Raza Jafri (https://github.com/razajafri) - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #11153
Partial solution to #8964
Adding column to column nested struct comparison for binary comparison operations. The existing binops cannot currently handle structs or nested types -- this PR does not attempt to add support for list types. It also does not include scalar to column and column to scalar comparison support.
Assumptions
Code currently assumes that the nested structs have the same type structure, if this is a bad assumption they can be added with a
CUDF_FAIL
result.Alternatives considered
Using existing binops for the nested child columns and then merging the results was considered and abandoned because of its performance and its handling of nulls within the child column comparisons. The child column comparisons would evaluate a comparison involving null on either side as null, causing invalid results for the struct
struct(2, null) == struct(2, null)
should betrue
andstruct(2, null) == struct(2, 1)
should befalse
. However the child comparisons for both would result in[true, null]
which could not be combined unambiguously.Design
Leverages the
row_equality_comparator
androw_lexicographic_comparator
to compare flattened tables of the struct column. To support the output type and different operator arguments, a type dispatch and a case switch are used. I'm hopeful that there is a better way to dispatch on the operator type than the case switch that's currently used. Right now each operator is a unique function, though with some cleanup I think they can be merged.Open questions that need to be answered
Is there a better way to dispatch on the operator type than the case switch currently being used?
Currently this sits separate from the binop operators, but can be integrated into
binary_ops.cu
. How should this be integrated and should the code be part of struct utilities or the existing binops code?Should nested null equality be configurable?
Follow ons
Add column to scalar and scalar to columnar support
Add
null_equal
operator supportJNI support for the new functionality