Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #1326

Merged
merged 7 commits into from
Mar 1, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented Feb 17, 2022

Which issue does this PR close?

Closes #1201.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 17, 2022
@codecov-commenter
Copy link

codecov-commenter commented Feb 17, 2022

Codecov Report

Merging #1326 (635e90d) into master (827cc3e) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1326      +/-   ##
==========================================
+ Coverage   83.00%   83.04%   +0.04%     
==========================================
  Files         180      180              
  Lines       52919    52980      +61     
==========================================
+ Hits        43924    43998      +74     
+ Misses       8995     8982      -13     
Impacted Files Coverage Δ
arrow/src/compute/kernels/comparison.rs 92.47% <100.00%> (+0.29%) ⬆️
parquet/src/encodings/encoding.rs 93.52% <0.00%> (-0.20%) ⬇️
arrow/src/ipc/writer.rs 83.45% <0.00%> (-0.04%) ⬇️
arrow/src/ffi.rs 84.53% <0.00%> (ø)
arrow/src/array/data.rs 83.30% <0.00%> (ø)
arrow/src/csv/reader.rs 88.12% <0.00%> (ø)
arrow/src/csv/writer.rs 72.13% <0.00%> (ø)
arrow/src/compute/util.rs 98.90% <0.00%> (ø)
arrow/src/array/builder.rs 86.73% <0.00%> (ø)
arrow/src/array/array_primitive.rs 94.69% <0.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 827cc3e...635e90d. Read the comment docs.

@viirya
Copy link
Member Author

viirya commented Feb 17, 2022

cc @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @viirya -- this is looking great 😍

My only real concern is about using !(a^b) rather than a == b but I may be missing something

I went through the tests carefully and they look good to me. epic work

@@ -2032,10 +2032,10 @@ macro_rules! typed_compares {

/// Applies $OP to $LEFT and $RIGHT which are two dictionaries which have (the same) key type $KT
macro_rules! typed_dict_cmp {
($LEFT: expr, $RIGHT: expr, $OP: expr, $KT: tt) => {{
($LEFT: expr, $RIGHT: expr, $OP: expr, $OP_BOOL: expr, $KT: tt) => {{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 nice readability improvement

@@ -2318,7 +2318,7 @@ where
pub fn eq_dyn(left: &dyn Array, right: &dyn Array) -> Result<BooleanArray> {
match left.data_type() {
DataType::Dictionary(_, _) => {
typed_dict_compares!(left, right, |a, b| a == b)
typed_dict_compares!(left, right, |a, b| a == b, |a, b| !(a ^ b))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change -- I think the a == b is easier to understand and I would expect that llvm would create optimized code for whatever was being compared.

If this is clippy being silly about comparing booleans perhaps we can just ignore the lint

Suggested change
typed_dict_compares!(left, right, |a, b| a == b, |a, b| !(a ^ b))
typed_dict_compares!(left, right, |a, b| a == b, |a, b| a == b)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay, I wrote it like you suggest at first, but changed it basically to make clippy happy. 😄
If we can ignore that, then I can change back.

Copy link
Contributor

@alamb alamb Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can ignore it. I think clippy is somewhat confused probably when the parameters are boolean

typed_compares!(left, right, neq_bool, neq, neq_utf8, neq_binary)
match left.data_type() {
DataType::Dictionary(_, _) => {
typed_dict_compares!(left, right, |a, b| a != b, |a, b| (a ^ b))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typed_dict_compares!(left, right, |a, b| a != b, |a, b| (a ^ b))
typed_dict_compares!(left, right, |a, b| a != b, |a, b| a != b)

typed_compares!(left, right, lt_bool, lt, lt_utf8, lt_binary)
match left.data_type() {
DataType::Dictionary(_, _) => {
typed_dict_compares!(left, right, |a, b| a < b, |a, b| (!a) & b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typed_dict_compares!(left, right, |a, b| a < b, |a, b| (!a) & b)
typed_dict_compares!(left, right, |a, b| a < b, |a, b| a < b)

typed_compares!(left, right, lt_eq_bool, lt_eq, lt_eq_utf8, lt_eq_binary)
match left.data_type() {
DataType::Dictionary(_, _) => {
typed_dict_compares!(left, right, |a, b| a <= b, |a, b| !(a & (!b)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typed_dict_compares!(left, right, |a, b| a <= b, |a, b| !(a & (!b)))
typed_dict_compares!(left, right, |a, b| a <= b, |a, b| a <= b)

typed_compares!(left, right, gt_bool, gt, gt_utf8, gt_binary)
match left.data_type() {
DataType::Dictionary(_, _) => {
typed_dict_compares!(left, right, |a, b| a > b, |a, b| a & (!b))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typed_dict_compares!(left, right, |a, b| a > b, |a, b| a & (!b))
typed_dict_compares!(left, right, |a, b| a > b, |a, b| a > b)

typed_compares!(left, right, gt_eq_bool, gt_eq, gt_eq_utf8, gt_eq_binary)
match left.data_type() {
DataType::Dictionary(_, _) => {
typed_dict_compares!(left, right, |a, b| a >= b, |a, b| !((!a) & b))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typed_dict_compares!(left, right, |a, b| a >= b, |a, b| !((!a) & b))
typed_dict_compares!(left, right, |a, b| a >= b, |a, b| a >= b)

@@ -4790,5 +4851,76 @@ mod tests {
result.unwrap(),
BooleanArray::from(vec![false, true, false])
);

let result = neq_dyn(&dict_array1, &dict_array2);
assert!(result.is_ok());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a style thing, I think it is ok to just .unwrap() the result -- if there is a problem it will panic one line later, but I think the source of the problem would still be quite clear

Suggested change
assert!(result.is_ok());

@viirya
Copy link
Member Author

viirya commented Feb 17, 2022

Oh, I used a == b at first, but when I looked at eq_bool, it uses !(a ^ b), so I modified to follow it. I can change it back to a == b if it looks better.

@viirya
Copy link
Member Author

viirya commented Feb 17, 2022

Thanks @alamb ! Changed the bool ops back and removed is_ok check.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @viirya 👌

@alamb
Copy link
Contributor

alamb commented Feb 28, 2022

Hi @viirya -- I hope you don't mind but i merged this PR from master and added 219c131 to silence clippy -- it was claiming

error: order comparisons between booleans can be simplified
    --> arrow/src/compute/kernels/comparison.rs:2370:68
     |
2370 |             typed_dict_compares!(left, right, |a, b| a < b, |a, b| a < b)
     |                                                                    ^^^^^ help: try simplifying it as shown: `!a & b`
     |
     = note: `-D clippy::bool-comparison` implied by `-D warnings`
     = help: for further information visit [https://rust-lang.gi](https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison)

Which is nonsense in my opinion (!a & b) is much less readable than a < b and I would expect the code generator to do that transformation anyways if it helps performance.

@viirya
Copy link
Member Author

viirya commented Mar 1, 2022

Yea, no problem at all! Thanks @alamb !

@alamb alamb merged commit 483a502 into apache:master Mar 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement DictionaryArray support in eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn
3 participants