Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn #1326

viirya · 2022-02-17T06:47:41Z

Which issue does this PR close?

Closes #1201.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

arrow/src/compute/kernels/comparison.rs
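
For readers unfamiliar with dictionary-encoded comparisons, the idea can be sketched without the arrow crate. Everything below is hypothetical and simplified: `Dict`, `get`, and `cmp_dict` are illustrative names, not arrow APIs, and the null handling (a null on either side yields a null result) is an assumption of the sketch.

```rust
/// Hypothetical, simplified dictionary-encoded column: `keys[i]` indexes into
/// `values`, and `None` marks a null slot. This is not arrow's `DictionaryArray`.
struct Dict<'a> {
    keys: Vec<Option<usize>>,
    values: Vec<&'a str>,
}

impl<'a> Dict<'a> {
    /// Resolves slot `i` to its value, or `None` if the slot is null.
    fn get(&self, i: usize) -> Option<&'a str> {
        self.keys[i].map(|k| self.values[k])
    }
}

/// Element-wise comparison of two dictionary columns with a caller-supplied
/// operator. Assumption of this sketch: a null on either side yields `None`.
fn cmp_dict(
    left: &Dict,
    right: &Dict,
    op: impl Fn(&str, &str) -> bool,
) -> Result<Vec<Option<bool>>, String> {
    if left.keys.len() != right.keys.len() {
        return Err("cannot compare columns of different lengths".to_string());
    }
    Ok((0..left.keys.len())
        .map(|i| match (left.get(i), right.get(i)) {
            (Some(a), Some(b)) => Some(op(a, b)),
            _ => None,
        })
        .collect())
}

fn main() {
    let left = Dict { keys: vec![Some(0), Some(1), None], values: vec!["a", "b"] };
    let right = Dict { keys: vec![Some(1), Some(1), Some(0)], values: vec!["a", "b"] };
    // Logical columns: left = ["a", "b", null], right = ["b", "b", "a"].
    let neq = cmp_dict(&left, &right, |a, b| a != b).unwrap();
    assert_eq!(neq, vec![Some(true), Some(false), None]);
}
```

The same helper covers the other operators by swapping the closure, which is roughly the role the operator argument plays in the macros discussed in the review.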

codecov-commenter · 2022-02-17T07:04:15Z

Codecov Report

Merging #1326 (635e90d) into master (827cc3e) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1326      +/-   ##
==========================================
+ Coverage   83.00%   83.04%   +0.04%     
==========================================
  Files         180      180              
  Lines       52919    52980      +61     
==========================================
+ Hits        43924    43998      +74     
+ Misses       8995     8982      -13

Impacted Files	Coverage Δ
arrow/src/compute/kernels/comparison.rs	`92.47% <100.00%> (+0.29%)`	⬆️
parquet/src/encodings/encoding.rs	`93.52% <0.00%> (-0.20%)`	⬇️
arrow/src/ipc/writer.rs	`83.45% <0.00%> (-0.04%)`	⬇️
arrow/src/ffi.rs	`84.53% <0.00%> (ø)`
arrow/src/array/data.rs	`83.30% <0.00%> (ø)`
arrow/src/csv/reader.rs	`88.12% <0.00%> (ø)`
arrow/src/csv/writer.rs	`72.13% <0.00%> (ø)`
arrow/src/compute/util.rs	`98.90% <0.00%> (ø)`
arrow/src/array/builder.rs	`86.73% <0.00%> (ø)`
arrow/src/array/array_primitive.rs	`94.69% <0.00%> (ø)`
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 827cc3e...635e90d.

viirya · 2022-02-17T09:43:52Z

cc @alamb

alamb

Thank you @viirya -- this is looking great 😍

My only real concern is about using !(a^b) rather than a == b but I may be missing something

I went through the tests carefully and they look good to me. epic work

alamb · 2022-02-17T13:43:26Z

arrow/src/compute/kernels/comparison.rs

@@ -2032,10 +2032,10 @@ macro_rules! typed_compares {

 /// Applies $OP to $LEFT and $RIGHT which are two dictionaries which have (the same) key type $KT
 macro_rules! typed_dict_cmp {
-    ($LEFT: expr, $RIGHT: expr, $OP: expr, $KT: tt) => {{
+    ($LEFT: expr, $RIGHT: expr, $OP: expr, $OP_BOOL: expr, $KT: tt) => {{


👍 nice readability improvement
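
To make the shape of the extra parameter concrete, here is a toy sketch, not the arrow macro itself. The `Values` enum, the `zip_cmp` helper, and the `cmp_values!` macro are all hypothetical names; the only thing taken from the diff is the idea of passing one operator for non-boolean values and a second one for booleans.

```rust
/// Hypothetical stand-in for the value arrays behind a dictionary; not an arrow type.
enum Values {
    Int(Vec<i64>),
    Bool(Vec<bool>),
}

/// Applies `op` pairwise to two equally typed slices.
fn zip_cmp<T: Copy>(left: &[T], right: &[T], op: impl Fn(T, T) -> bool) -> Vec<bool> {
    left.iter().zip(right).map(|(a, b)| op(*a, *b)).collect()
}

/// Toy macro: uses $OP for integer values and $OP_BOOL for boolean values.
/// Returns `None` when the two sides hold different value types.
macro_rules! cmp_values {
    ($LEFT:expr, $RIGHT:expr, $OP:expr, $OP_BOOL:expr) => {{
        match ($LEFT, $RIGHT) {
            (Values::Int(l), Values::Int(r)) => Some(zip_cmp(l, r, $OP)),
            (Values::Bool(l), Values::Bool(r)) => Some(zip_cmp(l, r, $OP_BOOL)),
            _ => None,
        }
    }};
}

fn main() {
    let left = Values::Int(vec![1, 2, 3]);
    let right = Values::Int(vec![1, 5, 0]);
    // The integer operator is used here; the boolean one is ignored.
    let result = cmp_values!(&left, &right, |a, b| a < b, |a, b| !a & b);
    assert_eq!(result, Some(vec![false, true, false]));
}
```

Because each macro argument is pasted into its own branch, the two closures are type-checked independently, so they are free to use different expressions for the same logical comparison.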

alamb · 2022-02-17T13:46:09Z

arrow/src/compute/kernels/comparison.rs

@@ -2318,7 +2318,7 @@ where
 pub fn eq_dyn(left: &dyn Array, right: &dyn Array) -> Result<BooleanArray> {
    match left.data_type() {
        DataType::Dictionary(_, _) => {
-            typed_dict_compares!(left, right, |a, b| a == b)
+            typed_dict_compares!(left, right, |a, b| a == b, |a, b| !(a ^ b))


I don't understand this change -- I think the a == b is easier to understand and I would expect that llvm would create optimized code for whatever was being compared.

If this is clippy being silly about comparing booleans perhaps we can just ignore the lint

Suggested change

-            typed_dict_compares!(left, right, |a, b| a == b, |a, b| !(a ^ b))
+            typed_dict_compares!(left, right, |a, b| a == b, |a, b| a == b)

Oh, okay, I wrote it the way you suggest at first, but changed it basically to make clippy happy. 😄
If we can ignore that, then I can change it back.

I think we can ignore it. I think clippy probably gets confused when the parameters are boolean.
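
As a sanity check on the two spellings discussed in this thread, a small standalone sketch can confirm that they agree on every boolean input. The helper name `agree_on_all_bools` is made up for illustration.

```rust
/// Returns true if `f` and `g` produce the same result on all four
/// combinations of boolean inputs.
fn agree_on_all_bools(
    f: impl Fn(bool, bool) -> bool,
    g: impl Fn(bool, bool) -> bool,
) -> bool {
    for a in [false, true] {
        for b in [false, true] {
            if f(a, b) != g(a, b) {
                return false;
            }
        }
    }
    true
}

fn main() {
    // Equality on booleans is the negation of XOR.
    assert!(agree_on_all_bools(|a, b| a == b, |a, b| !(a ^ b)));
    // Inequality on booleans is XOR itself.
    assert!(agree_on_all_bools(|a, b| a != b, |a, b| a ^ b));
}
```

Since the forms are equivalent, choosing between them is purely a readability question, which is the point being made above.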

alamb · 2022-02-17T13:48:16Z

arrow/src/compute/kernels/comparison.rs

-    typed_compares!(left, right, neq_bool, neq, neq_utf8, neq_binary)
+    match left.data_type() {
+        DataType::Dictionary(_, _) => {
+            typed_dict_compares!(left, right, |a, b| a != b, |a, b| (a ^ b))


Suggested change

-            typed_dict_compares!(left, right, |a, b| a != b, |a, b| (a ^ b))
+            typed_dict_compares!(left, right, |a, b| a != b, |a, b| a != b)

alamb · 2022-02-17T13:48:56Z

arrow/src/compute/kernels/comparison.rs

-    typed_compares!(left, right, lt_bool, lt, lt_utf8, lt_binary)
+    match left.data_type() {
+        DataType::Dictionary(_, _) => {
+            typed_dict_compares!(left, right, |a, b| a < b, |a, b| (!a) & b)


Suggested change

-            typed_dict_compares!(left, right, |a, b| a < b, |a, b| (!a) & b)
+            typed_dict_compares!(left, right, |a, b| a < b, |a, b| a < b)

alamb · 2022-02-17T13:49:11Z

arrow/src/compute/kernels/comparison.rs

-    typed_compares!(left, right, lt_eq_bool, lt_eq, lt_eq_utf8, lt_eq_binary)
+    match left.data_type() {
+        DataType::Dictionary(_, _) => {
+            typed_dict_compares!(left, right, |a, b| a <= b, |a, b| !(a & (!b)))


Suggested change

-            typed_dict_compares!(left, right, |a, b| a <= b, |a, b| !(a & (!b)))
+            typed_dict_compares!(left, right, |a, b| a <= b, |a, b| a <= b)

alamb · 2022-02-17T13:49:26Z

arrow/src/compute/kernels/comparison.rs

-    typed_compares!(left, right, gt_bool, gt, gt_utf8, gt_binary)
+    match left.data_type() {
+        DataType::Dictionary(_, _) => {
+            typed_dict_compares!(left, right, |a, b| a > b, |a, b| a & (!b))


Suggested change

-            typed_dict_compares!(left, right, |a, b| a > b, |a, b| a & (!b))
+            typed_dict_compares!(left, right, |a, b| a > b, |a, b| a > b)

alamb · 2022-02-17T13:49:41Z

arrow/src/compute/kernels/comparison.rs

-    typed_compares!(left, right, gt_eq_bool, gt_eq, gt_eq_utf8, gt_eq_binary)
+    match left.data_type() {
+        DataType::Dictionary(_, _) => {
+            typed_dict_compares!(left, right, |a, b| a >= b, |a, b| !((!a) & b))


Suggested change

-            typed_dict_compares!(left, right, |a, b| a >= b, |a, b| !((!a) & b))
+            typed_dict_compares!(left, right, |a, b| a >= b, |a, b| a >= b)
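
The four ordering suggestions rely on Rust ordering booleans as `false < true`. A minimal sketch that checks each bitwise form from the original diff against its comparison form over all inputs follows; the function name `ordering_forms_agree` is illustrative only.

```rust
/// All four (a, b) combinations of boolean inputs.
const INPUTS: [(bool, bool); 4] = [
    (false, false),
    (false, true),
    (true, false),
    (true, true),
];

/// Returns true if every bitwise form matches the corresponding comparison
/// on all boolean inputs (Rust orders booleans as `false < true`).
fn ordering_forms_agree() -> bool {
    INPUTS.iter().all(|&(a, b)| {
        (a < b) == (!a & b)
            && (a <= b) == !(a & !b)
            && (a > b) == (a & !b)
            && (a >= b) == !(!a & b)
    })
}

fn main() {
    assert!(ordering_forms_agree());
}
```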

alamb · 2022-02-17T13:55:13Z

arrow/src/compute/kernels/comparison.rs

@@ -4790,5 +4851,76 @@ mod tests {
            result.unwrap(),
            BooleanArray::from(vec![false, true, false])
        );
+
+        let result = neq_dyn(&dict_array1, &dict_array2);
+        assert!(result.is_ok());


As a style thing, I think it is ok to just .unwrap() the result -- if there is a problem it will panic one line later, but I think the source of the problem would still be quite clear

Suggested change

-        assert!(result.is_ok());
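
The style point can be illustrated with a small standalone sketch. `neq_slices` is a hypothetical stand-in for a fallible kernel, not an arrow function; the test body in `main` shows the pattern without the separate `is_ok` assertion.

```rust
/// Hypothetical stand-in for a fallible comparison kernel: element-wise `!=`
/// that returns an error when the inputs differ in length.
fn neq_slices(left: &[i32], right: &[i32]) -> Result<Vec<bool>, String> {
    if left.len() != right.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    Ok(left.iter().zip(right).map(|(a, b)| a != b).collect())
}

fn main() {
    // No separate `assert!(result.is_ok())`: `unwrap()` panics and prints the
    // error value if the call failed, so the source of a failure stays clear.
    let result = neq_slices(&[1, 2, 3], &[1, 5, 3]).unwrap();
    assert_eq!(result, vec![false, true, false]);
}
```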

viirya · 2022-02-17T17:10:17Z

Oh, I used a == b at first, but when I looked at eq_bool, it uses !(a ^ b), so I changed mine to match. I can change it back to a == b if that looks better.

viirya · 2022-02-17T17:41:22Z

Thanks @alamb ! Changed the bool ops back and removed is_ok check.

alamb

Looking good @viirya 👌

alamb · 2022-02-28T21:25:06Z

Hi @viirya -- I hope you don't mind but I merged master into this PR and added 219c131 to silence clippy -- it was claiming

error: order comparisons between booleans can be simplified
    --> arrow/src/compute/kernels/comparison.rs:2370:68
     |
2370 |             typed_dict_compares!(left, right, |a, b| a < b, |a, b| a < b)
     |                                                                    ^^^^^ help: try simplifying it as shown: `!a & b`
     |
     = note: `-D clippy::bool-comparison` implied by `-D warnings`
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison

Which is nonsense in my opinion: (!a & b) is much less readable than a < b, and I would expect the code generator to do that transformation anyway if it helps performance.
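
The contents of 219c131 are not shown in this conversation. One common way to keep the readable form while passing `-D warnings` is a local `allow` attribute for the lint named in the error; the sketch below assumes that approach, and `lt_bools` is a hypothetical function, not code from the arrow crate.

```rust
/// Element-wise `<` on booleans, where `false < true`.
// Assumption: silence the lint locally rather than rewriting to `!a & b`.
#[allow(clippy::bool_comparison)]
fn lt_bools(left: &[bool], right: &[bool]) -> Vec<bool> {
    left.iter().zip(right).map(|(a, b)| *a < *b).collect()
}

fn main() {
    let left = [false, false, true, true];
    let right = [false, true, false, true];
    // Only the (false, true) pair satisfies `a < b`.
    assert_eq!(lt_bools(&left, &right), vec![false, true, false, false]);
}
```

Scoping the attribute to a single function keeps the lint active elsewhere in the crate.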

viirya · 2022-03-01T01:34:38Z

Yea, no problem at all! Thanks @alamb !

Implement DictionaryArray support in neq_dyn, lt_dyn, lt_eq_dyn, gt_d…

0a33401

…yn, gt_eq_dyn

github-actions bot added the arrow label (Changes to the arrow crate) Feb 17, 2022

viirya added 3 commits February 17, 2022 00:42

Fix clippy

a9e8b67

Fix format

066f2b1

Add test

1bf0384

alamb mentioned this pull request Feb 17, 2022

Use eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn kernels from arrow apache/datafusion#1475

Merged

alamb approved these changes Feb 17, 2022

For review comment and suggestion

635e90d

alamb reviewed Feb 17, 2022

alamb added 2 commits February 28, 2022 16:07

Merge remote-tracking branch 'apache/master' into issue_1201

f9ab93f

Allow reasonable boolean comparisons

219c131

alamb merged commit 483a502 into apache:master Mar 1, 2022
