-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DictionaryArray in unary kernel #1990
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1990 +/- ##
========================================
Coverage 83.58% 83.58%
========================================
Files 222 222
Lines 57529 57639 +110
========================================
+ Hits 48088 48180 +92
- Misses 9441 9459 +18
Continue to review full report at Codecov.
|
cc @sunchao I forgot this is not merged yet, I need to use this when implementing dictionary and scalar arithmetic kernels. |
Unrelated error:
|
arrow/src/compute/kernels/arity.rs
Outdated
} | ||
|
||
/// A helper function that applies an unary function to a dictionary array with primitive value type. | ||
fn unary_dict<K, F, T>(array: &DictionaryArray<K>, op: F) -> Result<PrimitiveArray<T>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be public so it can be used by other mods like arithmetic.rs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They will use unary_dyn
directly.
arrow/src/compute/kernels/arity.rs
Outdated
.as_any() | ||
.downcast_ref::<$value_ty>() | ||
.unwrap() | ||
.take_iter_unchecked($array.keys_iter()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is it possible to directly apply the op
on dictionary values? if values are large strings, the current approach will need to first decode the dictionary and convert it to a "plain" array, and then apply the op
to each value in there, which is expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it too, but didn't try. Let me try if it is feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I remember it. It is feasible, but it has one con. Because unary_dict
and unary_dyn
must return ArrayRef
, the compiler cannot infer T
so the caller must specify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change in latest commit. You can see if this is okay.
|
||
let values = dict_values | ||
.iter() | ||
.map(|v| v.map(|value| op(value))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the clippy complains about this
error: redundant closure
--> arrow/src/compute/kernels/arity.rs:101:24
|
101 | .map(|v| v.map(|value| op(value)))
| ^^^^^^^^^^^^^^^^^ help: replace the closure with the function itself: `op`
|
But I cannot follow the suggestion. The compiler fails to compile because op
cannot moved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I tried the same too - this should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good!
T: ArrowPrimitiveType, | ||
F: Fn(T::Native) -> T::Native, | ||
{ | ||
let dict_values = array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
let values = dict_values | ||
.iter() | ||
.map(|v| v.map(|value| op(value))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I tried the same too - this should be fine.
|
||
let mut data = ArrayData::builder(array.data_type().clone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another question: similar to unary
, do we plan to support different output value type than T
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it. I think for mostly usage in query execution, input/output types are the same for unary
. A difficulty to specify output type for unary_dict
is, unlike unary
where we can bind output type on the produced PrimitiveArray<O>
and F: Fn(T::Native) -> T::Native
. In unary_dict
, the output type O
cannot be bound on the produce dictionary array. So the compiler cannot infer it. Then we need to specify O
when invoking unary_dict
/unary_dyn
. I feel it is harder to use. So if we only use it for same input/output types, I just leave it without a separate output type parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @sunchao |
Which issue does this PR close?
Closes #1989.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?