-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add overflow-checking variants of arithmetic dyn kernels #2740
Add overflow-checking variants of arithmetic dyn kernels #2740
Conversation
cc @sunchao |
The CI failure looks unrelated. |
math_checked_op_dict | ||
) | ||
} | ||
DataType::Date32 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wonder if the handling of Date32
and Date64
can be extracted as a separate function so it can be shared with add_dyn
, but feel free to ignore if this is not easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, let me leave it as it is now. I'm wonder if we should do checking/non-checking behavior on Data32/Date64.
@@ -522,67 +548,86 @@ macro_rules! typed_dict_math_op { | |||
}}; | |||
} | |||
|
|||
/// Helper function to perform math lambda function on values from two dictionary arrays, this | |||
/// version does not attempt to use SIMD explicitly (though the compiler may auto vectorize) | |||
macro_rules! math_dict_op { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This macro is unnecessary and can be inlined. Removed it.
arrow/src/compute/kernels/arity.rs
Outdated
if a.null_count() == 0 && b.null_count() == 0 { | ||
let values = a.values().iter().zip(b.values()).map(|(l, r)| op(*l, *r)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This latest optimization cause a bit trouble on the change from PrimitiveArray
to ArrayAccessor
as function parameter.
I can only add one new trait ArrayValuesAccessor
to coordinate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rewrite this as external iteration, i.e. for i in 0..array.len()
this is easier for the compiler, tends to optimise better and is simpler to understand. MutableBuffer::push_unchecked will behave equivalently
The trusted len iterators are not necessary imo, and are probably best avoided in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Sounds good to do. So I can get rid of ArrayValuesAccessor
which is just needed to get values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move away from using trusted len iter, it's basically a hack and internal iteration will optimise better or the same
FYI @Dandandan
arrow/src/array/array.rs
Outdated
|
||
/// Returns a values accessor [`ArrayValuesAccessor`] for this [`ArrayAccessor`] if | ||
/// it supports. Returns [`None`] if it doesn't support accessing values directly. | ||
fn get_values_accessor(&self) -> Option<&dyn ArrayValuesAccessor<Item = Self::Item>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of the dyn indirection here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to return a Option<&[Self::Item]>
directly as the semantics look weird (i.e. some ArrayAccessor
doesn't provide values
). This is to return self as ArrayValuesAccessor
so the caller can call values
.
But I think I will remove it based on #2740 (comment).
arrow/src/compute/kernels/arity.rs
Outdated
if a.null_count() == 0 && b.null_count() == 0 { | ||
let values = a.values().iter().zip(b.values()).map(|(l, r)| op(*l, *r)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rewrite this as external iteration, i.e. for i in 0..array.len()
this is easier for the compiler, tends to optimise better and is simpler to understand. MutableBuffer::push_unchecked will behave equivalently
The trusted len iterators are not necessary imo, and are probably best avoided in general.
/// This is similar to `math_op` as it performs given operation between two input primitive arrays. | ||
/// But the given operation can return `Err` if overflow is detected. For the case, this function | ||
/// returns an `Err`. | ||
fn math_checked_op<LT, RT, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seem to be necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for type bound only.
Changed the math_checked_op
to try_binary
, got:
error[E0284]: type annotations needed: cannot satisfy `<_ as native::ArrowPrimitiveType>::Native == i8`
--> arrow/src/compute/kernels/arithmetic.rs:848:21
|
848 | try_binary(left, right, |a, b| a.add_checked(b)).map(|a| Arc::new(a) as ArrayRef)
| ^^^^^^^^^^ cannot satisfy `<_ as native::ArrowPrimitiveType>::Native == i8`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not just a matter of providing the necessary type hint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is wrapped in downcast_primitive_array
macro call. It seems no way to add type hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, it is because the return type needs to be type hinted. Darn...
))); | ||
} | ||
/// Perform given operation on two `DictionaryArray`s. | ||
/// Returns an error if the two arrays have different value type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance of this will be pretty terrible, and results in a huge amount of codegen. I'm not entirely sure of the use-case tbh... Perhaps it is worth exploring putting this behind a feature flag
Edit: I wouldn't be surprised if hydrating the dictionary to its values, performing the operation and casting back was faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, math_op_dict
? The change just inlines macro math_dict_op
into the function math_op_dict
. So I suppose you are not meaning the change but math_op_dict
itself?
This can be done in follow-up to optimize it. The current change diff is quite big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, the compile times are currently despair inducing and I have a chip on my shoulder about them 😅
arrow/src/compute/kernels/arity.rs
Outdated
buffer.append_n_zeroed(len); | ||
let slice = buffer.as_slice_mut(); | ||
|
||
for idx in 0..len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrite it with a for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we run the arithmetic benchmarks possibly?
arrow/src/compute/kernels/arity.rs
Outdated
// `values` is an iterator with a known size from a PrimitiveArray | ||
return Ok(unsafe { build_primitive_array(len, buffer, 0, None) }); | ||
let mut buffer = BufferBuilder::<O::Native>::new(len); | ||
buffer.append_n_zeroed(len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be faster to reserve the correct capacity and use push_unchecked. Avoids zero allocating.
As written I suspect this is a performance regression as this if block is effectively identical to the one below now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if I get it correctly, I did something like:
for idx in 0..len {
unsafe {
buffer.push_unchecked(op(a.value_unchecked(idx), b.value_unchecked(idx))?);
};
}
But still see regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I roll back to the ArrayValuesAccessor
way? I benchmarked it and the performance keeps the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have a brief play tomorrow, I would really like to avoid adding further APIs, especially ones that aren't a generalisable abstraction, if we can possibly avoid it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried another way to iterate the values from both side. But still the regression.
I think when we call value_unchecked
to get values iteratively, it is an indirect access pattern compared with values
where we simply process on a slice. It makes LLVM hard to optimize it as vectorized access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice iterators boil down to the same thing, in fact the external iteration is harder for llvm. Something else is going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this appears to boil down to the whims of LLVMs inlining heuristics, try_from_trusted_len_iter doesn't appear to get fully inlined and therefore gets optimised properly (although I have a really hard time understanding the generated code). Potentially rustc is doing something to help LLVM here.
However, when the loop is defined in the function body of try_binary LLVM doesn't optimize it properly. If you split it out into a free function with inline(never), it optimises correctly and is actually marginally faster than the iterator.
Annoyingly using try_from_trusted_len_iter but doing something like (0..a.len()).map(|idx| ...)
to construct the iterator also doesn't work, unless split into a free function.
I would therefore suggest splitting the no null variant into a free function marked inline never, as this appears to work...
FYI @Dandandan as I wonder if we're running into this elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting...let me split it out to a free function and benchmark again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, the suggested never inline function works.
/// This is similar to `math_op` as it performs given operation between two input primitive arrays. | ||
/// But the given operation can return `Err` if overflow is detected. For the case, this function | ||
/// returns an `Err`. | ||
fn math_checked_op<LT, RT, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not just a matter of providing the necessary type hint?
))); | ||
} | ||
/// Perform given operation on two `DictionaryArray`s. | ||
/// Returns an error if the two arrays have different value type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, the compile times are currently despair inducing and I have a chip on my shoulder about them 😅
{ | ||
math_dict_op!(left, right, op, PrimitiveArray<T>) | ||
let left = left.downcast_dict::<PrimitiveArray<T>>().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This possibly needs to check the actual type, currently it will panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called by a macro which matches the value types of both sides, so it should be matched. But let me add one check.
Sure - if we get a similar performance or better I've no problems with that. |
Agreed, iterators can be made to perform the same, but it requires a significant amount of trickery and is subject to the whims of LLVM. For the purposes of arrow kernels, its just unnecessary imo. https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb may also be of interest, though sadly it isnt't possible to implement try_for_each currently. |
Yea, checked kernels with no null values show obvious regression. |
arrow/src/compute/kernels/arity.rs
Outdated
O: ArrowPrimitiveType, | ||
F: Fn(A::Item, B::Item) -> Result<O::Native>, | ||
{ | ||
let mut buffer = MutableBuffer::new(len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The length parameter for MutableBuffer::new
is in bytes, so I think this needs to be multiplied by O::get_byte_width()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed it. Thanks for catching it.
/// This is similar to `math_op` as it performs given operation between two input primitive arrays. | ||
/// But the given operation can return `Err` if overflow is detected. For the case, this function | ||
/// returns an `Err`. | ||
fn math_checked_op<LT, RT, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, it is because the return type needs to be type hinted. Darn...
Benchmark runs are scheduled for baseline = 3bf6eb9 and contender = 9599178. 9599178 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Thanks for review! |
Which issue does this PR close?
Closes #2739.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?