-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overflow-checking variant of arithmetic scalar kernels #2650
Conversation
arrow/src/compute/kernels/arity.rs
Outdated
// ~60% speedup | ||
// Soundness | ||
// `values` is an iterator with a known size because arrays are sized. | ||
let buffer = unsafe { Buffer::from_trusted_len_iter(values.into_iter()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try_from_trusted_len_iter would allow avoiding collecting into a Vec
/// | ||
/// This detects overflow and returns an `Err` for that. For an non-overflow-checking variant, | ||
/// use `divide_scalar` instead. | ||
pub fn divide_scalar_checked<T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the value of a separate checked scalar kernel for division, given it only elides a single check of the scalar divisor. I would be tempted to leave it out for now, at least until #2647 is resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've thought about it and wondered maybe #2647 can be resolved quickly. 😄
Let me remove division first.
arrow/src/compute/kernels/arity.rs
Outdated
@@ -83,6 +84,41 @@ where | |||
PrimitiveArray::<O>::from(data) | |||
} | |||
|
|||
/// A overflow-checking variant of `unary`. | |||
pub(crate) fn unary_checked<I, F, O>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be try_unary
and it's general method for primitive array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to rename it to try_unary
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is to align with the method unary
.
This is the reason I give the suggestion like https://github.com/apache/arrow-rs/pull/2650/files#r963221429
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean to "align with the method unary
". The F
of unary
doesn't return Option
or Result
.
Based on these comments, let me try to guess what you are suggesting, are you suggesting to change returning type of F
function parameter to Result
, move the overflow ArrowError::ComputeError
to the arithmetic scalar kernels, to make this unary_checked
not only for the these arithmetic kernels?
I'm okay for the change, but where do you think the unary_checked
will also be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confused comments for your PR.
In the pr #2661, the refactor also need a function which can convert the
primitivearray
to result< primitivearray>
I think the unary_checked
or try_unary
can be used like unary
where the result need to be Result
.
arrow/src/compute/kernels/arity.rs
Outdated
O: ArrowPrimitiveType, | ||
F: Fn(I::Native) -> Option<O::Native>, | ||
I::Native: ArrowNativeTypeOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O: ArrowPrimitiveType, | |
F: Fn(I::Native) -> Option<O::Native>, | |
I::Native: ArrowNativeTypeOp, | |
O: ArrowPrimitiveType, | |
F: Fn(I::Native) -> Result<O::Native>, | |
I::Native: ArrowNativeTypeOp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?? There is a reason that the return type of F
is Option
. It is not arbitrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of add_check
is Option
, you use the Option
as the result of F
.
But the in the below logic, you will check the Option.
If the option is None, will return a error.
arrow/src/compute/kernels/arity.rs
Outdated
let values = array.values().iter().map(|v| { | ||
let result = op(*v); | ||
if let Some(r) = result { | ||
Ok(r) | ||
} else { | ||
// Overflow | ||
Err(ArrowError::ComputeError(format!( | ||
"Overflow happened on: {:?}", | ||
*v | ||
))) | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let values = array.values().iter().map(|v| { | |
let result = op(*v); | |
if let Some(r) = result { | |
Ok(r) | |
} else { | |
// Overflow | |
Err(ArrowError::ComputeError(format!( | |
"Overflow happened on: {:?}", | |
*v | |
))) | |
} | |
}); | |
let values = array.values().iter().map(|v| { | |
op(*v) | |
}); |
{ | ||
Ok(unary(array, |value| value + scalar)) | ||
unary_checked(array, |value| value.add_checked(scalar)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after this https://github.com/apache/arrow-rs/pull/2650/files#r963221429
you can define characteristic error message for this closures
arrow/src/compute/kernels/arity.rs
Outdated
F: Fn(I::Native) -> Option<O::Native>, | ||
I::Native: ArrowNativeTypeOp, | ||
{ | ||
let values = array.values().iter().map(|v| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is correct, as it will evaluate for null slots which might have arbitrary values. I think you have to consult the null mask...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2666 contains a more "correct" version of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the into_primitive_array_data
will add the null mask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will add a null mask yes, but as the operation is fallible it can't be blindly called on slots without first checking those slots aren't null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default bytes or '0' will be inserted into the null slot.
The default value or '0' may cause fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will add a null mask yes, but as the operation is fallible it can't be blindly called on slots without first checking those slots aren't null
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The null slot may have any value, not just 0
. Consider the case where a null is added to a non-null value of 200
, the resulting null slot will now have a value 200 + whatever was in the null slot
. The only guarantee is it is uninitalized, the actual value is arbitrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good point. I was thought that into_primitive_array_data
adds the null buffer back to the result array like unary
does. However, I missed that point that value on null slots might cause wrongly failure on calling op
here.
I will rebase this once #2666 is merged. |
@tustvold Updated to use |
Thanks. |
Benchmark runs are scheduled for baseline = e646ae8 and contender = e1f8ed8. e1f8ed8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2651.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?