Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overflow-checking variant of arithmetic scalar kernels #2650

Merged
merged 5 commits into from
Sep 12, 2022

Conversation

viirya
Copy link
Member

@viirya viirya commented Sep 5, 2022

Which issue does this PR close?

Closes #2651.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 5, 2022
// ~60% speedup
// Soundness
// `values` is an iterator with a known size because arrays are sized.
let buffer = unsafe { Buffer::from_trusted_len_iter(values.into_iter()) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_from_trusted_len_iter would allow avoiding collecting into a Vec

///
/// This detects overflow and returns an `Err` for that. For an non-overflow-checking variant,
/// use `divide_scalar` instead.
pub fn divide_scalar_checked<T>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the value of a separate checked scalar kernel for division, given it only elides a single check of the scalar divisor. I would be tempted to leave it out for now, at least until #2647 is resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've thought about it and wondered maybe #2647 can be resolved quickly. 😄

Let me remove division first.

@@ -83,6 +84,41 @@ where
PrimitiveArray::<O>::from(data)
}

/// A overflow-checking variant of `unary`.
pub(crate) fn unary_checked<I, F, O>(
Copy link
Contributor

@liukun4515 liukun4515 Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be try_unary and it's general method for primitive array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to rename it to try_unary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is to align with the method unary.
This is the reason I give the suggestion like https://github.com/apache/arrow-rs/pull/2650/files#r963221429

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean to "align with the method unary". The F of unary doesn't return Option or Result.

Based on these comments, let me try to guess what you are suggesting, are you suggesting to change returning type of F function parameter to Result, move the overflow ArrowError::ComputeError to the arithmetic scalar kernels, to make this unary_checked not only for the these arithmetic kernels?

I'm okay for the change, but where do you think the unary_checked will also be used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confused comments for your PR.
In the pr #2661, the refactor also need a function which can convert the
primitivearray to result< primitivearray>

I think the unary_checked or try_unary can be used like unary where the result need to be Result.

Comment on lines 94 to 96
O: ArrowPrimitiveType,
F: Fn(I::Native) -> Option<O::Native>,
I::Native: ArrowNativeTypeOp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
O: ArrowPrimitiveType,
F: Fn(I::Native) -> Option<O::Native>,
I::Native: ArrowNativeTypeOp,
O: ArrowPrimitiveType,
F: Fn(I::Native) -> Result<O::Native>,
I::Native: ArrowNativeTypeOp,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? There is a reason that the return type of F is Option. It is not arbitrary.

Copy link
Contributor

@liukun4515 liukun4515 Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of add_check is Option, you use the Option as the result of F.
But the in the below logic, you will check the Option.
If the option is None, will return a error.

Comment on lines 98 to 109
let values = array.values().iter().map(|v| {
let result = op(*v);
if let Some(r) = result {
Ok(r)
} else {
// Overflow
Err(ArrowError::ComputeError(format!(
"Overflow happened on: {:?}",
*v
)))
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values = array.values().iter().map(|v| {
let result = op(*v);
if let Some(r) = result {
Ok(r)
} else {
// Overflow
Err(ArrowError::ComputeError(format!(
"Overflow happened on: {:?}",
*v
)))
}
});
let values = array.values().iter().map(|v| {
op(*v)
});

{
Ok(unary(array, |value| value + scalar))
unary_checked(array, |value| value.add_checked(scalar))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after this https://github.com/apache/arrow-rs/pull/2650/files#r963221429
you can define characteristic error message for this closures

F: Fn(I::Native) -> Option<O::Native>,
I::Native: ArrowNativeTypeOp,
{
let values = array.values().iter().map(|v| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct, as it will evaluate for null slots which might have arbitrary values. I think you have to consult the null mask...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2666 contains a more "correct" version of this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the into_primitive_array_data will add the null mask

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will add a null mask yes, but as the operation is fallible it can't be blindly called on slots without first checking those slots aren't null

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default bytes or '0' will be inserted into the null slot.

The default value or '0' may cause fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will add a null mask yes, but as the operation is fallible it can't be blindly called on slots without first checking those slots aren't null

Got it.

👍 @tustvold

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null slot may have any value, not just 0. Consider the case where a null is added to a non-null value of 200, the resulting null slot will now have a value 200 + whatever was in the null slot. The only guarantee is it is uninitalized, the actual value is arbitrary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. I was thought that into_primitive_array_data adds the null buffer back to the result array like unary does. However, I missed that point that value on null slots might cause wrongly failure on calling op here.

@viirya
Copy link
Member Author

viirya commented Sep 11, 2022

I will rebase this once #2666 is merged.

@viirya
Copy link
Member Author

viirya commented Sep 11, 2022

@tustvold Updated to use try_unary.

@viirya viirya merged commit e1f8ed8 into apache:master Sep 12, 2022
@viirya
Copy link
Member Author

viirya commented Sep 12, 2022

Thanks.

@ursabot
Copy link

ursabot commented Sep 12, 2022

Benchmark runs are scheduled for baseline = e646ae8 and contender = e1f8ed8. e1f8ed8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add overflow-checking variant of arithmetic scalar kernels
4 participants