-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for primitive and boolean take kernel for nullable indices with an offset #509
Fix for primitive and boolean take kernel for nullable indices with an offset #509
Conversation
Codecov Report
@@ Coverage Diff @@
## master #509 +/- ##
==========================================
+ Coverage 82.64% 82.76% +0.12%
==========================================
Files 165 165
Lines 45703 45724 +21
==========================================
+ Hits 37769 37845 +76
+ Misses 7934 7879 -55
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the logic and the tests carefully and this looks great to me. Thank you @jhorstmann
FYI @ritchie46
@@ -516,7 +522,7 @@ where | |||
nulls = match indices.data_ref().null_buffer() { | |||
Some(buffer) => Some(buffer_bin_and( | |||
buffer, | |||
0, | |||
indices.offset(), | |||
&null_buf.into(), | |||
0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct that this 0
is due to the fact that null_buf
was constructed via
let mut null_buf = MutableBuffer::new(num_byte).with_bitset(num_byte, true);
in the same else
clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, null_buf is newly constructed and initialized starting from 0, while the first buffer and offset pair are coming from the indices array which might have a non-0 offset.
There was a proposal before to push the offsets down into all buffers instead of storing it in the array. That way we wouldn't need to care about which array a buffer originally belonged too. But if we do that we'd still need a better abstraction for the validity bitmap and only access it via (chunked) iterators. I'm also not sure whether such a change would have an affect on FFI usage.
arrow/src/compute/kernels/take.rs
Outdated
); | ||
|
||
test_take_primitive_arrays_non_null::<Int64Type>( | ||
vec![0, 1, 2, 3, 4, 5, 6], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to use different values here than the indices -- perhaps something like
vec![0, 1, 2, 3, 4, 5, 6], | |
vec![0, 10, 20, 30, 40, 50, 60], |
So it is clearer from just this context that just 20
and 30
should be returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done
Nice. Great that this is fixed before the 5.0 release! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good -- thanks again @jhorstmann
…n offset (#509) * Fix for take kernel with nullable indices and nonnull values * Fix for boolean take kernel when indices have an offset * Use different values for data so they cannot be confused with the indices
…n offset (#509) (#516) * Fix for take kernel with nullable indices and nonnull values * Fix for boolean take kernel when indices have an offset * Use different values for data so they cannot be confused with the indices Co-authored-by: Jörn Horstmann <[email protected]>
Which issue does this PR close?
Closes #502.
While implementing the fix I noticed a similar issue in the boolean take kernel which is now also fixed.
Rationale for this change
When reusing the validity buffer of an array for a newly created array the offsets of the original array have to be taken into account. The original array might have had an offset > 0 but the new array will usually start at 0 and so a slice of the validity buffer has to be taken.
What changes are included in this PR?
Are there any user-facing changes?
No