Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for primitive and boolean take kernel for nullable indices with an offset #509

Merged
merged 3 commits into from
Jun 30, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 88 additions & 3 deletions arrow/src/compute/kernels/take.rs
Original file line number Diff line number Diff line change
Expand Up @@ -357,7 +357,13 @@ where
// Soundness: `slice.map` is `TrustedLen`.
let buffer = unsafe { Buffer::try_from_trusted_len_iter(values)? };

Ok((buffer, indices.data_ref().null_buffer().cloned()))
Ok((
buffer,
indices
.data_ref()
.null_buffer()
.map(|b| b.bit_slice(indices.offset(), indices.len())),
))
}

// take implementation when both values and indices contain nulls
Expand Down Expand Up @@ -516,7 +522,7 @@ where
nulls = match indices.data_ref().null_buffer() {
Some(buffer) => Some(buffer_bin_and(
buffer,
0,
indices.offset(),
&null_buf.into(),
0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct that this 0 is due to the fact that null_buf was constructed via

        let mut null_buf = MutableBuffer::new(num_byte).with_bitset(num_byte, true);

in the same else clause?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, null_buf is newly constructed and initialized starting from 0, while the first buffer and offset pair are coming from the indices array which might have a non-0 offset.

There was a proposal before to push the offsets down into all buffers instead of storing it in the array. That way we wouldn't need to care about which array a buffer originally belonged too. But if we do that we'd still need a better abstraction for the validity bitmap and only access it via (chunked) iterators. I'm also not sure whether such a change would have an affect on FFI usage.

indices.len(),
Expand Down Expand Up @@ -805,6 +811,24 @@ mod tests {
Ok(())
}

fn test_take_primitive_arrays_non_null<T>(
data: Vec<T::Native>,
index: &UInt32Array,
options: Option<TakeOptions>,
expected_data: Vec<Option<T::Native>>,
) -> Result<()>
where
T: ArrowPrimitiveType,
PrimitiveArray<T>: From<Vec<T::Native>>,
PrimitiveArray<T>: From<Vec<Option<T::Native>>>,
{
let output = PrimitiveArray::<T>::from(data);
let expected = Arc::new(PrimitiveArray::<T>::from(expected_data)) as ArrayRef;
let output = take(&output, index, options)?;
assert_eq!(&output, &expected);
Ok(())
}

fn test_take_impl_primitive_arrays<T, I>(
data: Vec<Option<T::Native>>,
index: &PrimitiveArray<I>,
Expand Down Expand Up @@ -876,6 +900,48 @@ mod tests {
.unwrap();
}

#[test]
fn test_take_primitive_nullable_indices_non_null_values_with_offset() {
let index =
UInt32Array::from(vec![Some(0), Some(1), Some(2), Some(3), None, None]);
let index = index.slice(2, 4);
let index = index.as_any().downcast_ref::<UInt32Array>().unwrap();

assert_eq!(
index,
&UInt32Array::from(vec![Some(2), Some(3), None, None])
);

test_take_primitive_arrays_non_null::<Int64Type>(
vec![0, 10, 20, 30, 40, 50],
&index,
None,
vec![Some(20), Some(30), None, None],
)
.unwrap();
}

#[test]
fn test_take_primitive_nullable_indices_nullable_values_with_offset() {
let index =
UInt32Array::from(vec![Some(0), Some(1), Some(2), Some(3), None, None]);
let index = index.slice(2, 4);
let index = index.as_any().downcast_ref::<UInt32Array>().unwrap();

assert_eq!(
index,
&UInt32Array::from(vec![Some(2), Some(3), None, None])
);

test_take_primitive_arrays::<Int64Type>(
vec![None, None, Some(20), Some(30), Some(40), Some(50)],
&index,
None,
vec![Some(20), Some(30), None, None],
)
.unwrap();
}

#[test]
fn test_take_primitive() {
let index = UInt32Array::from(vec![Some(3), None, Some(1), Some(3), Some(2)]);
Expand Down Expand Up @@ -1100,7 +1166,7 @@ mod tests {
}

#[test]
fn test_take_primitive_bool() {
fn test_take_bool() {
let index = UInt32Array::from(vec![Some(3), None, Some(1), Some(3), Some(2)]);
// boolean
test_take_boolean_arrays(
Expand All @@ -1111,6 +1177,25 @@ mod tests {
);
}

#[test]
fn test_take_bool_with_offset() {
let index =
UInt32Array::from(vec![Some(3), None, Some(1), Some(3), Some(2), None]);
let index = index.slice(2, 4);
let index = index
.as_any()
.downcast_ref::<PrimitiveArray<UInt32Type>>()
.unwrap();

// boolean
test_take_boolean_arrays(
vec![Some(false), None, Some(true), Some(false), None],
&index,
None,
vec![None, Some(false), Some(true), None],
);
}

fn _test_take_string<'a, K: 'static>()
where
K: Array + PartialEq + From<Vec<Option<&'a str>>>,
Expand Down