Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add max_dyn and min_dyn for max/min for dictionary array #2585

Merged
merged 5 commits into from
Aug 25, 2022
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
108 changes: 107 additions & 1 deletion arrow/src/compute/kernels/aggregate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,68 @@ where
}
}

/// Returns the min of values in the array.
pub fn min_dyn<T, A: ArrayAccessor<Item = T::Native>>(array: A) -> Option<T::Native>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically the _dyn kernels take a trait object, these don't appear to. Perhaps we could choose a different name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, min_of_array?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can't just have a single min kernel for all types? We currently have separate implementations for strings, primitives, etc... which is no longer needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that we have separate min for different type of arrays because there was no common accessor previously.

But the min for primitive array, it has simd and non simd versions. I'm hesitant to replace them with this ArrayAccessor version. Do you think it is okay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lets press forward with this PR as is, but renamed to min_array, and I'll write up a ticket to look into this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to min_array, max_array and sum_array

where
T: ArrowNumericType,
T::Native: ArrowNativeType,
{
min_max_dyn_helper::<T, A, _, _>(
array,
|a, b| (!is_nan(*a) & is_nan(*b)) || a < b,
min,
)
}

/// Returns the max of values in the array.
pub fn max_dyn<T, A: ArrayAccessor<Item = T::Native>>(array: A) -> Option<T::Native>
viirya marked this conversation as resolved.
Show resolved Hide resolved
where
T: ArrowNumericType,
T::Native: ArrowNativeType,
{
min_max_dyn_helper::<T, A, _, _>(
array,
|a, b| (is_nan(*a) & !is_nan(*b)) || a > b,
max,
)
}

fn min_max_dyn_helper<T, A: ArrayAccessor<Item = T::Native>, F, M>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn min_max_dyn_helper<T, A: ArrayAccessor<Item = T::Native>, F, M>(
fn min_max_array_helper<T, A: ArrayAccessor<Item = T::Native>, F, M>(

array: A,
cmp: F,
m: M,
) -> Option<T::Native>
where
T: ArrowNumericType,
F: Fn(&T::Native, &T::Native) -> bool,
M: Fn(&PrimitiveArray<T>) -> Option<T::Native>,
{
match array.data_type() {
DataType::Dictionary(_, _) => {
let null_count = array.null_count();

if null_count == array.len() {
return None;
}

let mut has_value = false;
let mut n = T::default_value();
let iter = ArrayIter::new(array);
iter.into_iter().for_each(|value| {
if let Some(value) = value {
if !has_value || cmp(&value, &n) {
has_value = true;
n = value;
}
}
});

Some(n)
}
_ => m(as_primitive_array(&array)),
}
}

/// Returns the sum of values in the primitive array.
///
/// Returns `None` if the array is empty or only contains null values.
Expand Down Expand Up @@ -656,7 +718,7 @@ mod tests {
use super::*;
use crate::array::*;
use crate::compute::add;
use crate::datatypes::{Int32Type, Int8Type};
use crate::datatypes::{Float32Type, Int32Type, Int8Type};

#[test]
fn test_primitive_array_sum() {
Expand Down Expand Up @@ -1058,4 +1120,48 @@ mod tests {
let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert!(sum_dyn::<Int8Type, _>(array).is_none());
}

#[test]
fn test_max_min_dyn() {
let values = Int8Array::from_iter_values([10_i8, 11, 12, 13, 14, 15, 16, 17]);
let keys = Int8Array::from_iter_values([2_i8, 3, 4]);

let dict_array = DictionaryArray::try_new(&keys, &values).unwrap();
let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert_eq!(14, max_dyn::<Int8Type, _>(array).unwrap());

let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert_eq!(12, min_dyn::<Int8Type, _>(array).unwrap());

let a = Int32Array::from(vec![1, 2, 3, 4, 5]);
assert_eq!(5, max_dyn::<Int32Type, _>(&a).unwrap());
assert_eq!(1, min_dyn::<Int32Type, _>(&a).unwrap());

let keys = Int8Array::from(vec![Some(2_i8), None, Some(7)]);
let dict_array = DictionaryArray::try_new(&keys, &values).unwrap();
let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert_eq!(17, max_dyn::<Int8Type, _>(array).unwrap());
let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert_eq!(12, min_dyn::<Int8Type, _>(array).unwrap());

let keys = Int8Array::from(vec![None, None, None]);
let dict_array = DictionaryArray::try_new(&keys, &values).unwrap();
let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert!(max_dyn::<Int8Type, _>(array).is_none());
let array = dict_array.downcast_dict::<Int8Array>().unwrap();
assert!(min_dyn::<Int8Type, _>(array).is_none());
}

#[test]
fn test_max_min_dyn_nan() {
let values = Float32Array::from(vec![5.0_f32, 2.0_f32, f32::NAN]);
let keys = Int8Array::from_iter_values([0_i8, 1, 2]);

let dict_array = DictionaryArray::try_new(&keys, &values).unwrap();
let array = dict_array.downcast_dict::<Float32Array>().unwrap();
assert!(max_dyn::<Float32Type, _>(array).unwrap().is_nan());

let array = dict_array.downcast_dict::<Float32Array>().unwrap();
assert_eq!(2.0_f32, min_dyn::<Float32Type, _>(array).unwrap());
}
}