Make consistent behavior on zeros equality on floating point types #3510

Merged
merged 4 commits into from
Jan 13, 2023
67 changes: 67 additions & 0 deletions arrow-ord/src/comparison.rs
@@ -628,6 +628,8 @@ macro_rules! dyn_compare_utf8_scalar {
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn eq_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray, ArrowError>
where
@@ -647,6 +649,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn lt_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray, ArrowError>
where
@@ -666,6 +670,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn lt_eq_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray, ArrowError>
where
@@ -685,6 +691,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
Comment on lines +694 to +695
Member Author

I updated these docs to make the behavior clear to users.

/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn gt_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray, ArrowError>
where
@@ -704,6 +712,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn gt_eq_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray, ArrowError>
where
@@ -723,6 +733,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn neq_dyn_scalar<T>(left: &dyn Array, right: T) -> Result<BooleanArray, ArrowError>
where
@@ -2098,6 +2110,8 @@ where
///
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
///
/// # Example
@@ -2141,6 +2155,8 @@ pub fn eq_dyn(left: &dyn Array, right: &dyn Array) -> Result<BooleanArray, Arrow
///
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
///
/// # Example
@@ -2186,6 +2202,8 @@ pub fn neq_dyn(left: &dyn Array, right: &dyn Array) -> Result<BooleanArray, Arro
///
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
///
/// # Example
@@ -2231,6 +2249,8 @@ pub fn lt_dyn(left: &dyn Array, right: &dyn Array) -> Result<BooleanArray, Arrow
///
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
///
/// # Example
@@ -2278,6 +2298,8 @@ pub fn lt_eq_dyn(
///
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
///
/// # Example
@@ -2322,6 +2344,8 @@ pub fn gt_dyn(left: &dyn Array, right: &dyn Array) -> Result<BooleanArray, Arrow
///
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
///
/// # Example
@@ -2366,6 +2390,8 @@ pub fn gt_eq_dyn(
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn eq<T>(
left: &PrimitiveArray<T>,
@@ -2386,6 +2412,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn eq_scalar<T>(
left: &PrimitiveArray<T>,
@@ -2418,6 +2446,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn neq<T>(
left: &PrimitiveArray<T>,
@@ -2438,6 +2468,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn neq_scalar<T>(
left: &PrimitiveArray<T>,
@@ -2459,6 +2491,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn lt<T>(
left: &PrimitiveArray<T>,
@@ -2480,6 +2514,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn lt_scalar<T>(
left: &PrimitiveArray<T>,
@@ -2501,6 +2537,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn lt_eq<T>(
left: &PrimitiveArray<T>,
@@ -2522,6 +2560,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn lt_eq_scalar<T>(
left: &PrimitiveArray<T>,
@@ -2543,6 +2583,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn gt<T>(
left: &PrimitiveArray<T>,
@@ -2564,6 +2606,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn gt_scalar<T>(
left: &PrimitiveArray<T>,
@@ -2585,6 +2629,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn gt_eq<T>(
left: &PrimitiveArray<T>,
@@ -2606,6 +2652,8 @@ where
/// If `simd` feature flag is not enabled:
/// For floating values like f32 and f64, this comparison produces an ordering in accordance to
/// the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard.
+/// Note that totalOrder treats positive and negative zeros as different. If it is necessary
+/// to treat them as equal, please normalize zeros before calling this kernel.
/// Please refer to `f32::total_cmp` and `f64::total_cmp`.
pub fn gt_eq_scalar<T>(
left: &PrimitiveArray<T>,
@@ -5828,6 +5876,25 @@ mod tests {
assert_eq!(e, r);
}

+#[test]
+#[cfg(not(feature = "simd"))]
+fn test_floating_zeros() {
+let a = Float32Array::from(vec![0.0_f32, -0.0]);
+let b = Float32Array::from(vec![-0.0_f32, 0.0]);
+
+let result = eq_dyn(&a, &b).unwrap();
+let expected = BooleanArray::from(vec![false, false]);
+assert_eq!(expected, result);
+
+let result = eq_dyn_scalar(&a, 0.0).unwrap();
+let expected = BooleanArray::from(vec![true, false]);
+assert_eq!(expected, result);
+
+let result = eq_dyn_scalar(&a, -0.0).unwrap();
+let expected = BooleanArray::from(vec![false, true]);
+assert_eq!(expected, result);
+}
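
The doc comments in this file advise normalizing zeros before calling a kernel when `0.0` and `-0.0` should compare equal. The following is a minimal std-only sketch of that idea, not code from this PR: `normalize_zero` and `total_eq` are hypothetical helpers, and the sketch assumes the kernels' equality matches `f32::total_cmp` returning `Ordering::Equal`, as the doc comments suggest.

```rust
use std::cmp::Ordering;

/// Hypothetical helper (not part of arrow-rs): maps -0.0 to +0.0 and leaves
/// every other value, including NaN, untouched.
fn normalize_zero(x: f32) -> f32 {
    // `==` follows IEEE 754 comparison, so this branch is taken for both zeros.
    if x == 0.0 { 0.0 } else { x }
}

/// Equality under totalOrder, as defined by `f32::total_cmp`.
fn total_eq(a: f32, b: f32) -> bool {
    a.total_cmp(&b) == Ordering::Equal
}

fn main() {
    // The ordinary `==` operator treats the two zeros as equal...
    assert!(0.0_f32 == -0.0_f32);
    // ...but totalOrder does not: -0.0 sorts strictly before +0.0.
    assert!(!total_eq(0.0, -0.0));
    assert_eq!((-0.0_f32).total_cmp(&0.0), Ordering::Less);

    // Normalizing both sides first makes the zeros compare equal again.
    let a = [0.0_f32, -0.0];
    let b = [-0.0_f32, 0.0];
    let equal: Vec<bool> = a
        .iter()
        .zip(b.iter())
        .map(|(x, y)| total_eq(normalize_zero(*x), normalize_zero(*y)))
        .collect();
    assert_eq!(equal, vec![true, true]);
}
```

With Arrow arrays, the same mapping would presumably be applied element-wise to each input before the comparison kernel is called; how best to do that is outside what this diff shows.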

#[derive(Debug)]
struct ToType {}

36 changes: 7 additions & 29 deletions arrow-ord/src/ord.rs
@@ -21,32 +21,21 @@ use arrow_array::types::*;
use arrow_array::*;
use arrow_buffer::ArrowNativeType;
use arrow_schema::{ArrowError, DataType};
-use num::Float;
use std::cmp::Ordering;

/// Compare the values at two arbitrary indices in two arrays.
pub type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;

-/// compares two floats, placing NaNs at last
-fn cmp_nans_last<T: Float>(a: &T, b: &T) -> Ordering {
-match (a.is_nan(), b.is_nan()) {
-(true, true) => Ordering::Equal,
-(true, false) => Ordering::Greater,
-(false, true) => Ordering::Less,
-_ => a.partial_cmp(b).unwrap(),
-}
-}

fn compare_primitives<T: ArrowPrimitiveType>(
left: &dyn Array,
right: &dyn Array,
) -> DynComparator
where
-T::Native: Ord,
+T::Native: ArrowNativeTypeOp,
{
let left: PrimitiveArray<T> = PrimitiveArray::from(left.data().clone());
let right: PrimitiveArray<T> = PrimitiveArray::from(right.data().clone());
-Box::new(move |i, j| left.value(i).cmp(&right.value(j)))
+Box::new(move |i, j| left.value(i).compare(right.value(j)))
Contributor

👍 regardless this is a good change

}
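
To illustrate what replacing `cmp_nans_last` with `compare` changes for floats, here is a small std-only sketch. It assumes, as the doc comments in `comparison.rs` suggest, that the float comparison now follows totalOrder (`f64::total_cmp`); `cmp_total` is a hypothetical stand-in for `ArrowNativeTypeOp::compare`, and `cmp_nans_last` is the removed helper specialised to `f64` so that the `num` crate is not needed.

```rust
use std::cmp::Ordering;

/// The removed helper, specialised to f64 for this sketch.
fn cmp_nans_last(a: &f64, b: &f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        _ => a.partial_cmp(b).unwrap(),
    }
}

/// Assumed stand-in for the new float comparison: totalOrder via `total_cmp`.
fn cmp_total(a: &f64, b: &f64) -> Ordering {
    a.total_cmp(b)
}

fn main() {
    // Signed zeros: equal under the old helper, ordered under totalOrder.
    assert_eq!(cmp_nans_last(&-0.0, &0.0), Ordering::Equal);
    assert_eq!(cmp_total(&-0.0, &0.0), Ordering::Less);

    // Ordinary values agree.
    assert_eq!(cmp_nans_last(&1.0, &2.0), cmp_total(&1.0, &2.0));

    // A NaN with a clear sign bit sorts after +infinity in both schemes.
    let pos_nan = f64::from_bits(0x7ff8_0000_0000_0000);
    assert_eq!(cmp_nans_last(&pos_nan, &f64::INFINITY), Ordering::Greater);
    assert_eq!(cmp_total(&pos_nan, &f64::INFINITY), Ordering::Greater);

    // A NaN with the sign bit set sorts before -infinity under totalOrder,
    // whereas the old helper placed every NaN last.
    let neg_nan = f64::from_bits(0xfff8_0000_0000_0000);
    assert_eq!(cmp_nans_last(&neg_nan, &f64::NEG_INFINITY), Ordering::Greater);
    assert_eq!(cmp_total(&neg_nan, &f64::NEG_INFINITY), Ordering::Less);
}
```

The zero case is the one this PR's tests exercise; the negative-NaN case is a further consequence of totalOrder that callers relying on "NaNs last" may want to keep in mind.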

fn compare_boolean(left: &dyn Array, right: &dyn Array) -> DynComparator {
@@ -56,18 +45,6 @@ fn compare_boolean(left: &dyn Array, right: &dyn Array) -> DynComparator {
Box::new(move |i, j| left.value(i).cmp(&right.value(j)))
}

-fn compare_float<T: ArrowPrimitiveType>(
-left: &dyn Array,
-right: &dyn Array,
-) -> DynComparator
-where
-T::Native: Float,
-{
-let left: PrimitiveArray<T> = PrimitiveArray::from(left.data().clone());
-let right: PrimitiveArray<T> = PrimitiveArray::from(right.data().clone());
-Box::new(move |i, j| cmp_nans_last(&left.value(i), &right.value(j)))
-}

fn compare_string<T>(left: &dyn Array, right: &dyn Array) -> DynComparator
where
T: OffsetSizeTrait,
@@ -197,8 +174,8 @@ pub fn build_compare(
(Int16, Int16) => compare_primitives::<Int16Type>(left, right),
(Int32, Int32) => compare_primitives::<Int32Type>(left, right),
(Int64, Int64) => compare_primitives::<Int64Type>(left, right),
-(Float32, Float32) => compare_float::<Float32Type>(left, right),
-(Float64, Float64) => compare_float::<Float64Type>(left, right),
+(Float32, Float32) => compare_primitives::<Float32Type>(left, right),
+(Float64, Float64) => compare_primitives::<Float64Type>(left, right),
(Decimal128(_, _), Decimal128(_, _)) => {
compare_primitives::<Decimal128Type>(left, right)
}
@@ -372,6 +349,7 @@ pub mod tests {
let cmp = build_compare(&array, &array).unwrap();

assert_eq!(Ordering::Less, (cmp)(0, 1));
+assert_eq!(Ordering::Equal, (cmp)(1, 1));
}

#[test]
@@ -380,8 +358,8 @@ pub mod tests {

let cmp = build_compare(&array, &array).unwrap();

-assert_eq!(Ordering::Equal, (cmp)(0, 1));
-assert_eq!(Ordering::Equal, (cmp)(1, 0));
+assert_eq!(Ordering::Less, (cmp)(0, 1));
+assert_eq!(Ordering::Greater, (cmp)(1, 0));
Comment on lines +361 to +362
Member Author

build_compare's behavior when comparing zeros was inconsistent with the comparison kernels. Changed it to be consistent.

}
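
The updated assertions can be reproduced without Arrow using a closure of the same shape as `DynComparator`. This is only a sketch: `build_float_compare` is a hypothetical std-only stand-in for `build_compare` on float inputs, it assumes the comparison is `f64::total_cmp`, and the input values `[-0.0, 0.0]` are chosen for illustration because the array used by the test is not shown in this hunk.

```rust
use std::cmp::Ordering;

/// Same shape as the `DynComparator` alias in `ord.rs`.
type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;

/// Hypothetical stand-in for `build_compare` restricted to f64 slices:
/// owns copies of both inputs and compares elements by index using totalOrder.
fn build_float_compare(left: &[f64], right: &[f64]) -> DynComparator {
    let left = left.to_vec();
    let right = right.to_vec();
    Box::new(move |i, j| left[i].total_cmp(&right[j]))
}

fn main() {
    // Illustrative input: negative zero at index 0, positive zero at index 1.
    let values = [-0.0_f64, 0.0];
    let cmp = build_float_compare(&values, &values);

    // Under totalOrder the zeros are ordered rather than equal.
    assert_eq!(cmp(0, 1), Ordering::Less);
    assert_eq!(cmp(1, 0), Ordering::Greater);
    // An element still compares equal to itself.
    assert_eq!(cmp(1, 1), Ordering::Equal);
}
```

Because the closure owns its data, it satisfies the `Send + Sync` bounds of the alias and can be stored or passed to sorting code after the inputs go out of scope.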

#[test]