test(arrow-ord): remove duplicate logic and make it easier to add tests #6913

rluvaton · 2024-12-26T17:45:32Z

Rationale for this change

While working on #6911 and I started to write tests, I found out I just duplicated the logic, and it was messier, so cleaning up to make it easier to create new tests.

Background: all sort arrays test helper were given 2 vectors with native rust types (e.g. Vec<Option<bool>>), one for the input and the other for the expected output.

and the sort test helper convert it to the input and expected arrays to Arrow arrays (e.g. BooleanArray), and then call sort (or sort_limit depend on the input) and assert that the expected match the sorted.

this is done to help make the test simpler by just providing what the actual arrow array represent without messing with converting to arrow type.

the problem is that some Rust types can have multiple Arrow types, for example Vec<Option<Vec<u8>>> can be BinaryArray or LargeBinaryArray or FixedSizeBinaryArray or BinaryViewArray (same for lists and string).

in that case each sort helper function convert the input data to some of the arrow type (like FixedSizeBinaryArray) or to multiple arrow type (BinaryArray and LargeBinaryArray)

and then do the same as described above.

for the cases that fixed_length was provided it will test FixedSizeBinaryArray

The problem with this is that there are a lot of duplication and adding tests to new type is just adding a lot of boilerplate instead of just starting to write code

What changes are included in this PR?

So instead of this approach I created a single test_sort_array function that is given the input, expected and a function to convert that rust input type (e.g. Vec<Option<Vec<u8>>> like explained above) to all variants from that type (e.g. BinaryArray and LargeBinaryArray and FixedSizeBinaryArray and BinaryViewArray)

then created for each test type a new helper to convert that rust type to arrow type

for creating binary array I returned FixedSizeBinaryArray if all element have the same size, this will no allow to avoid only testing fixed or variable length types.

and moved all the build array helpers to sub module for more organize code.

But why do I pass a function to build the array and not using Rust trait system? this will surely be cleaner as you can have the actual function implementation hidden behind the input type and it would make the test cleaner.

You (basically my other self) are right. The problem is that I tried and it can't work without implementing for each primitive type manually (I think), because let's see the following example:

Let's say I have a trait called ConvertToArray

trait ConvertToArrays {
	fn convert_to_array(input: Vec<Option<Self>>) -> Vec<ArrayRef>;
}

and then I implement it for primitive type to create primitive array:

impl<T> ConvertToArrays for T::Native
	where T: ArrowPrimitiveType,
	PrimitiveArray<T>: From<Vec<Option<T::Native>>>
{
	fn convert_to_array(input: Vec<Option<Self>>) -> Vec<ArrayRef> {
		vec![Arc::new(PrimitiveArray::<T>::from(data)) as ArrayRef]
	}
}

all good and it works. the problem is I can't implement it for bool when I want to test sorting BooleanArray because ArrowPrimitiveType can be implemented in the future for bool and this would cause duplicate implementation. and for that reason Rust disallow that

Are there any user-facing changes?

nope, only tests

alamb · 2024-12-29T14:21:52Z

Thank you for these PRs @rluvaton -- I know we are seriously behind on PR review in arrow-rs. I hope that we will have more capacity after next week. We just need to find more people willing to help out with reviews

rluvaton · 2024-12-29T14:23:46Z

I'm willing to help, although currently my codebase knowledge is not the greatest but I think it's getting better

alamb · 2024-12-29T14:44:04Z

I'm willing to help, although currently my codebase knowledge is not the greatest but I think it's getting better

Thank you 🙏 -- the most helpful thing would be to review outstanding PRs and ensure the description is understandable, the code is commented and covered with tests, and you think the code does what the description says

If you find such PRs, feel free to @ mention me and I'll try and give it a look asap

Jefffrey

I like the idea; I've left some suggestions to see if we can simplify the code even further.

Jefffrey · 2025-01-03T08:47:23Z

arrow-ord/src/sort.rs

+    fn test_sort_arrays<
+        ListItemType: Clone,
+        IntoListFn: Fn(Vec<Option<ListItemType>>) -> Vec<ArrayRef>,
+    >(
+        into_lists_fn: IntoListFn,
+        data: Vec<Option<ListItemType>>,
+        options: Option<SortOptions>,
+        limit: Option<usize>,
+        expected_data: Vec<Option<ListItemType>>,
+    ) {


Suggested change

fn test_sort_arrays<

ListItemType: Clone,

IntoListFn: Fn(Vec<Option<ListItemType>>) -> Vec<ArrayRef>,

>(

into_lists_fn: IntoListFn,

data: Vec<Option<ListItemType>>,

options: Option<SortOptions>,

limit: Option<usize>,

expected_data: Vec<Option<ListItemType>>,

) {

fn test_sort_arrays<ListItemType, IntoListFn>(

into_lists_fn: IntoListFn,

data: Vec<Option<ListItemType>>,

options: Option<SortOptions>,

limit: Option<usize>,

expected_data: Vec<Option<ListItemType>>,

) where

ListItemType: Clone,

IntoListFn: Fn(Vec<Option<ListItemType>>) -> Vec<ArrayRef>,

{

Thoughts on using where bound here to make it a bit more readable?

thanks, applied

I think this was missed from the changes you applied

arrow-ord/src/sort.rs

Jefffrey · 2025-01-04T03:38:29Z

There's some CI errors to look into.

Beyond that, I'm confused by some of the reasoning provided in the body:

But why do I pass a function to build the array and not using Rust trait system? this will surely be cleaner as you can have the actual function implementation hidden behind the input type and it would make the test cleaner.

You (basically my other self) are right. The problem is that I tried and it can't work without implementing for each primitive type manually (I think), because let's see the following example:

...

all good and it works. the problem is I can't implement it for bool when I want to test sorting BooleanArray because ArrowPrimitiveType can be implemented in the future for bool and this would cause duplicate implementation. and for that reason Rust disallow that

I find it unlikely that ArrowPrimitiveType will be implemented for bool in the future, and even if it does I'm sure there'll be other problems such that this test code breaking is one of the smaller issues that would follow. But even disregarding that, none of the tests you've modified are for BooleanArrays, so I'm not sure it should be used as an example of why a trait based approach might not work.

Of course if a trait based approach is more verbose/complex than the current function based approach I can understand why this approach was chosen.

alamb · 2025-01-10T19:59:58Z

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

test(arrow-ord): remove duplicate logic and make it easier to add tests

fc7c8f6

github-actions bot added the arrow Changes to the arrow crate label Dec 26, 2024

rluvaton added 2 commits December 26, 2024 18:03

fix failing test that cannot infer the fixed size length

7d97c61

fix failed test

3f1f7a7

Jefffrey reviewed Jan 3, 2025

View reviewed changes

apply code review changed

253c99a

alamb marked this pull request as draft January 10, 2025 19:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(arrow-ord): remove duplicate logic and make it easier to add tests #6913

test(arrow-ord): remove duplicate logic and make it easier to add tests #6913

rluvaton commented Dec 26, 2024 •

edited

Loading

alamb commented Dec 29, 2024

rluvaton commented Dec 29, 2024

alamb commented Dec 29, 2024

Jefffrey left a comment

Jefffrey Jan 3, 2025

rluvaton Jan 3, 2025

Jefffrey Jan 4, 2025

Jefffrey commented Jan 4, 2025

alamb commented Jan 10, 2025

test(arrow-ord): remove duplicate logic and make it easier to add tests #6913

Are you sure you want to change the base?

test(arrow-ord): remove duplicate logic and make it easier to add tests #6913

Conversation

rluvaton commented Dec 26, 2024 • edited Loading

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

alamb commented Dec 29, 2024

rluvaton commented Dec 29, 2024

alamb commented Dec 29, 2024

Jefffrey left a comment

Choose a reason for hiding this comment

Jefffrey Jan 3, 2025

Choose a reason for hiding this comment

rluvaton Jan 3, 2025

Choose a reason for hiding this comment

Jefffrey Jan 4, 2025

Choose a reason for hiding this comment

Jefffrey commented Jan 4, 2025

alamb commented Jan 10, 2025

rluvaton commented Dec 26, 2024 •

edited

Loading