take kernel on List array introduces nulls instead of empty lists #3471

Closed
jonmmease opened this issue Jan 5, 2023 · 2 comments · Fixed by #3473
Labels
arrow Changes to the arrow crate bug

Comments

@jonmmease
Contributor

Describe the bug
While looking into apache/datafusion#4828, I came across what I think is an issue with the take kernel when operating on List arrays. When an element taken from the list array is an empty list, it becomes null in the result of take.

I think this is a bug, but let me know if it's the intended behavior.

To Reproduce
Here is an example that starts with the list array [[0], [], [1]] and takes the elements at indices [0, 1, 2].

use arrow::array::{Int32Array, Int32Builder, ListBuilder};
use arrow::compute::take;

#[test]
fn test_take() {
    // Build the list array [[0], [], [1]]
    let val_builder = Int32Builder::new();
    let mut list_builder = ListBuilder::new(val_builder);
    list_builder.values().append_value(0);
    list_builder.append(true); // [0]
    list_builder.append(true); // [] (a valid, empty list)
    list_builder.values().append_value(1);
    list_builder.append(true); // [1]
    let list = list_builder.finish();

    println!("list: {:?}", list);

    // Take every element in order; the result should equal the input
    let indices = Int32Array::from(vec![0, 1, 2]);
    let taken = take(&list, &indices, None).unwrap();

    println!("taken: {:?}", taken);
}

output

list: ListArray
[
  PrimitiveArray<Int32>
[
  0,
],
  PrimitiveArray<Int32>
[
],
  PrimitiveArray<Int32>
[
  1,
],
]

taken: ListArray
[
  PrimitiveArray<Int32>
[
  0,
],
  null,
  PrimitiveArray<Int32>
[
  1,
],
]

Expected behavior
I would expect the result to be the same as the input ([[0], [], [1]]), but instead the result is [[0], null, [1]].
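For context: in the Arrow columnar format, a list array stores its child values in one flat array plus an offsets buffer with one more entry than there are slots, so slot i covers values[offsets[i]..offsets[i + 1]]. Null-ness is recorded separately in a validity bitmap. An empty list is a valid slot whose two offsets are equal, and null slots usually have equal offsets as well, so only the validity bit tells them apart. The std-only Rust sketch below models this layout for the input [[0], [], [1]] using a hypothetical SimpleList struct; it is an illustration, not the arrow-rs ListArray API.

```rust
/// Hypothetical, simplified model of an Arrow list array of i32 values.
/// This is NOT arrow-rs's `ListArray`; it only illustrates the layout.
#[derive(Debug, Clone, PartialEq)]
struct SimpleList {
    /// Child values shared by all slots.
    values: Vec<i32>,
    /// Slot i covers values[offsets[i]..offsets[i + 1]]; len = slots + 1.
    offsets: Vec<usize>,
    /// Validity bitmap (true = valid); None means "no nulls".
    validity: Option<Vec<bool>>,
}

impl SimpleList {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Null-ness comes only from the validity bitmap, never from the offsets.
    fn is_null(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(false, |v| !v[i])
    }

    /// Returns None for a null slot, Some(slice) otherwise (possibly empty).
    fn value(&self, i: usize) -> Option<&[i32]> {
        if self.is_null(i) {
            None
        } else {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        }
    }
}

fn main() {
    // [[0], [], [1]]: the empty middle list has equal offsets (1, 1),
    // but it is still valid because there is no validity bitmap marking it null.
    let list = SimpleList {
        values: vec![0, 1],
        offsets: vec![0, 1, 1, 2],
        validity: None,
    };
    for i in 0..list.len() {
        println!("slot {i}: {:?}", list.value(i));
    }
}
```

Under this model, treating "equal offsets" as "null" would misclassify slot 1, which is exactly the symptom described above.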

Additional context
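It looks like this behavior is caused by the following check in the take kernel's list handling, which marks any output slot whose start and end offsets are equal as null. Because an empty list also has equal offsets, it is turned into a null:

if window[0] == window[1] {
    // offsets are equal, slot is null
    bit_util::unset_bit(null_slice, i);
    null_count += 1;
}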
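One way to avoid this, sketched here on a simplified model, is to decide each output slot's null-ness from the source array's validity bitmap (and from null indices) instead of from offset equality. SimpleList and take_list are hypothetical names redefined so the block stands alone; this is not the arrow-rs implementation or the change merged in #3473.

```rust
/// Hypothetical, simplified list-of-i32 array (NOT arrow-rs's `ListArray`).
#[derive(Debug, PartialEq)]
struct SimpleList {
    values: Vec<i32>,
    offsets: Vec<usize>,         // slot i = values[offsets[i]..offsets[i + 1]]
    validity: Option<Vec<bool>>, // true = valid; None = no nulls
}

/// Gathers the slots named by `indices`; a `None` index yields a null slot.
/// Null-ness is taken from the source validity (or a null index), so an
/// empty list stays an empty list instead of becoming null.
/// Panics on out-of-range indices, like slice indexing.
fn take_list(list: &SimpleList, indices: &[Option<usize>]) -> SimpleList {
    let mut values = Vec::new();
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut validity = Vec::with_capacity(indices.len());
    offsets.push(0);

    for &idx in indices {
        // A slot is valid only if the index is non-null AND the source slot is valid.
        let src = idx.filter(|&i| list.validity.as_ref().map_or(true, |v| v[i]));
        if let Some(i) = src {
            // Copy this slot's child values (zero values for an empty list).
            values.extend_from_slice(&list.values[list.offsets[i]..list.offsets[i + 1]]);
        }
        // Null slots get a zero-length range; only the validity bit marks them null.
        offsets.push(values.len());
        validity.push(src.is_some());
    }

    // Only keep a bitmap if at least one slot is null.
    let validity = if validity.iter().all(|&b| b) { None } else { Some(validity) };
    SimpleList { values, offsets, validity }
}

fn main() {
    // [[0], [], [1]]
    let list = SimpleList { values: vec![0, 1], offsets: vec![0, 1, 1, 2], validity: None };
    let taken = take_list(&list, &[Some(0), Some(1), Some(2)]);
    println!("{:?}", taken);
    // The empty middle list survives, so taking every index reproduces the input.
    assert_eq!(taken, list);
}
```

Keeping null-ness solely in the validity bitmap lets the offsets be built unconditionally: null and empty slots both get a zero-length range, and the bitmap alone distinguishes them.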

@tustvold
Contributor

tustvold commented Jan 6, 2023

Agreed, this behaviour is wrong. I'd be happy to review a fix.

@tustvold
Contributor

label_issue.py automatically added labels {'arrow'} from #3473
