Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nullif incorrectly calculates null_count, sometimes panics with substraction overflow error #3579

Closed
albel727 opened this issue Jan 20, 2023 · 4 comments · Fixed by #3590
Closed
Assignees
Labels
arrow Changes to the arrow crate bug

Comments

@albel727
Copy link

Describe the bug
nullif(left, right) incorrectly calculates null_count for the result array, whenever left doesn't have a null_buffer and has len % 64 == 0. It can even panic, if there are less than 64 true values in right.

To Reproduce

    let n = 64;
    let left = Int32Array::from((0..n as i32).into_iter().collect::<Vec<_>>());
    let right = BooleanArray::from((0..n).into_iter().map(|_| None).collect::<Vec<_>>());
    arrow_select::nullif::nullif(&left, &right);

Expected behavior
It works.

Actual behavior
It panics with 'attempt to subtract with overflow' at this line:

let null_count = len - valid_count;

Additional context

The problem evaded fixes #3034 and #3263, which were incomplete. The wrong calculation occurs in these lines:

let buffer = bitwise_unary_op_helper(&right, right_offset, len, |b| {
let t = !b;
valid_count += t.count_ones() as usize;
t
});
// We need to compensate for the additional bits read from the end
let remainder_len = len % 64;
if remainder_len != 0 {
valid_count -= 64 - remainder_len
}

The reason it is wrong is the un-intuitive, if not to say wrong, behavior of bitwise_unary_op_helper() and friends. They're calling op() unconditionally on the remainder bits, even if there are none:

let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
let rem = op(left_chunks.remainder_bits());
// we are counting its starting from the least significant bit, to to_le_bytes should be correct
let rem = &rem.to_le_bytes()[0..remainder_bytes];
result.extend_from_slice(rem);

It doesn't make a difference for the produced output buffer, because it is then just extended with slice of 0 bytes, i.e. remains unaffected. But it does matter for FnMut closures with side effects, such as the one found in nullif, which, as a result of this behavior, adds an excess of 64 to valid_count, when there are no remainder bits, i.e. bit length is a multiple of 64.

The quick fix is to do valid_count -= 64 - remainder_len; in nullif() unconditionally, i.e. just remove the if remainder_len != 0 { condition around it. It was clearly written in assumption, that bitwise_unary_op_helper() doesn't call op() in excess, when there are no remainder bits.

A better fix could have been to avoid making excess op() calls in bitwise_unary_op_helper() and friends, but since they're public, it could lead to bugs in external consumers, which might have come to rely on this weird behavior.

In any case, I suggest at least checking whether other usages of bitwise_unary_op_helper() and friends
in arrow-rs make the same incorrect assumption. Temporarily changing the type of op parameter from FnMut to Fn and looking at the compilation error fallout should point out all suspicious places.

@albel727 albel727 added the bug label Jan 20, 2023
@albel727
Copy link
Author

Just in case, here's the code that I used to validate that the quick fix works:

    std::panic::set_hook(Box::new(|_info| { /* silence panics */ }));

    for n in 0..=128+64 {
        let left = Int32Array::from((0..n as i32).into_iter().collect::<Vec<_>>());

        for somes in 0..=n {
            for trues in 0..=somes {
                let right = BooleanArray::from((0..n).into_iter().map(|i| {
                    Some(i < trues).filter(|_| i < somes)
                }).collect::<Vec<_>>());

                let ok = std::panic::catch_unwind(|| {
                    let arr = arrow_select::nullif::nullif(&left, &right).unwrap();
                    let arr: &Int32Array = arrow_array::cast::as_primitive_array(&arr);
                    arrow_array::Array::data(arr).validate_full().unwrap();
                }).is_ok();

                if !ok {
                    println!("{n} {somes} {trues}");
                }
            }
        }
    }

@tustvold
Copy link
Contributor

Thank you for the detailed report, makes sense to me. The change to FnMut was relatively recent so I think this should be the only impacted location, but I will verify.

Bit embarrassing this kernel has taken quite so many iterations to fix 😅

@tustvold tustvold self-assigned this Jan 22, 2023
@tustvold
Copy link
Contributor

tustvold commented Jan 23, 2023

A potential bug I found whilst testing this, dating from 2020 - #3589

Edit: I was mistaken

tustvold added a commit to tustvold/arrow-rs that referenced this issue Jan 23, 2023
tustvold added a commit that referenced this issue Jan 24, 2023
* Fix nullif null count (#3579)

* Clippy
@tustvold tustvold added the arrow Changes to the arrow crate label Jan 27, 2023
@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow'} from #3590

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants