binary operator produces incorrect result on arrays with resized null buffer #3061

Closed
richox opened this issue Nov 9, 2022 · 1 comment · Fixed by #3062
Labels
arrow Changes to the arrow crate bug

richox commented Nov 9, 2022

Describe the bug
When an array's null buffer is built from a resized BooleanBufferBuilder and the array is later used in a binary operator, the result is sometimes incorrect.

To Reproduce

use arrow::array::*;
use arrow::compute::*;
use arrow::datatypes::*;

fn main() {

    // arg1 = [null] * 13
    let mut data_buffer_builder = BufferBuilder::<i32>::new(13);
    data_buffer_builder.append_slice(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
    let data_buffer = data_buffer_builder.finish();

    let mut null_buffer_builder = BooleanBufferBuilder::new(16);
    null_buffer_builder.append_slice(&[false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true]);
    null_buffer_builder.resize(13);
    let null_buffer = null_buffer_builder.finish();
    println!("null_buffer.count_set_bits={}", null_buffer.count_set_bits());

    let arg1: Int32Array = ArrayDataBuilder::new(DataType::Int32)
        .len(13)
        .null_count(13)
        .buffers(vec![data_buffer])
        .null_bit_buffer(Some(null_buffer))
        .build()
        .unwrap()
        .into();

    // arg2 = [0, 1, ..., 11, 12]
    let mut data_buffer_builder = BufferBuilder::<i32>::new(13);
    data_buffer_builder.append_slice(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
    let data_buffer = data_buffer_builder.finish();

    let arg2: Int32Array = ArrayDataBuilder::new(DataType::Int32)
        .len(13)
        .null_count(0)
        .buffers(vec![data_buffer])
        .null_bit_buffer(None)
        .build()
        .unwrap()
        .into();

    let result_dyn = add_dyn(&arg1, &arg2).unwrap();
    let result = result_dyn.as_any().downcast_ref::<Int32Array>().unwrap();

    println!("result.null_count={}", result.null_count());
    println!("result.len={}", result.len());
    println!("result={:?}", result);
}

Expected behavior
The code above should print null_buffer.count_set_bits=0 and result.null_count=13, since arg1 has 13 nulls. Instead it prints null_buffer.count_set_bits=3 and result.null_count=10.
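The count of 3 is consistent with the three `true` values at positions 13 to 15 surviving the resize inside the last byte of the bitmap. The following is a minimal sketch of that effect using only the standard library. It is a hypothetical model, not the arrow crate's code: `pack_bits`, `naive_truncate`, and `count_set_bits` are illustrative names, and least-significant-bit-first packing is assumed.

```rust
// Hypothetical stand-alone model of a packed validity bitmap. This is NOT the
// arrow crate's implementation; it only illustrates the reported symptom.

/// Packs booleans into bytes, least-significant bit first.
fn pack_bits(bits: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (bits.len() + 7) / 8];
    for (i, &bit) in bits.iter().enumerate() {
        if bit {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    bytes
}

/// Shrinks the bitmap to `new_len` bits by dropping whole bytes only.
/// Bits past `new_len` in the last kept byte are left as they were.
fn naive_truncate(bytes: &mut Vec<u8>, new_len: usize) {
    bytes.truncate((new_len + 7) / 8);
}

/// Counts set bits across every byte, ignoring the logical bit length.
fn count_set_bits(bytes: &[u8]) -> u32 {
    bytes.iter().map(|b| b.count_ones()).sum()
}

fn main() {
    // 13 x false followed by 3 x true, as in the reproduction.
    let mut bits = [false; 16];
    bits[13] = true;
    bits[14] = true;
    bits[15] = true;

    let mut bytes = pack_bits(&bits);
    naive_truncate(&mut bytes, 13);

    // The three trailing bits are still set in the second byte.
    println!("count_set_bits={}", count_set_bits(&bytes));
}
```

In this model, any consumer that reads whole bytes of the bitmap sees three "valid" slots that lie beyond the logical length of 13.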

Additional context
We hit this issue while doing arithmetic operations with DataFusion on a Parquet file. The parquet crate resizes the null buffer on the last records of a row group, and this unexpectedly produces several incorrect records.
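One plausible way to avoid stale bits when shrinking a bitmap is to clear everything past the new length in the last remaining byte. The sketch below shows that idea on a plain `Vec<u8>`. The helper name `truncate_and_clear` is hypothetical, least-significant-bit-first packing is assumed, and this is not necessarily the change made in #3062.

```rust
/// Shrinks a packed bitmap to `new_len` bits and zeroes any bits past
/// `new_len` in the last remaining byte (hypothetical helper, LSB-first).
fn truncate_and_clear(bytes: &mut Vec<u8>, new_len: usize) {
    bytes.truncate((new_len + 7) / 8);
    let rem = new_len % 8;
    if rem != 0 {
        if let Some(last) = bytes.last_mut() {
            // Keep only the low `rem` bits of the final byte.
            *last &= (1u8 << rem) - 1;
        }
    }
}

fn main() {
    // Bits 13..16 set, matching the bitmap from the reproduction.
    let mut bytes = vec![0x00u8, 0xE0u8];
    truncate_and_clear(&mut bytes, 13);

    let set_bits: u32 = bytes.iter().map(|b| b.count_ones()).sum();
    println!("count_set_bits={}", set_bits);
}
```

With the trailing bits masked off, a whole-byte bit count agrees with the logical length, so downstream code does not need to know the bitmap was ever longer.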

alamb commented Nov 11, 2022

label_issue.py automatically added labels {'arrow'} from #3062

richox pushed a commit to kwai/blaze that referenced this issue Sep 26, 2023