Sliced batch w/ bool column doesn't roundtrip through IPC #3496

Closed
crepererum opened this issue Jan 9, 2023 · 2 comments · Fixed by #3498
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate bug

Comments

@crepererum
Contributor

Describe the bug
It seems that the IPC writer doesn't correctly account for sliced boolean columns.

To Reproduce

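// Imports assume the `arrow` facade crate; adjust the paths when running
// inside the `arrow-ipc` crate itself.
use std::io::Cursor;
use std::sync::Arc;

use arrow::array::BooleanArray;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::ipc::{reader::StreamReader, writer::StreamWriter};
use arrow::record_batch::RecordBatch;

#[test]
fn encode_bools() {
    let val_bool_field = Field::new("val", DataType::Boolean, false);

    let schema = Arc::new(Schema::new(vec![val_bool_field]));

    let bools = BooleanArray::from(vec![true, false]);

    let batch =
        RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(bools)]).unwrap();
    // Keep only the second row (`false`), so the column has a non-zero offset.
    let batch = batch.slice(1, 1);

    // Write the sliced batch to an in-memory IPC stream.
    let mut writer = StreamWriter::try_new(Vec::<u8>::new(), &schema).unwrap();
    writer.write(&batch).unwrap();
    writer.finish().unwrap();
    let data = writer.into_inner().unwrap();

    // Read it back; the batch should be identical to the one written.
    let mut reader = StreamReader::try_new(Cursor::new(data), None).unwrap();
    let batch2 = reader.next().unwrap().unwrap();
    assert_eq!(batch, batch2);
}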

This fails with:

thread 'writer::tests::encode_bools' panicked at 'assertion failed: `(left == right)`
  left: `RecordBatch { schema: Schema { fields: [Field { name: "val", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [BooleanArray
[
  false,
]], row_count: 1 }`,
 right: `RecordBatch { schema: Schema { fields: [Field { name: "val", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [BooleanArray
[
  true,
]], row_count: 1 }`', arrow-ipc/src/writer.rs:1953:9

Expected behavior
Test passes.

Additional context
Tested on commit eae993f.

Originally observed in https://github.com/influxdata/influxdb_iox/issues/6515. There we also looked at the raw IPC data, which led to the conclusion that this is likely specific to boolean columns (other column types seem to work) and is a writer issue, not a reader issue.
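For readers unfamiliar with why boolean columns might behave differently from other types, here is a minimal, dependency-free sketch. It assumes (the issue itself does not confirm this as the root cause) that the problem comes from boolean values being bit-packed, so a slice offset of 1 falls inside a byte rather than on a byte boundary. The helpers `pack_bits`, `get_bit`, `naive_slice`, and `realigned_slice` are hypothetical names made up for illustration; they are not arrow-rs APIs, and the actual fix is the one in #3498.

```rust
/// Pack booleans into bytes, least-significant bit first
/// (the bit order Arrow uses for boolean data).
fn pack_bits(values: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    bytes
}

/// Read the bit at logical index `i`.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1 << (i % 8)) != 0
}

/// Hypothetical naive slice: copies whole bytes and ignores the
/// bit offset within the first byte.
fn naive_slice(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let start = offset / 8;
    let end = (offset + len + 7) / 8;
    bytes[start..end].to_vec()
}

/// Offset-aware slice: re-packs the bits so that the first logical
/// element of the slice lands at bit 0 of the output.
fn realigned_slice(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let bits: Vec<bool> = (0..len).map(|i| get_bit(bytes, offset + i)).collect();
    pack_bits(&bits)
}

fn main() {
    // Same data and slice as the reproduction: [true, false].slice(1, 1)
    let packed = pack_bits(&[true, false]);
    let naive = naive_slice(&packed, 1, 1);
    let fixed = realigned_slice(&packed, 1, 1);
    println!("naive:     {}", get_bit(&naive, 0));
    println!("realigned: {}", get_bit(&fixed, 0));
}
```

In this sketch the naive copy yields `true` at index 0 while the realigned copy yields `false`, which mirrors the `true` vs. `false` mismatch in the panic message above. Types whose values occupy whole bytes would not be affected by this particular pitfall, which is consistent with the observation that other columns seem to work.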

@crepererum crepererum added the bug label Jan 9, 2023
@alamb alamb added the arrow-flight Changes to the arrow-flight crate label Jan 9, 2023
@crepererum
Contributor Author

I'm working on a fix.

crepererum added a commit to crepererum/arrow-rs that referenced this issue Jan 9, 2023
tustvold pushed a commit that referenced this issue Jan 10, 2023
* fix: bool IPC

Fixes #3496.

* refactor: simplify code

* refactor: `assert!` -> `assert_eq!`
@tustvold tustvold added the arrow Changes to the arrow crate label Jan 13, 2023
@tustvold
Contributor

label_issue.py automatically added labels {'arrow'} from #3498
