Describe the bug
When a nested NullArray is written to Parquet, the generated definition levels do not mark every value as null. The problem is currently masked in two ways: bugs prevent such arrays from being read back (fixes in flight), and NullArrayReader ignores any value data that may be present.
To Reproduce
// Imports assume the arrow and parquet crates at a version where
// DataType::List takes a Box<Field> and Array::data() is available.
use std::fs::File;
use std::sync::Arc;

use arrow::array::{Array, ArrayData, ListArray, NullArray};
use arrow::buffer::Buffer;
use arrow::datatypes::{DataType, Field, Schema, ToByteSlice};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

#[test]
fn foo() {
    let null_field = Field::new("item", DataType::Null, true);
    let list_field = Field::new("emptylist", DataType::List(Box::new(null_field)), true);
    let schema = Schema::new(vec![list_field]);

    // Build [[], null, [null, null]]
    let a_values = NullArray::new(2);
    let a_value_offsets = Buffer::from(&[0, 0, 0, 2].to_byte_slice());
    let a_list_data = ArrayData::builder(DataType::List(Box::new(Field::new(
        "item",
        DataType::Null,
        true,
    ))))
    .len(3)
    .add_buffer(a_value_offsets)
    // Validity bitmap: rows 0 and 2 are valid, row 1 is null
    .null_bit_buffer(Buffer::from(vec![0b00000101]))
    .add_child_data(a_values.data().clone())
    .build()
    .unwrap();
    let a = ListArray::from(a_list_data);

    // Sanity-check the in-memory array before writing it
    assert!(a.is_valid(0));
    assert!(!a.is_valid(1));
    assert!(a.is_valid(2));
    assert_eq!(a.value(0).len(), 0);
    assert_eq!(a.value(2).len(), 2);
    assert_eq!(a.value(2).null_count(), 2);

    // Write the single-column batch to temp.parquet
    let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap();
    let file = File::create("temp.parquet").unwrap();
    let mut writer = ArrowWriter::try_new(file, batch.schema(), None).unwrap();
    writer.write(&batch).unwrap();
    writer.close().unwrap();
}
Then read the file using DuckDB from Python:
>>> duckdb.query("select * from 'temp.parquet'").fetchall()
[(None,), (None,), ([0, 0],)]
The child is a NullArray, yet the third row mysteriously comes back containing non-null data ([0, 0]) 😱
Expected behavior
>>> duckdb.query("select * from 'temp.parquet'").fetchall()
[(None,), (None,), ([None, None],)]
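In terms of Parquet levels, the expected output can be sketched as follows. This is a minimal model, not the writer's actual code: it assumes the usual three-level encoding of a nullable list with a nullable item (maximum definition level 3), and the names `Row` and `levels` are hypothetical. A null item should stop at definition level 2; the `[0, 0]` seen above is what a reader would produce if the items had instead been written at level 3.

```rust
/// Hypothetical model of one row of a nullable list whose items are all null.
pub enum Row {
    /// The list itself is null.
    NullList,
    /// A non-null list holding this many null items.
    List(usize),
}

/// Compute (definition, repetition) levels for a nullable list of nullable
/// items, assuming the standard three-level Parquet list encoding:
///   def 0 = list is null, def 1 = list is empty,
///   def 2 = item is null, def 3 = item is present (never reached here).
pub fn levels(rows: &[Row]) -> (Vec<i16>, Vec<i16>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for row in rows {
        match row {
            Row::NullList => {
                def.push(0);
                rep.push(0);
            }
            Row::List(0) => {
                def.push(1);
                rep.push(0);
            }
            Row::List(n) => {
                for i in 0..*n {
                    // Every item of a NullArray is null, so it never reaches level 3.
                    def.push(2);
                    // The first item starts a new row, the rest continue the list.
                    rep.push(if i == 0 { 0 } else { 1 });
                }
            }
        }
    }
    (def, rep)
}
```

Calling `levels(&[Row::List(0), Row::NullList, Row::List(2)])` for the array built in the reproduction gives definition levels `[1, 0, 2, 2]` and repetition levels `[0, 0, 0, 1]`.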
Additional context
This is likely a consequence of a somewhat surprising quirk of NullArray: it has no null bitmask, because it contains no buffers at all.
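The quirk can be illustrated with a small stand-alone sketch. It is an assumption about how the mismatch could arise, not the crate's code: `is_valid` and its parameters are hypothetical. In Arrow, an absent validity bitmap normally means every slot is valid, so any logic that applies that rule to a NullArray without special-casing the Null type will treat its slots as non-null.

```rust
/// Hypothetical helper: is slot `i` valid?
///
/// `validity` is an optional LSB-first validity bitmap, and `is_null_type`
/// says whether the array's data type is Null.
pub fn is_valid(validity: Option<&[u8]>, is_null_type: bool, i: usize) -> bool {
    // A Null-typed array carries no buffers, yet every slot is null,
    // so it has to be handled before consulting the bitmap.
    if is_null_type {
        return false;
    }
    match validity {
        // Bit set means valid, bit clear means null.
        Some(bits) => (bits[i / 8] & (1 << (i % 8))) != 0,
        // For every other type, a missing bitmap means "all valid".
        None => true,
    }
}
```

Without the `is_null_type` check, `is_valid(None, true, 0)` would fall through to the `None` arm and report the slot as valid, which matches the symptom described in this issue.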