-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure there is a single zero in the offsets buffer for an empty ListArray. #1620
Comments
I got this error: ---- ipc::reader::tests::read_generated_files_014 stdout ----
---- ipc::reader::tests::read_generated_files_014 stdout ----
thread 'ipc::reader::tests::read_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
---- ipc::reader::tests::read_generated_streams_014 stdout ----
thread 'ipc::reader::tests::read_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
---- ipc::writer::tests::read_and_rewrite_generated_files_014 stdout ----
thread 'ipc::writer::tests::read_and_rewrite_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
---- ipc::writer::tests::read_and_rewrite_generated_streams_014 stdout ----
thread 'ipc::writer::tests::read_and_rewrite_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29 after deleting I don't know how data is generated in arrow-testing. |
@alamb Could you please help to recheck that whether empty array has empty offsets buffer in arrow-testing? |
Hi @HaoYang670. I looked at the C++ implementation and I think it does allow an empty offsets buffer for an empty array -- namely the validation code skips doing any checks if
I also don't know how this data was generated |
Does it mean that for empty (list array, binary array, and string array), both an empty offsets buffer and an offsets buffer with a single zero are allowed? If this is true, how could we implement this issue? |
But it seems like list array does not allow empty offsets: |
I think that code is for constructing a new array programatically. The validation code I think may be more permissive to support arrays that come in from other arrow implementations. Maybe it is time to ask on the [email protected] mailing list 🤔 I am out of ideas |
I will send an email |
Hey, there is one file in the test files that has no offset - it is the file for version 0.14 or something. My understanding is that previous versions of the (IPC) spec did not require offset and recent versions require them. I have a comment about this in arrow2. (I am curious about the mailing list discussion though; this is not very authoritative ^^) |
It seems like Arrow tends to be ambiguous in this case, and an empty buffer or a buffer with just a 0 in it both are valid. |
Personally, I prefer keeping the current implementation. |
Email link for cross reference: https://lists.apache.org/thread/w7g1zfqrjxx0bvrct0mt5zwxvdnc9nob |
Could we close this issue, or open it for more discussions? |
I think the conclusion is "we should keep the current implementation, as imperfect as it may be, for backwards compatibility reasons" Agree with closing -- anyone who disagrees please let me know and I'll reopen it Thanks for the effort @HaoYang670 |
Good idea checking the C++ version. The docs for binary layout also mention
which would mean there has to be a single zero in the offsets buffer for an empty
ListArray
.If this is a requirements it would be better to validate it when creating
ArrayData
. The code inArrayData::validate_each_offset
explicitly allow this case though:I'm more hesitant to change this now, maybe let's wait for some more eyes on this.
Originally posted by @jhorstmann in #1602 (comment)
The text was updated successfully, but these errors were encountered: