-
Notifications
You must be signed in to change notification settings - Fork 847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String and BinaryArray created from iterators that don't accurately report size can lead to undefined behavior #197
Comments
An update: manually audited all the uses of the rg -n -H --no-heading -e 'size_hint' $(git rev-parse --show-toplevel || pwd)
/Users/alamb/Software/arrow-rs/arrow/src/buffer/immutable.rs:308: let (lower, _) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/buffer/mutable.rs:387: let (lower, _) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/buffer/mutable.rs:433: let (_, upper) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/buffer/mutable.rs:475: let (_, upper) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/buffer/mutable.rs:524: let (_, upper) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/buffer/mutable.rs:621: let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
/Users/alamb/Software/arrow-rs/arrow/src/buffer/mutable.rs:653: iterator.size_hint().0.saturating_add(7) / 8, //convert bit count to byte count, rounding up
/Users/alamb/Software/arrow-rs/arrow/src/array/array_primitive.rs:338: let (lower, _) = iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_primitive.rs:376: /// I.e. that `size_hint().1` correctly reports its length.
/Users/alamb/Software/arrow-rs/arrow/src/array/array_primitive.rs:384: let (_, upper) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_primitive.rs:959: let (_, upper_size_bound) = value_iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_list.rs:153: let (lower, _) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_binary.rs:236: let (_, data_len) = iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_binary.rs:1125: let (_, upper_size_bound) = value_iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/builder.rs:271: .size_hint()
/Users/alamb/Software/arrow-rs/arrow/src/array/builder.rs:777: .size_hint()
/Users/alamb/Software/arrow-rs/arrow/src/array/iterator.rs:68: fn size_hint(&self) -> (usize, Option<usize>) {
/Users/alamb/Software/arrow-rs/arrow/src/array/iterator.rs:140: fn size_hint(&self) -> (usize, Option<usize>) {
/Users/alamb/Software/arrow-rs/arrow/src/array/iterator.rs:214: fn size_hint(&self) -> (usize, Option<usize>) {
/Users/alamb/Software/arrow-rs/arrow/src/array/iterator.rs:293: fn size_hint(&self) -> (usize, Option<usize>) {
/Users/alamb/Software/arrow-rs/arrow/src/array/iterator.rs:370: fn size_hint(&self) -> (usize, Option<usize>) {
/Users/alamb/Software/arrow-rs/arrow/src/util/trusted_len.rs:36: let (_, upper) = iterator.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/util/bit_chunk_iterator.rs:160: fn size_hint(&self) -> (usize, Option<usize>) {
/Users/alamb/Software/arrow-rs/arrow/src/array/array_boolean.rs:197: let (_, data_len) = iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_dictionary.rs:179: let (lower, _) = it.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_dictionary.rs:221: let (lower, _) = it.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_string.rs:174: let (_, data_len) = iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_string.rs:221: let (_, data_len) = iter.size_hint();
/Users/alamb/Software/arrow-rs/arrow/src/array/array_string.rs:548: let (_, upper_size_bound) = string_iter.size_hint(); My analysis is that any place where
For 2. above, the The only thing remaining for me to do on this issue is to verify the uses of the unsafe |
As a final follow up, I did verify the uses of |
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11862
As [~jorgecarleitao] says on apache/arrow#9588 (comment)
The (Rust) Iterator spec recommends, but does not require, that the iterator reports a correct length. Consumer that lead to undefined behavior from an incorrect size_hint are the causers of said undefined behavior.
The only case where consumers can trust the iterators' length is when the interator implement unsafe trait TrustedLen. Unfortunately, TrustedLen is still in unstable. For that reason, we have been exposing unsafe Buffer::from_trusted_len_iter and the like for those cases.
So the code should be updated to handle the case where the reported
size_hint
turns out to be incorrectThe text was updated successfully, but these errors were encountered: