Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out of bounds read in bit chunk iterator #416

Merged
merged 2 commits into from
Jun 8, 2021

Conversation

jhorstmann
Copy link
Contributor

Which issue does this PR close?

Closes #198.

Rationale for this change

The previous code could read a few bytes out of bounds. I could not cause this to trigger a segmentation fault, but miri also detected a problem and can now run the tests without errors.

What changes are included in this PR?

Are there any user-facing changes?

No

let next =
unsafe { std::ptr::read_unaligned(raw_data.add(index + 1)).to_le() };
let next = unsafe {
std::ptr::read_unaligned(raw_data.add(index + 1) as *const u8) as u64
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix here is casting the pointer back to *u8 and reading only a single byte.

The other changes are a bit of a cleanup, the masking of next below should not be needed since it masked of exactly the bits that would be shifted out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is not the remainder, don't we potentially need to read more than 8 bits? I.e. doesn't this index contain between 1 and 63 bits that need to be "merged" into current?

I get a feeling that this will ignore all bits after the 8th and less than 64. At least this is what I remember from fixing it in arrow2 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor ensures that the bit_offset is between 0..8, this means we need to be able to read unaligned u64, but need at most one additional byte. I'll add this as a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right 👍 Good thinking. Thanks for the clarification.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jhorstmann

I reviewed the code and ran test_iter_remainder_out_of_bounds under MIRI on my machine:

cargo +nightly miri test -p arrow -- test_iter_remainder_out_of_bounds

With the code change in this PR:

No failures are reported

Without the code change in this PR:

running 1 test
test util::bit_chunk_iterator::tests::test_iter_remainder_out_of_bounds ... error: Undefined Behavior: memory access failed: pointer must be in-bounds at offset 4103, but is outside bounds of alloc1293907 which has size 4096
   --> /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:788:9
    |
788 |         copy_nonoverlapping(src as *const u8, tmp.as_mut_ptr() as *mut u8, mem::size_of::<T>());
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: pointer must be in-bounds at offset 4103, but is outside bounds of alloc1293907 which has size 4096
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::ptr::read_unaligned::<u64>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:788:9
note: inside `<util::bit_chunk_iterator::BitChunkIterator as std::iter::Iterator>::next` at arrow/src/util/bit_chunk_iterator.rs:144:26
   --> arrow/src/util/bit_chunk_iterator.rs:144:26
    |
144 |                 unsafe { std::ptr::read_unaligned(raw_data.add(index + 1)).to_le() };
    |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: inside `<util::bit_chunk_iterator::BitChunkIterator as std::iter::Iterator>::fold::<std::option::Option<u64>, fn(std::option::Option<u64>, u64) -> std::option::Option<u64> {std::iter::Iterator::last::some::<u64>}>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:2111:29
    = note: inside `<util::bit_chunk_iterator::BitChunkIterator as std::iter::Iterator>::last` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:242:9
note: inside `util::bit_chunk_iterator::tests::test_iter_remainder_out_of_bounds` at arrow/src/util/bit_chunk_iterator.rs:268:30
   --> arrow/src/util/bit_chunk_iterator.rs:268:30
    |
268 |         assert_eq!(u64::MAX, bitchunks.iter().last().unwrap());
    |                              ^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at arrow/src/util/bit_chunk_iterator.rs:259:5
   --> arrow/src/util/bit_chunk_iterator.rs:259:5
    |
259 | /     fn test_iter_remainder_out_of_bounds() {
260 | |         // allocating a full page should trigger a fault when reading out of bounds
261 | |         const ALLOC_SIZE: usize = 4 * 1024;
262 | |         let input = vec![0xFF_u8; ALLOC_SIZE];
...   |
269 | |         assert_eq!(0x7F, bitchunks.remainder_bits());
270 | |     }
    | |_____^

@alamb alamb mentioned this pull request Jun 7, 2021
Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 💯

@alamb alamb merged commit e8d9ef5 into apache:master Jun 8, 2021
alamb pushed a commit that referenced this pull request Jun 8, 2021
* Fix out of bounds read in bit chunk iterator

* Add comment why reading one additional byte is enough
alamb added a commit that referenced this pull request Jun 9, 2021
* Fix out of bounds read in bit chunk iterator

* Add comment why reading one additional byte is enough

Co-authored-by: Jörn Horstmann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Out of bound reads in chunk iterator
3 participants