Handle overflow properly in core::slice #25300

Merged: 3 commits, May 12, 2015

75 changes: 28 additions & 47 deletions src/libcore/slice.rs
@@ -140,7 +140,7 @@ impl<T> SliceExt for [T] {
assume(!p.is_null());
if mem::size_of::<T>() == 0 {
Iter {ptr: p,
- end: (p as usize + self.len()) as *const T,
+ end: ((p as usize).wrapping_add(self.len())) as *const T,
_marker: marker::PhantomData}
} else {
Iter {ptr: p,
@@ -277,7 +277,7 @@ impl<T> SliceExt for [T] {
assume(!p.is_null());
if mem::size_of::<T>() == 0 {
IterMut {ptr: p,
- end: (p as usize + self.len()) as *mut T,
+ end: ((p as usize).wrapping_add(self.len())) as *mut T,
_marker: marker::PhantomData}
} else {
IterMut {ptr: p,
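For a slice of zero-sized elements, the `end` field computed in these two hunks is not a real address but `start + len` used as a counter. If the slice's base address sits near the top of the address space (as in the run-pass test below, which uses `-5isize as *const ()`), a plain `+` overflows, which panics when overflow checks are on. A minimal sketch of the difference using plain `usize` values; `zst_end` is a hypothetical helper, not a libcore function:

```rust
/// Hypothetical helper: the `end` counter of an iterator over zero-sized
/// elements, computed the way the patched `iter()`/`iter_mut()` do it
/// (`(p as usize).wrapping_add(self.len())`).
fn zst_end(start: usize, len: usize) -> usize {
    start.wrapping_add(len)
}

fn main() {
    // Same base address as `-5isize as *const ()`.
    let start = usize::MAX - 4;
    let len = 10;

    // The old `p as usize + self.len()` overflows here; `checked_add`
    // reports that without panicking.
    assert_eq!(start.checked_add(len), None);

    // The wrapping version produces a well-defined counter value.
    let end = zst_end(start, len);
    assert_eq!(end, 5);
    println!("end = {}", end);
}
```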
@@ -632,35 +632,17 @@ fn size_from_ptr<T>(_: *const T) -> usize {


// Use macros to be generic over const/mut
- //
- // They require non-negative `$by` because otherwise the expression
- // `(ptr as usize + $by)` would interpret `-1` as `usize::MAX` (and
- // thus trigger a panic when overflow checks are on).
-
- // Use this to do `$ptr + $by`, where `$by` is non-negative.
- macro_rules! slice_add_offset {
+ macro_rules! slice_offset {
($ptr:expr, $by:expr) => {{
let ptr = $ptr;
if size_from_ptr(ptr) == 0 {
- transmute(ptr as usize + $by)
+ transmute((ptr as isize).wrapping_add($by))
Contributor

Is there even a reason for this to be a transmute?

Contributor Author

@gankro I believe it's a transmute because a) the macro has no way to refer to the element type, so it can't use an as-cast, and b) it lets the same code work with both *const and *mut pointers. But the transmute is already constrained to produce the same pointer type as $ptr, because the other branch uses ptr.offset($by).

Contributor

Oh, I see: it's to let inference fill in the blanks. That's also why it's a macro, I guess.

} else {
ptr.offset($by)
}
}};
}

- // Use this to do `$ptr - $by`, where `$by` is non-negative.
- macro_rules! slice_sub_offset {
- ($ptr:expr, $by:expr) => {{
- let ptr = $ptr;
- if size_from_ptr(ptr) == 0 {
- transmute(ptr as usize - $by)
- } else {
- ptr.offset(-$by)
- }
- }};
- }

macro_rules! slice_ref {
($ptr:expr) => {{
let ptr = $ptr;
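The review thread on this hunk explains why `slice_offset!` is a macro ending in a `transmute`: it must work for both `*const T` and `*mut T` without naming `T`, and the `ptr.offset($by)` branch pins the result type. Below is a minimal sketch in current Rust, assuming a `size_from_ptr` helper of the shape named in the hunk header. It swaps the integer round-trip plus `transmute` for `cast::<u8>().wrapping_offset(..).cast()`, which relies on the same inference trick (the final `.cast()` target is inferred from the other branch). It illustrates the idea and is not the code merged in this PR.

```rust
use std::mem;

/// Size of the pointee, used only to pick a branch (same shape as the
/// `size_from_ptr` helper named in the hunk header).
fn size_from_ptr<T>(_: *const T) -> usize {
    mem::size_of::<T>()
}

/// Offset `$ptr` by `$by` elements (`$by: isize`, may be negative).
/// For zero-sized pointees the address is a plain counter that wraps
/// instead of overflowing. Works for both `*const T` and `*mut T`.
macro_rules! slice_offset {
    ($ptr:expr, $by:expr) => {{
        let ptr = $ptr;
        if size_from_ptr(ptr) == 0 {
            // Step one byte per element; `.cast()` infers its target type
            // from the `else` branch, so the result has the type of `$ptr`.
            ptr.cast::<u8>().wrapping_offset($by).cast()
        } else {
            ptr.offset($by)
        }
    }};
}

fn main() {
    // Zero-sized elements: the address is only a counter and may wrap.
    let p = (usize::MAX - 4) as *const ();
    let q: *const () = unsafe { slice_offset!(p, 10) };
    assert_eq!(q as usize, 5);
    let back: *const () = unsafe { slice_offset!(q, -10) };
    assert_eq!(back as usize, usize::MAX - 4);

    // The same macro on a `*mut` pointer to a non-zero-sized type.
    let mut arr = [10u32, 20, 30];
    let m = arr.as_mut_ptr();
    let last = unsafe { slice_offset!(m, 2) };
    assert_eq!(unsafe { *last }, 30);
}
```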
@@ -683,22 +665,24 @@ macro_rules! iterator {
#[inline]
fn next(&mut self) -> Option<$elem> {
// could be implemented with slices, but this avoids bounds checks
- unsafe {
- ::intrinsics::assume(!self.ptr.is_null());
- ::intrinsics::assume(!self.end.is_null());
- if self.ptr == self.end {
- None
- } else {
+ if self.ptr == self.end {
+ None
+ } else {
+ unsafe {
+ if mem::size_of::<T>() != 0 {
+ ::intrinsics::assume(!self.ptr.is_null());
+ ::intrinsics::assume(!self.end.is_null());
Contributor Author

I'm assuming here that there's no need to make these assertions in the case where self.ptr == self.end. We don't care about the pointer value in that case.

Member

Optimizing compilers are weird and chaotic. This will simply be revisited until it benches/codegens perfectly (it's the most important iterator after all).

+ }
let old = self.ptr;
- self.ptr = slice_add_offset!(self.ptr, 1);
+ self.ptr = slice_offset!(self.ptr, 1);
Some(slice_ref!(old))
}
}
}

#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
- let diff = (self.end as usize) - (self.ptr as usize);
+ let diff = (self.end as usize).wrapping_sub(self.ptr as usize);
let size = mem::size_of::<T>();
let exact = diff / (if size == 0 {1} else {size});
(exact, Some(exact))
Contributor Author

I'm not sure why these assumes were added in the first place (they showed up in f9ef8cd with nothing relevant in the commit message), but they aren't valid.

Member

They help LLVM avoid inserting redundant null checks in certain loops. Ask @dotdash and @huonw.

Member

Why are they invalid?

Member

I don't remember all the details, but if these are invalid we are in big trouble anyway (we couldn't use .offset). Should we split into cases based on the size of T here too?

Contributor Author

They're potentially invalid if the element size is zero, because the iterators are free to wrap around in that case. Slices of zero-sized elements don't really have a meaningful pointer, but the iterators behave as though the element has a size of 1 in order to count the number of yielded elements correctly. This means that a slice with a sufficiently high pointer value and sufficiently large length will result in an iterator that wraps around.

I also question the edge case of a slice of non-zero-sized elements that includes the highest byte in the address space. It should theoretically be possible to have a valid slice whose last element sits at the very highest address; if so, the end pointer will have wrapped around to 0, and the ptr value will wrap around when the iterator yields its final element. That said, I don't know whether the use of offset() is valid in this particular edge case.

Contributor Author

I just added the assumptions back in a slightly different form. I believe the new version should be valid in all cases.

Member

I'm under the impression that such an object size and placement would be illegal/impossible. If it isn't, we should make sure it is illegal in safe rustc; otherwise we can't safely index/address all bytes of that object.

GEP is our .offset().

If the GEP has the inbounds keyword, the result value is undefined (a “trap value”) if the GEP overflows (i.e. wraps around the end of the address space).

http://llvm.org/docs/GetElementPtr.html#what-happens-if-a-gep-computation-overflows

Contributor Author

@bluss OK, I wasn't sure what actually happened if it overflowed. I assumed the resulting value couldn't be dereferenced, but I thought it was plausible that it could still be used in a pointer comparison (since one-past-the-end is a valid location, and having an object occupy the last byte in the address space seems like a plausible thing to support). But if the documentation explicitly says it's a poison value, then I guess it is.

That said, this only applies to slices of non-zero-sized elements. offset() on a slice of zero-sized elements results in the same pointer (because anything multiplied by 0 is still 0). The iterators use that macro specifically to get around this case, but e.g. indexing into a slice of zero-sized elements uses offset() and that's fine (it just results in the same pointer that it was called on).

In light of this, I can re-add the end assumption to next().
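The discussion above hinges on how iterators over zero-sized elements count: each element is treated as one byte wide, so `ptr` and `end` act as counters that can wrap around. The following is a simplified model of the patched `next()` and `size_hint()` for that case, using plain `usize` values instead of raw pointers. The `ZstIter` type is hypothetical and leaves out the `assume` hints entirely.

```rust
/// Hypothetical model of `slice::Iter<'_, T>` when `size_of::<T>() == 0`:
/// `ptr` and `end` are plain counters, never dereferenced.
struct ZstIter {
    ptr: usize,
    end: usize,
}

impl Iterator for ZstIter {
    type Item = ();

    /// Mirrors the patched `next`: compare first, then advance with wrapping.
    fn next(&mut self) -> Option<()> {
        if self.ptr == self.end {
            None
        } else {
            self.ptr = self.ptr.wrapping_add(1);
            Some(())
        }
    }

    /// Mirrors the patched `size_hint`: `wrapping_sub` still gives the right
    /// count after `end` has wrapped past zero. The patch divides by
    /// `if size == 0 { 1 } else { size }`, which is 1 here.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let exact = self.end.wrapping_sub(self.ptr);
        (exact, Some(exact))
    }
}

fn main() {
    // Ten elements starting five bytes below the top of the address space,
    // so the `end` counter wraps around to 5.
    let start = usize::MAX - 4;
    let it = ZstIter { ptr: start, end: start.wrapping_add(10) };
    assert_eq!(it.end, 5);
    assert_eq!(it.size_hint(), (10, Some(10)));
    assert_eq!(it.count(), 10);
}
```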

@@ -726,13 +710,15 @@ macro_rules! iterator {
#[inline]
fn next_back(&mut self) -> Option<$elem> {
// could be implemented with slices, but this avoids bounds checks
- unsafe {
- ::intrinsics::assume(!self.ptr.is_null());
- ::intrinsics::assume(!self.end.is_null());
- if self.end == self.ptr {
- None
- } else {
- self.end = slice_sub_offset!(self.end, 1);
+ if self.end == self.ptr {
+ None
+ } else {
+ unsafe {
+ self.end = slice_offset!(self.end, -1);
+ if mem::size_of::<T>() != 0 {
+ ::intrinsics::assume(!self.ptr.is_null());
+ ::intrinsics::assume(!self.end.is_null());
Contributor Author

The previous code made the assertions prior to offsetting self.end. I'm assuming here that asserting after the offset instead of before it is better, because it means the call to slice_ref!(self.end) can then assume it's non-null.

+ }
Some(slice_ref!(self.end))
}
}
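The backward direction is where the old `slice_sub_offset!` could overflow: `ptr as usize - $by` underflows once a zero-sized iterator's `end` counter has wrapped to 0. A minimal model of the patched `next_back`, which compares first and then steps `end` back by one using the same `(x as isize).wrapping_add($by)` arithmetic as the new `slice_offset!`. The `ZstBackIter` type is hypothetical and uses plain `usize` counters instead of raw pointers.

```rust
/// Hypothetical model of reverse iteration over zero-sized elements.
struct ZstBackIter {
    ptr: usize,
    end: usize,
}

impl ZstBackIter {
    fn next_back(&mut self) -> Option<()> {
        if self.end == self.ptr {
            None
        } else {
            // Equivalent of `slice_offset!(self.end, -1)` for a zero-sized
            // element: step the counter back by one, wrapping below zero.
            self.end = (self.end as isize).wrapping_add(-1) as usize;
            Some(())
        }
    }
}

fn main() {
    // Two elements occupying the last two "addresses"; `end` has wrapped to 0.
    let mut it = ZstBackIter { ptr: usize::MAX - 1, end: 0 };

    // The old code computed `end - 1` on a `usize`, which overflows here.
    assert_eq!(0usize.checked_sub(1), None);

    assert_eq!(it.next_back(), Some(()));
    assert_eq!(it.end, usize::MAX);
    assert_eq!(it.next_back(), Some(()));
    assert_eq!(it.next_back(), None);
}
```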
@@ -743,7 +729,7 @@ macro_rules! iterator {

macro_rules! make_slice {
($t: ty => $result: ty: $start: expr, $end: expr) => {{
- let diff = $end as usize - $start as usize;
+ let diff = ($end as usize).wrapping_sub($start as usize);
let len = if mem::size_of::<T>() == 0 {
diff
} else {
@@ -757,7 +743,7 @@ macro_rules! make_slice {

macro_rules! make_mut_slice {
($t: ty => $result: ty: $start: expr, $end: expr) => {{
- let diff = $end as usize - $start as usize;
+ let diff = ($end as usize).wrapping_sub($start as usize);
let len = if mem::size_of::<T>() == 0 {
diff
} else {
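`make_slice!` and `make_mut_slice!` rebuild a slice from an iterator's start/end pair, so they need the same wrapping subtraction as `size_hint`. A minimal function-level sketch of that logic; `make_slice` here is a hypothetical stand-in for the macro (the hunk only shows its first lines), built on the real `std::slice::from_raw_parts`:

```rust
use std::mem;
use std::slice;

/// Hypothetical analogue of `make_slice!`: rebuild a slice from a
/// `start`/`end` pointer pair. For zero-sized `T` the pointers are counters,
/// so their (possibly wrapped) difference is the length.
///
/// # Safety
/// `start..end` must describe a valid slice of `T`; for zero-sized `T`,
/// `start` only needs to be non-null and aligned.
unsafe fn make_slice<'a, T>(start: *const T, end: *const T) -> &'a [T] {
    let diff = (end as usize).wrapping_sub(start as usize);
    let len = if mem::size_of::<T>() == 0 {
        diff
    } else {
        diff / mem::size_of::<T>()
    };
    unsafe { slice::from_raw_parts(start, len) }
}

fn main() {
    // Non-zero-sized elements: an ordinary pointer pair.
    let arr = [1u16, 2, 3, 4];
    let start = arr.as_ptr();
    let end = unsafe { start.add(arr.len()) };
    let got = unsafe { make_slice(start, end) };
    assert_eq!(got, &[1u16, 2, 3, 4][..]);

    // Zero-sized elements whose `end` counter wrapped past zero
    // (same start address as the run-pass test's `-5isize as *const ()`).
    let zst_start = (usize::MAX - 4) as *const ();
    let zst_end = 5 as *const ();
    let s: &[()] = unsafe { make_slice(zst_start, zst_end) };
    assert_eq!(s.len(), 10);
}
```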
@@ -794,7 +780,7 @@ impl<'a, T> Iter<'a, T> {
fn iter_nth(&mut self, n: usize) -> Option<&'a T> {
match self.as_slice().get(n) {
Some(elem_ref) => unsafe {
- self.ptr = slice_add_offset!(elem_ref as *const _, 1);
+ self.ptr = slice_offset!(self.ptr, (n as isize).wrapping_add(1));
Some(slice_ref!(elem_ref))
},
None => {
@@ -827,12 +813,7 @@ impl<'a, T> RandomAccessIterator for Iter<'a, T> {
fn idx(&mut self, index: usize) -> Option<&'a T> {
unsafe {
if index < self.indexable() {
- if mem::size_of::<T>() == 0 {
- // Use a non-null pointer value
- Some(&mut *(1 as *mut _))
- } else {
- Some(transmute(self.ptr.offset(index as isize)))
- }
+ Some(slice_ref!(self.ptr.offset(index as isize)))
} else {
None
}
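The simplified `idx` leans on the point made in the review thread that `offset()` on a pointer to a zero-sized type returns the same address, because the byte offset is `index * 0`. A quick check of that behaviour with the standard `pointer::offset`; it only demonstrates the pointer arithmetic and does not reproduce `slice_ref!`, whose body is cut off in this diff. `offset_bytes` is a hypothetical helper.

```rust
use std::mem;

/// Hypothetical helper: bytes moved by `ptr.offset(index)` on a pointer into
/// `elems`. Panics unless `0 <= index <= elems.len()`, which keeps the offset
/// in bounds (one past the end is allowed).
fn offset_bytes<T>(elems: &[T], index: isize) -> usize {
    assert!(index >= 0 && index as usize <= elems.len());
    let base = elems.as_ptr();
    // SAFETY: `index` is within, or one past the end of, `elems`.
    let moved = unsafe { base.offset(index) };
    moved as usize - base as usize
}

fn main() {
    // Zero-sized elements: `index * 0` bytes, so the address does not move.
    assert_eq!(offset_bytes(&[(); 4], 3), 0);
    // Non-zero-sized elements: `index * size_of::<u32>()` bytes.
    assert_eq!(offset_bytes(&[0u32; 4], 3), 3 * mem::size_of::<u32>());
}
```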
@@ -867,7 +848,7 @@ impl<'a, T> IterMut<'a, T> {
fn iter_nth(&mut self, n: usize) -> Option<&'a mut T> {
match make_mut_slice!(T => &'a mut [T]: self.ptr, self.end).get_mut(n) {
Some(elem_ref) => unsafe {
- self.ptr = slice_add_offset!(elem_ref as *mut _, 1);
+ self.ptr = slice_offset!(self.ptr, (n as isize).wrapping_add(1));
Some(slice_ref!(elem_ref))
},
None => {
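Both `iter_nth` hunks stop advancing from the returned element's address and advance `self.ptr` itself by `n + 1`. For zero-sized elements that matters: per the review thread, indexing a zero-sized slice goes through `offset()`, which does not move the pointer, so the element returned by `get(n)` sits at the iterator's current address. A simplified model with plain `usize` counters; `ZstIter`, `nth_old`, and `nth_new` are hypothetical, and the "exhaust on failure" behaviour of the `None` arm is an assumption since that arm is cut off here.

```rust
/// Hypothetical model of `Iter::nth` for zero-sized elements, with
/// `ptr`/`end` as plain counters.
struct ZstIter {
    ptr: usize,
    end: usize,
}

impl ZstIter {
    fn remaining(&self) -> usize {
        self.end.wrapping_sub(self.ptr)
    }

    /// Old approach: advance one step past the returned element's address.
    /// For a zero-sized element that address equals `self.ptr`, so this
    /// consumes one counter step instead of n + 1.
    fn nth_old(&mut self, n: usize) -> Option<()> {
        if n < self.remaining() {
            let elem_addr = self.ptr; // offset(n) moves n * 0 bytes
            self.ptr = elem_addr.wrapping_add(1);
            Some(())
        } else {
            self.ptr = self.end; // assumption: exhaust the iterator
            None
        }
    }

    /// Patched approach: advance `ptr` itself by n + 1 elements.
    fn nth_new(&mut self, n: usize) -> Option<()> {
        if n < self.remaining() {
            self.ptr = self.ptr.wrapping_add(n.wrapping_add(1));
            Some(())
        } else {
            self.ptr = self.end; // assumption: exhaust the iterator
            None
        }
    }
}

fn main() {
    let start = usize::MAX - 4;
    let end = start.wrapping_add(10);

    let mut old_it = ZstIter { ptr: start, end };
    assert!(old_it.nth_old(5).is_some());
    assert_eq!(old_it.remaining(), 9); // only one element was consumed

    let mut new_it = ZstIter { ptr: start, end };
    assert!(new_it.nth_new(5).is_some());
    assert_eq!(new_it.remaining(), 4); // matches the run-pass test's expectation
}
```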
34 changes: 34 additions & 0 deletions src/test/run-pass/slice-of-zero-size-elements.rs
@@ -0,0 +1,34 @@
// Copyright 2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// compile-flags: -C debug-assertions

use std::slice;

pub fn main() {
// In a slice of zero-size elements the pointer is meaningless.
// Ensure iteration still works even if the pointer is at the end of the address space.
let slice: &[()] = unsafe { slice::from_raw_parts(-5isize as *const (), 10) };
assert_eq!(slice.len(), 10);
assert_eq!(slice.iter().count(), 10);

// .nth() on the iterator should also behave correctly
let mut it = slice.iter();
assert!(it.nth(5).is_some());
assert_eq!(it.count(), 4);

let slice: &mut [()] = unsafe { slice::from_raw_parts_mut(-5isize as *mut (), 10) };
assert_eq!(slice.len(), 10);
assert_eq!(slice.iter_mut().count(), 10);

let mut it = slice.iter_mut();
assert!(it.nth(5).is_some());
assert_eq!(it.count(), 4);
}