-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark typed buffer APIs safe
(#996) (#1027)
#1866
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -181,19 +181,14 @@ impl Buffer { | |||||||
|
||||||||
/// View buffer as typed slice. | ||||||||
/// | ||||||||
/// # Safety | ||||||||
/// # Panics | ||||||||
/// | ||||||||
/// `ArrowNativeType` is public so that it can be used as a trait bound for other public | ||||||||
/// components, such as the `ToByteSlice` trait. However, this means that it can be | ||||||||
/// implemented by user defined types, which it is not intended for. | ||||||||
pub unsafe fn typed_data<T: ArrowNativeType + num::Num>(&self) -> &[T] { | ||||||||
// JUSTIFICATION | ||||||||
// Benefit | ||||||||
// Many of the buffers represent specific types, and consumers of `Buffer` often need to re-interpret them. | ||||||||
// Soundness | ||||||||
// * The pointer is non-null by construction | ||||||||
// * alignment asserted below. | ||||||||
let (prefix, offsets, suffix) = self.as_slice().align_to::<T>(); | ||||||||
/// This function panics if the underlying buffer is not aligned | ||||||||
/// correctly for type `T`. | ||||||||
pub fn typed_data<T: ArrowNativeType>(&self) -> &[T] { | ||||||||
// SAFETY | ||||||||
// ArrowNativeType are trivially transmutable, and this method checks alignment | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
let (prefix, offsets, suffix) = unsafe { self.as_slice().align_to::<T>() }; | ||||||||
assert!(prefix.is_empty() && suffix.is_empty()); | ||||||||
offsets | ||||||||
} | ||||||||
|
@@ -451,7 +446,7 @@ mod tests { | |||||||
macro_rules! check_as_typed_data { | ||||||||
($input: expr, $native_t: ty) => {{ | ||||||||
let buffer = Buffer::from_slice_ref($input); | ||||||||
let slice: &[$native_t] = unsafe { buffer.typed_data::<$native_t>() }; | ||||||||
let slice: &[$native_t] = buffer.typed_data::<$native_t>(); | ||||||||
assert_eq!($input, slice); | ||||||||
}}; | ||||||||
} | ||||||||
|
@@ -573,12 +568,12 @@ mod tests { | |||||||
) | ||||||||
}; | ||||||||
|
||||||||
let slice = unsafe { buffer.typed_data::<i32>() }; | ||||||||
let slice = buffer.typed_data::<i32>(); | ||||||||
assert_eq!(slice, &[1, 2, 3, 4, 5]); | ||||||||
|
||||||||
let buffer = buffer.slice(std::mem::size_of::<i32>()); | ||||||||
|
||||||||
let slice = unsafe { buffer.typed_data::<i32>() }; | ||||||||
let slice = buffer.typed_data::<i32>(); | ||||||||
assert_eq!(slice, &[2, 3, 4, 5]); | ||||||||
} | ||||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -275,17 +275,14 @@ impl MutableBuffer { | |
|
||
/// View this buffer asa slice of a specific type. | ||
/// | ||
/// # Safety | ||
/// | ||
/// This function must only be used with buffers which are treated | ||
/// as type `T` (e.g. extended with items of type `T`). | ||
/// | ||
/// # Panics | ||
/// | ||
/// This function panics if the underlying buffer is not aligned | ||
/// correctly for type `T`. | ||
pub unsafe fn typed_data_mut<T: ArrowNativeType>(&mut self) -> &mut [T] { | ||
let (prefix, offsets, suffix) = self.as_slice_mut().align_to_mut::<T>(); | ||
pub fn typed_data_mut<T: ArrowNativeType>(&mut self) -> &mut [T] { | ||
// SAFETY | ||
// ArrowNativeType are trivially transmutable, and this method checks alignment | ||
let (prefix, offsets, suffix) = unsafe { self.as_slice_mut().align_to_mut::<T>() }; | ||
assert!(prefix.is_empty() && suffix.is_empty()); | ||
offsets | ||
} | ||
|
@@ -299,7 +296,7 @@ impl MutableBuffer { | |
/// assert_eq!(buffer.len(), 8) // u32 has 4 bytes | ||
/// ``` | ||
#[inline] | ||
pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T]) { | ||
pub fn extend_from_slice<T: ArrowNativeType>(&mut self, items: &[T]) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This method was potentially unsound, as ToByteSlice is not sealed and so could theoretically be implemented for a type that is not trivially transmutable (which the implementation of this method implicitly assumes). Edit: this is an API change |
||
let len = items.len(); | ||
let additional = len * std::mem::size_of::<T>(); | ||
self.reserve(additional); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -576,7 +576,7 @@ macro_rules! def_get_binary_array_fn { | |
fn $name(array: &$ty) -> Vec<ByteArray> { | ||
let mut byte_array = ByteArray::new(); | ||
let ptr = crate::util::memory::ByteBufferPtr::new( | ||
unsafe { array.value_data().typed_data::<u8>() }.to_vec(), | ||
array.value_data().as_slice().to_vec(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know why this was ever using typed_data... |
||
); | ||
byte_array.set_data(ptr); | ||
array | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is truly "safe" -- is it really true that any bit pattern is a valid
ArrowNativeType
? I am thinking about floating point representations in particular -- I wonder if this API could potentially create invalidf32
/f64
which seems like it would thus still beunsafe
🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think https://doc.rust-lang.org/std/primitive.f32.html#method.from_bits is relevant here, the short answer is it is perfectly safe to transmute arbitrary bytes to floats, it may not be wise, but it is not UB.
In particular the standard library provides safe functions that transmute u32 -> f32, u64 -> f64, and so I think it is fair to say all bit sequences are valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that it is safe because there are no actual undefined bit patterns in any of the native types (as opposed to
bool
orOption<...>
for example). Certain bit patterns might get canonicalized when interpreted as floating point values, but I don't think that would be considered undefined behavior. There are more details about specific behavior in the docs for f64::from_bits (which is considered safe).