Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::io: vectored reads with uninitialized memory (Read::read_buf_vec) #104

Closed
nrc opened this issue Sep 15, 2022 · 4 comments
Closed

std::io: vectored reads with uninitialized memory (Read::read_buf_vec) #104

nrc opened this issue Sep 15, 2022 · 4 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@nrc
Copy link
Member

nrc commented Sep 15, 2022

Proposal

This is an implementation of part of RFC 2930 (read_buf) which was not deeply specified in the RFC

Problem statement

Implement Read::read_buf_vec.

Motivation, use-cases

Vectored IO without having to initialise all the buffers first.

Solution sketches

This mostly follows BorrowedBuf/BorrowedCursor, extended to slices for vectored IO. We need a new version of IoSliceMut which does not assume that all data in it is initialised. Note that this is an improvement over the read_vec API because the underlying buffers can be reused.

An alternative would be to use the old single buffer design, but this has problems with a complex API and a soundness footgun. We could also pass a slice of BorrowedCursors, but this is pretty horrible ergonomically.

/// A buffer type used with `Read::read_buf_vectored`. Unlike `IoSliceMut`, there is no guarantee
/// that its memory has been initialised.
///
/// It is semantically a wrapper around an &mut [MaybeUninit<u8>], but is guaranteed to be ABI
/// compatible with the `iovec` type on Unix platforms and WSABUF on Windows.
pub struct IoSliceMaybeUninit<'a> {}

impl Debug for IoSliceMaybeUninit<'_> {}

/// Create a new `IoSliceMaybeUninit` from an uninitialized buffer.
impl<'a> From<&'a mut [MaybeUninit<u8>]> for IoSliceMaybeUninit<'a> {
    fn from(buf: &'a mut [MaybeUninit<u8>]) -> IoSliceMaybeUninit<'a> {}
}

impl<'a> IoSliceMaybeUninit<'a> {
    /// Create a new IoSliceMaybeUninit from an existing mutable slice of `u8`s.
    ///
    /// SAFETY: all bytes in the slice must be initialized by the time the IoSliceMaybeUninit is
    /// destroyed and thus access to the data is restored for `slice` or other views of the data.
    pub unsafe fn from_slice(slice: &'a mut [u8]) -> IoSliceMaybeUninit<'a> {}

    /// Create a new IoSliceMaybeUninit with its internal cursor advanced.
    pub fn advance(&self, n: usize) -> Self {}

    /// View the slice as a slice of `u8`s.
    ///
    /// # Safety
    ///
    /// The caller must ensure that all elements of the slice have been initialized.
    pub unsafe fn as_slice(&self) -> &[u8] {}

    /// View the slice as a mutable slice of `u8`s.
    ///
    /// # Safety
    ///
    /// The caller must ensure that all elements of the slice have been initialized.
    pub unsafe fn as_mut_slice(&mut self) -> &mut [u8] {}

    /// View the slice as a mutable slice of `MaybeUninit<u8>`.
    pub fn as_maybe_init_slice(&mut self) -> &mut [MaybeUninit<u8>] {}

    /// Returns the number of elements in the slice.
    pub fn len(&self) -> usize {}
}

/// A borrowed byte buffer, consisting of multiple underlying buffers, which is incrementally filled
/// and initialized. Primarily designed for vectored IO.
///
/// This type is a sort of "double cursor". It tracks three regions in the buffer: a region at the beginning of the
/// buffer that has been logically filled with data, a region that has been initialized at some point but not yet
/// logically filled, and a region at the end that is fully uninitialized. The filled region is guaranteed to be a
/// subset of the initialized region.
///
/// In summary, the contents of the buffer can be visualized as:
/// ```not_rust
/// [   |   |      |    |   |       |   ] Underlying buffers
/// [             capacity              ]
/// [ filled |         unfilled         ]
/// [    initialized    | uninitialized ]
/// ```
///
/// A `BorrowedSliceBuf` is created around some existing data (or capacity for data) via a unique reference
/// (`&mut`). The `BorrowedSliceBuf` can be configured (e.g., using `clear` or `set_init`), but otherwise
/// is read-only. To write into the buffer, use `unfilled` to create a `BorrowedSliceCursor`. The cursor
/// has write-only access to the unfilled portion of the buffer (you can think of it as a
/// write-only iterator).
///
/// The lifetime `'a` is a bound on the lifetime of the underlying data.
pub struct BorrowedSliceBuf<'a> {}

impl<'a> BorrowedSliceBuf<'a> {
    /// Create a new `BorrowedSliceBuf` from a slice of possibly initialized io slices.
    pub fn new<'b: 'a>(bufs: &'a mut [IoSliceMaybeUninit<'b>]) -> BorrowedSliceBuf<'a> {}

    /// Returns the length of the filled part of the buffer.
    pub fn len(&self) -> usize {}

    /// Returns the number of completely filled slices in the buffer.
    pub fn len_filled_slices(&self) -> usize {}

    /// Returns the number of filled elements in any partially filled slice.
    ///
    /// If there are no partially filled slices, then this method returns `0`.
    pub fn len_partial_filled_slice(&self) -> usize {}

    /// Iterate over the filled portion of the buffer.
    pub fn iter_filled_slices(&self) -> FilledSliceIterator<'_, 'a> {}

    /// Returns a cursor over the unfilled part of the buffer.
    pub fn unfilled<'this>(&'this mut self) -> BorrowedSliceCursor<'this> {}

    /// Clears the buffer, resetting the filled region to empty.
    ///
    /// The number of initialized bytes is not changed, and the contents of the buffer are not modified.
    pub fn clear(&mut self) -> &mut Self {}

    /// Asserts that a prefix of the underlying buffers are initialized. The initialized prefix is
    /// all of the first `b - 1` buffers and the first `n` bytes of the `b`th buffer. In other words,
    /// `(b, n)` is the coordinates of the first uninitialized byte in the buffers.
    ///
    /// `BorrowedSliceBuf` assumes that bytes are never de-initialized, so this method does nothing when called with fewer
    /// bytes than are already known to be initialized.
    ///
    /// # Safety
    ///
    /// The caller must ensure that all of the `(b, n)` prefix has already been initialized.
    pub unsafe fn set_init(&mut self, b: usize, n: usize) -> &mut Self {}
}

/// A writeable view of the unfilled portion of a [`BorrowedSliceBuf`](BorrowedSliceBuf).
///
/// Provides access to the initialized and uninitialized parts of the underlying `BorrowedSliceBuf`.
/// Data can be written directly to the cursor by using [`append`](BorrowedSliceCursor::append) or
/// indirectly by writing into a view of the cursor (obtained by calling `as_mut`, `next_init_mut`,
/// etc.) and then calling `advance`.
///
/// Once data is written to the cursor, it becomes part of the filled portion of the underlying
/// `BorrowedSliceBuf` and can no longer be accessed or re-written by the cursor. I.e., the cursor tracks
/// the unfilled part of the underlying `BorrowedSliceBuf`.
///
/// The lifetime `'a` is a bound on the lifetime of the underlying data.
pub struct BorrowedSliceCursor<'a> {}

impl<'a> BorrowedSliceCursor<'a> {
    /// Clone this cursor.
    ///
    /// Since a cursor maintains unique access to its underlying buffer, the cloned cursor is not
    /// accessible while the clone is alive.
    pub fn reborrow<'this>(&'this mut self) -> BorrowedSliceCursor<'this> {}

    /// Returns the available space in the cursor.
    pub fn capacity(&self) -> usize {}

    /// Returns the number of bytes written to this cursor since it was created from a `BorrowBuf`.
    ///
    /// Note that if this cursor is a reborrow of another, then the count returned is the count written
    /// via either cursor, not the count since the cursor was reborrowed.
    pub fn written(&self) -> usize {}

    /// Returns a mutable reference to the whole cursor.
    ///
    /// Returns a guard type which dereferences to a `&mut [IoSliceMaybeUninit<'a>]`
    ///
    /// # Safety
    ///
    /// The caller must not uninitialize any bytes in the initialized portion of the cursor.
    pub unsafe fn as_mut<'this>(&'this mut self) -> BorrowedSliceGuard<'this, 'a> {}

    /// Returns a shared reference to the initialized portion of the first (at least partially)
    /// initialised buffer in the cursor.
    ///
    /// Returns a reference to a slice of a single underlying buffer. That buffer will be the first
    /// unfilled buffer which is at least partially initialized. The returned slice is the part of
    /// that buffer which is initialized. If there is no part of any buffer which is both unfilled
    /// and initialised, then this method returns `None`.
    ///
    /// Does not iterate over buffers in any way. Calling this method multiple times will return
    /// the same slice unless data is either filled or initialized (e.g., by calling `advance`).
    pub fn next_init_ref(&self) -> Option<&[u8]> {}

    /// Returns a mutable reference to the initialized portion of the first (at least partially)
    /// initialised buffer in the cursor.
    ///
    /// Returns a reference to a slice of a single underlying buffer. That buffer will be the first
    /// unfilled buffer which is at least partially initialized. The returned slice is the part of
    /// that buffer which is initialized. If there is no part of any buffer which is both unfilled
    /// and initialised, then this method returns `None`.
    ///
    /// Does not iterate over buffers in any way. Calling this method multiple times will return
    /// the same slice unless data is either filled or initialized (e.g., by calling `advance`).
    pub fn next_init_mut(&mut self) -> Option<&mut [u8]> {}

    /// Returns a mutable reference to the uninitialized portion of the first (at least partially)
    /// uninitialised buffer in the cursor.
    ///
    /// It is safe to uninitialize any of these bytes.
    ///
    /// Returns `None` if `self` is entirely initialized.
    ///
    /// Does not iterate over buffers in any way. Calling this method multiple times will return
    /// the same slice unless data is either filled or initialized (e.g., by calling `advance`).
    pub fn next_uninit_mut(&mut self) -> Option<&mut [MaybeUninit<u8>]> {}

    /// Returns a mutable reference to the first buffer in the cursor.
    ///
    /// Returns `None` if `self` is empty.
    ///
    /// Does not iterate over buffers in any way. Calling this method multiple times will return
    /// the same slice unless data is filled (e.g., by calling `advance`).
    ///
    /// # Safety
    ///
    /// The caller must not uninitialize any bytes in the initialized portion of the cursor.
    pub unsafe fn next_mut(&mut self) -> Option<&mut [MaybeUninit<u8>]> {}

    /// Initializes all bytes in the cursor.
    pub fn ensure_init(&mut self) -> &mut Self {}

    /// Initializes all bytes in the first (at least partially unfilled) buffer in the cursor.
    pub fn ensure_next_init(&mut self) -> &mut Self {}

    /// Advance the cursor by asserting that `n` bytes have been filled.
    ///
    /// After advancing, the `n` bytes are no longer accessible via the cursor and can only be
    /// accessed via the underlying `BorrowedSliceBuf`. I.e., the `BorrowedSliceBuf`'s filled portion
    /// grows by `n` elements and its unfilled portion (and the capacity of this cursor) shrinks by
    /// `n` elements.
    ///
    /// # Safety
    ///
    /// The caller must ensure that the first `n` bytes of the cursor have been properly
    /// initialised.
    pub unsafe fn advance(&mut self, mut n: usize) -> &mut Self {}

    /// Appends data to the cursor, advancing position within its buffer.
    ///
    /// # Panics
    ///
    /// Panics if `self.capacity()` is less than `buf.len()`.
    pub fn append(&mut self, mut buf: &[u8]) {}

    /// Sets the initialized region to the minimum of the currently initialized region
    /// and the filled region.
    ///
    /// The caller must ensure all invariants for `self.bufs.filled` are satisfied (see the field definition)
    ///
    /// # Safety
    ///
    /// The caller must ensure that the filled region is entirely initialized.
    unsafe fn update_init_to_filled(&mut self) {}
}

impl<'a> Write for BorrowedSliceCursor<'a> {
    fn write(&mut self, buf: &[u8]) -> Result<usize> {}
    fn flush(&mut self) -> Result<()> {}
}

/// An iterator over the filled slices of a `BorrowedSliceBuf`.
///
/// See `BorrowedSliceBuf::iter_filled`.
pub struct FilledSliceIterator<'buf, 'data> {}

impl<'buf, 'data> Iterator for FilledSliceIterator<'buf, 'data> {
    type Item = &'buf [u8];
    fn next(&mut self) -> Option<&'buf [u8]>
}

/// Guard type used by `BorrowedSliceCursor::as_mut`.
///
/// Presents a view of the cursor containing only the filled data (via the `Deref` impls). Also
/// resets the state of the underlying BorrowedSliceBuf to a view of the complete
/// buffer when dropped.
pub struct BorrowedSliceGuard<'buf, 'data> {}

impl<'buf, 'data> Drop for BorrowedSliceGuard<'buf, 'data> {}
impl<'buf, 'data> Deref for BorrowedSliceGuard<'buf, 'data> {
    type Target = [IoSliceMaybeUninit<'data>];
    fn deref(&self) -> &[IoSliceMaybeUninit<'data>] {}
}
impl<'buf, 'data> DerefMut for BorrowedSliceGuard<'buf, 'data> {}

Links and related work

Draft PR with incomplete implementation: rust-lang/rust#101842

@nrc nrc added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Sep 15, 2022
@programmerjake
Copy link
Member

#[derive(Clone)]
pub struct IoSliceMaybeUninit<'a> {}

Clone on a &mut [_]?! looks wrong to me.

UB:

let array = [0u8; 4];
let buf1 = IoSliceMaybeUninit::from(array);
let buf2 = buf1.clone();
let mut_ref1: &mut [_] = buf1.as_maybe_init_slice();
let mut_ref2: &mut [_] = buf2.as_maybe_init_slice();
// UB: both mut_ref1 and mut_ref2 point to the same memory despite being unique references

Also, as_maybe_init_slice is a unsound API because it lets you safely get a &mut [MaybeUninit<u8>] from a &mut [u8], you can then write uninit bytes without needing unsafe, allowing producing an uninit u8 in safe code.

@nrc
Copy link
Member Author

nrc commented Sep 15, 2022

Yeah, I think you're right on both counts. Removing Clone shouldn't make a huge difference to the API, it's mainly about how things are implemented. For the second point, I think as_maybe_init_slice should probably be safe and converting a &[u8] to IoSliceMaybeUninit should be unsafe, what do you think?

@nrc
Copy link
Member Author

nrc commented Nov 27, 2022

I've updated the design in the OP and I've pushed rust-lang/rust@03530ce to the draft PR with the changes discussed above.

@the8472
Copy link
Member

the8472 commented Jul 12, 2023

We discussed it during yesterday's libs-api meeting. We're ok with the general idea but the API surface will likely need some exploration and refinement.

Questions that came up during the meeting:

  • could this be better integrated with IoSlice
  • could BorrowedBuf be unified with BorrowedSliceBuf, with the former being a special case of the latter

@the8472 the8472 closed this as completed Jul 12, 2023
@the8472 the8472 added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

3 participants