Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Read::initializer #42788

Closed
sfackler opened this issue Jun 21, 2017 · 20 comments · Fixed by #81156
Closed

Tracking issue for Read::initializer #42788

sfackler opened this issue Jun 21, 2017 · 20 comments · Fixed by #81156
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@sfackler
Copy link
Member

sfackler commented Jun 21, 2017

Implemented in #42002.

Open questions / blockers

  • The code currently creates references to uninitialized integers (grep for 42788), can we avoid that?
@sfackler sfackler added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 21, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 22, 2017
@alexcrichton
Copy link
Member

I'm personally no longer sold on the infrastructure that we've got here I think. I was initially hoping that we'd have some safe way of working with these APIs and such, but in the end it all turned up unsafe. In light of all that I'd personally advocate for much simpler API:

impl Read {
    #[inline]
    unsafe fn may_read_buffer(&self) -> bool { true }
}

I don't think the complexity we have today is worth the cost currently, and I'd imagine that we could start making abstractions like:

fn initialize_buffer_for(r: &Read, buf: &mut [u8]) {
    // ...
}

@tbu-
Copy link
Contributor

tbu- commented Dec 13, 2017

I wrote a tiny library for reading into uninitialized buffers: https://crates.io/crates/buffer/

Maybe its ideas could be adapted for the standard library. The following is safe code using the library:

let mut vec = Vec::with_capacity(1024);
if reader.read_buffer(&mut vec)?.len() != 0 {
    if vec[0] == 0 {
        // ...
    }
}

Adoption of something like this in the standard library would probably also help with adoption by third party crates, see e.g. tokio-rs/mio#628.

@seanmonstar
Copy link
Contributor

An alternative to the current API is to go along the path that TrustedLen did, which is just to have an unsafe trait.

unsafe trait ReadUninit: Read {}

With specialization landing, anyone who cares to can take advantage of ReadUinit:

impl<T: Read> ReadBuf for T {
    fn read_buf<B: BufMut>(&mut self, buf: &mut B) {
        self.read(buf.zeroed())
    }
}

impl<T: ReadUninit> ReadBuf for T {
    fn read_buf<B: BufMut>(&mut self, buf: &mut B) {
        self.read(buf.bytes())
    }
}

@sfackler
Copy link
Member Author

That's incompatible with trait objects which are used pretty heavily with Read in particular.

@seanmonstar
Copy link
Contributor

Specialization is incompatible with trait objects?

@sfackler
Copy link
Member Author

If I have a &mut Read, I have no way of checking if it implements ReadUninit or not.

@jethrogb
Copy link
Contributor

jethrogb commented Jan 28, 2019

@RalfJung suggested a function that takes &mut [MaybeUninit<u8>]

@alexcrichton
Copy link
Member

@sfackler do you recall if we have a tracking issue for the alternative method of implementing this where we hint to the compiler that a buffer shouldn't be considered as undefined? (the strategy we discussed at the last work week)

@sfackler
Copy link
Member Author

There's a bunch of discussion in the PR adding it: #58363.

It turns out that we explicitly decided to not go in that direction in the lead up to 1.0, so I think it's worth an RFC that I haven't gotten around to writing. In particular, that approach can change behavior for user-provided Read implementations that are passed to e.g. io::copy even before it's stable.

@abonander
Copy link
Contributor

I second the proposal for &mut [MaybeUninit<u8>] especially since the API design with MaybeUninit seems to be going in that direction anyway; there's just a few methods that I think need to be added to make this work nicely:

// std::mem
impl<T> MaybeUninit<T> {
    pub fn init_all(uninit: &mut [MaybeUninit<T>], val: T) -> &mut [T]  where T: Copy { // `T` could even be `Clone` since this method is panic-safe except for leaks
        ... // write `val` to all elements
        Self::as_init_all(uninit)
    }

    pub fn init_copy(uninit: &mut [MaybeUninit<T>], data: &[T]) -> &mut [T] where T: Copy { // `T` could also be `Clone`
        assert_eq!(uninit.len(), data.len());
        unsafe {
            Self::as_first_ptr(uninit).copy(data, uninit.len());
        }
        Self::as_init_all(uninit)
    }

    pub unsafe fn as_init_all(uninit: &mut [MaybeUninit<T>] -> &mut [T] { 
        ... // assume-initialized intrinsic
        mem::transmute(uninit)
    }
}

// std::io
pub trait Read {
    // ... existing methods

    fn read_uninit(&mut self, uninit: &mut [MaybeUninit<u8>]) -> io::Result<()> {
        self.read(MaybeUninit::init_all(uninit, 0))
    }
}

Then implementations that don't read the buffer can override read_uninit() and use MaybeUninit::as_first_ptr() to pass it to a syscall or copy to it from another buffer with init_copy().

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Jul 31, 2019

I wonder about having a method like the following in Read.

fn read_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>]) -> Result<&'a mut [u8]> {
    for byte in &mut *buf {
        unsafe {
            *byte.as_mut_ptr() = 0;
        }
    }
    let initialized = unsafe { mem::transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(buf) };
    let size = self.read(initialized)?;
    Ok(&mut initialized[..size])
}

The reason for it returning Result<&'a mut [u8]> is to allow using this in safe code as well as preventing malicious implementations - if Result<usize> was returned, as it is a case for other Read methods, an implementation could just say Ok(42) and watch the world burn when trying to interpret the uninitialized bytes. It is the responsibility of an implementation to ensure the mutable slice it returns is initialized.

The choice of 0 assignment is somewhat arbitrary. It can be some other constant or frozen uninitialized memory, as long it doesn't outright cause UB it's fine.

Note that such an implementation doesn't ensure that that result.as_ptr() == MaybeUninit::first_ptr(buf). Unsafe implementations that depend on this should check for this manually. This is necessary, as implementations could return values like Box::leak(vec![1, 2, 3].into_boxed_slice()) otherwise, which isn't verified by the compiler. Optimizer should be able to remove this assertion most of the time.

@sfackler
Copy link
Member Author

The default implementation will make that pretty hard to use in practice, unless you already know that your Read implementation overrides it. You're commonly calling read with a large buffer and don't expect it to be entirely filled, so writing to the entire thing before the raw read call imposes a very large performance penalty. There's an issue somewhere on Tokio's issue tracker about a similar issue with their AsyncRead::read_buf method.

@SimonSapin
Copy link
Contributor

Maybe this, together with no_std compat, makes it worth considering an io2 module with entirely separate traits?

@seanmonstar
Copy link
Contributor

I've been thinking for a few months now that if I could redo them, I wouldn't pass byte slices, but some trait object probably (like &mut dyn Buf), so that new concepts can be built and still be compatible.

@SimonSapin
Copy link
Contributor

What examples of new concepts do you have in mind?

@seanmonstar
Copy link
Contributor

So far, there's 2 that we use extensively: vectored IO, and uninitialized buffers. Using these features require an extra method to be implemented on Read/Write, and both are critical for performance. A problem we've seen is that any read/write wrappers that forget to forward the extra methods suddenly lose on performance, and it's not obvious why.

If there were a single read and write method, where a wrapper would forward to an inner type, the ability to use uninitialized memory and vectored IO would only depend on the dyn Buf and the innermost Read/Write. For similar inspiration, there's ByteBuffer in Java's NIO and ByteBuf in Netty, which allow for new buffer strategies to be added instead of requiring &[u8].

@abonander
Copy link
Contributor

abonander commented Aug 28, 2019

Can't we address the issue of wrappers not forwarding with a lint? It's not bulletproof but anyone who cares should see the warning. The bonus of a lint is that it can be applied to other traits with similar situations, like Iterator (and even AsyncRead if it's exposed).

The problem with introducing a new trait/module is that it fragments the ecosystem and early adopters suffer from a lack of widespread support. There's plenty of examples of this. I still don't see a whole lot of use of Java's NIO myself, but then again I don't really work in enterprise software. I'm not entirely opposed to the idea myself, but it would definitely need a concerted push for adoption.

Now, if Rust ever decides to bring async I/O into the stdlib, it could definitely make the transition then as there would be no competing implementation.

@danielhenrymantilla
Copy link
Contributor

Getting rid of all references to uninitialized bytes in Read

Uninitialized buffers should now be using [MaybeUninit<u8>]. An example of a Read-like API with them is featured in the ::uninit crate. Together with Vec::reserve_uninit it leads to a safe and readable pattern.

The only thing lacking in my crate is an added try_downcast method for Read:

#![allow(bad_style)] // These would not be the final names

trait Read {
    // ...
    //! Added method:

    /// To be overriden by implementors of `ReadIntoUninit`
    #[inline]
    fn try_downcast_to_ReadIntoUninit (self: &'_ mut Self)
        -> Option<&'_ mut dyn ReadIntoUninit>
    {
        None
    }
}

This way, for instance, the unsafe code at https://doc.rust-lang.org/1.37.0/src/std/io/mod.rs.html#355-374 could be removed, and we could have something using DEFAULT_BUF_SIZE for a specialized (R : ReadIntoUninit + ?Sized) version of fn read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> { which would avoid using small allocations for the specialized case, and would fall back to zero-initialized small vec extentions in the current case (i.e., also getting rid of the unsafe, since the unsafe block would simply become g.buf.resize(g.buf.len() + 32, 0).

Users with a dyn Read could then try to upgrade it to dyn ReadIntoUninit to use the improved versions of .read_to_end().

Centril added a commit to Centril/rust that referenced this issue Sep 4, 2019
…ut_uninit_integers, r=Centril

Added warning around code with reference to uninit bytes

Officially, uninitialized integers, and therefore, Rust references to them are _invalid_ (note that this may evolve into official defined behavior (_c.f._, rust-lang/unsafe-code-guidelines#71)).

However, `::std` uses references to uninitialized integers when working with the `Read::initializer` feature (rust-lang#42788), since it relies on this unstably having defined behavior with the current implementation of the compiler (IIUC).

Hence the comment to disincentivize people from using this pattern outside the standard library.
Centril added a commit to Centril/rust that referenced this issue Sep 5, 2019
…ut_uninit_integers, r=Centril

Added warning around code with reference to uninit bytes

Officially, uninitialized integers, and therefore, Rust references to them are _invalid_ (note that this may evolve into official defined behavior (_c.f._, rust-lang/unsafe-code-guidelines#71)).

However, `::std` uses references to uninitialized integers when working with the `Read::initializer` feature (rust-lang#42788), since it relies on this unstably having defined behavior with the current implementation of the compiler (IIUC).

Hence the comment to disincentivize people from using this pattern outside the standard library.
Centril added a commit to Centril/rust that referenced this issue Sep 5, 2019
…ut_uninit_integers, r=Centril

Added warning around code with reference to uninit bytes

Officially, uninitialized integers, and therefore, Rust references to them are _invalid_ (note that this may evolve into official defined behavior (_c.f._, rust-lang/unsafe-code-guidelines#71)).

However, `::std` uses references to uninitialized integers when working with the `Read::initializer` feature (rust-lang#42788), since it relies on this unstably having defined behavior with the current implementation of the compiler (IIUC).

Hence the comment to disincentivize people from using this pattern outside the standard library.
@KodrAus KodrAus added A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 31, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Sep 3, 2020
@Kestrer
Copy link
Contributor

Kestrer commented Jan 10, 2021

Can this be closed now that the reading into unintialized buffers RFC is merged and #78485 exists? Since the API is going to be removed anyway, it seems pointless to keep this open.

@sfackler
Copy link
Member Author

It will be closed when the API is removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.