Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposing new AsyncRead / AsyncWrite traits #1744

Closed
wants to merge 1 commit into from

Conversation

seanmonstar
Copy link
Member

Introduce new AsyncRead / AsyncWrite

This PR introduces new versions of AsyncRead / AsyncWrite traits.
The proposed changes aim to improve:

  • ergonomics.
  • integration of vectored operations
  • working with uninitialized byte slices.

Overview


The PR changes the AsyncRead and AsyncWrite traits to accept T: Buf and T: BufMut values instead of &[u8] and &mut [u8]. Because
&[u8] implements Buf and &mut [u8] implements BufMut, the same
calling patterns used today are still possible. Additionally, any type
that implements Buf and BufMut may be used. This includes
Cursor<&[u8]>, Bytes, ...

Improvement in ergonomics


Calls to read and write accept buffers, but do not necessary use up
the entirety of the buffer. Both functions return a usize representing
the number of bytes read / written. Because of this, it is common to
write loops such as:

let mut rem = &my_data[..];while !rem.is_empty() {
    let n = my_socket.write(rem).await?;
    rem = &rem[n..];
}


The key point to notice is having to use the return value to update the
position in the cursor. This is both common and error prone. The Buf /
BufMut traits aim to ease this by building the cursor concept directly
into the buffer. By using these traits with AsyncRead / AsyncWrite,
the above loop can be simplified as:

let mut buf = Cursor::new(&my_data[..]);while buf.has_remaining() {
    my_socket.write(&mut buf).await?;
}


A small reduction in code, but it removes an error prone bit of logic
that must be often repeated.

Integration of vectored operations


In the AsyncRead / AsyncWrite traits provided by futures-io,
vectored operations are covered using separate fns: poll_read_vectored
and poll_write_vectored. These two functions have default
implementations that call the non-vectored operations.

This has a draw back, when implementing AsyncRead / AsyncWrite,
usually as a layer on top of a type such as TcpStream, the implementor
must not forget to impleement these two additional functions. Otherwise,
the implementation will not be able to use vectored operations even if
the underlying TcpStream supports it. Secondly, it requires duplication
of logic: one poll_read implementation and one poll_read_vectored
implementation. It is possible to implement one in terms of the other,
but this can result in sub-optimial implementations.

Imagine a situation where a rope
data structure is being written to a socket. This structure is comprised
of many smaller byte slices (perhaps thousands). To write it efficiently
to the socket, avoiding copying data is preferable. To do this, the byte
slices need to be loaded in an IoSlice. Since modern linux systems
support a max of 1024 slices, we initialize an array of 1024 slices,
iterate the rope to populate this array and call poll_write_vectored.
The problem is that, as the caller, we don't know if the AsyncWrite
type supports vectored operations or not, poll_write_vectored is
called optimistically. However, the implementation "forgot" to proxy its
function to TcpStream, so poll_read is called w/ the first entry in
the IoSlice. The problem is, for each call to poll_read_vectored, we
must iterate 1024 nodes in our rope to only have one chunk written at a
time.

By using T: Buf as the argument, the decision of whether or not to use
vectored operations is left up to the leaf AsyncWrite type.
Intermediate layers only implement poll_write w/ T: Buf and pass it
along to the inner stream. The TcpStream implementation will know that
it supports vectored operations and know how many slices it can write at
a time and do "the right thing".

Working with uninitialized byte slices


When passing buffers to AsyncRead, it is desirable to pass in
uninitialized memory which the poll_read call will write to. This
avoids the expensive step of zeroing out the memory (doing so has
measurable impact at the macro level). The problem is that uninitialized
memory is "unsafe", as such, care must be taken.

Tokio initially attempted to handle this by adding a
prepare_uninitialized_buffer
function. std is investigating adding a
similar
though improved variant of this API. However, over the years, we have
learned that the prepare_uninitialized_buffer API is sub optimal for
multiple reasons.

First, the same problem applies as vectored operations. If an
implementation "forgets" to implement prepare_uninitialized_buffer
then all slices must be zeroed out before passing it to poll_read,
even if the implementation does "the right thing" (not read from
initialized memory). In practice, most implementors end up forgetting to
implement this function, resulting in memory being zeroed out.

Secondly, implementations of AsyncRead that should not require
unsafe to implement now must add unsafe simply to avoid having
memory zeroed out.

Switching the argument to T: BufMut solves this problem via the
BufMut trait. First, BufMut provides low-level functions that return
&mut [MaybeUninitialized<u8>]. Second, it provides utility functions
that provide safe APIs for writing to the buffer (put_slice, put_u8,
...). Again, only the leaf AsyncRead implementations (TcpStream)
must use the unsafe APIs. All layers may take advantage of uninitialized
memory without the associated unsafety.

Drawbacks


The primary drawback is genericizing the AsyncRead and AsyncWrite
traits. This adds complexity. We feel that the added benefits discussed
above outweighs the drawbacks, but only trying it out will validate it.

Relation to futures-io, std, and roadmap


The relationship between tokio's I/O traits and futures-io has come
up a few times in the past. Tokio has historically maintained its own
traits. futures-io has just been released with a simplified version of
the traits. There is also talk of standardizing AsyncRead and
AsyncWrite in std. Because of this, we believe that now is the
perfect time to experiment with these traits. This will allow us to gain
more experience before committing to traits in std.

The next version of Tokio will not be 1.0. This allows us to
experiment with these traits and remove them for 1.0 if they do not pan
out.

Once AsyncRead / AsyncWrite are added to std, Tokio will provide
implementations for its types. Until then, tokio-util will provide a
compatibility layer between Tokio types and futures-io.

This replaces the `[u8]` byte slice arguments to `AsyncRead` and
`AsyncWrite` with `dyn Buf` and `dyn BufMut` trait objects.
@seanmonstar seanmonstar requested review from carllerche and a team November 6, 2019 23:43
let res = ready!(self.as_mut().get_pin_mut().poll_read(cx, buf));
self.discard_buffer();
return Poll::Ready(res);
}
unimplemented!()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the intention to finish this before merging?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, ahem, yes. Nothing to see here.

(I commented some things out when I started the branch to chip away at the insane number of compiler errors, and then may have forgotten where I left some of these...)

@hawkw
Copy link
Member

hawkw commented Nov 6, 2019

@seanmonstar from the PR description:

The primary drawback is genericizing the AsyncRead and AsyncWrite
traits. This adds complexity. We feel that the added benefits discussed
above outweighs the drawbacks, but only trying it out will validate it.

It looks like the AsyncRead/AsyncWrite traits in this PR take dyn Buf/dyn BufMut trait object references? Is this comment still accurate?

@Ralith
Copy link
Contributor

Ralith commented Nov 6, 2019

Initial thoughts from a quick skim:

  • Since the cursor is now handled automatically, should the usize return be removed?
  • Can we take this opportunity to replace io::Error with an associated type, in the same spirit of trying things while we still can?

e: Oh, and I really like this idea overall.

@seanmonstar
Copy link
Member Author

Since the cursor is now handled automatically, should the usize return be removed?

Good point, I'd thought about that as well. I'm in favor of changing the return types of poll_read and poll_write to io::Result<()> (from io::Result<usize>) since it's part of {Buf, BufMut}.

Can we take this opportunity to replace io::Error with an associated type, in the same spirit of trying things while we still can?

I haven't been bothered by the return type myself, so it wasn't something that I personally wanted to experiment with, and I was scared away by the supposed dragons mentioned in the origin std-io reform RFC. But I don't mean to scare away others from experimenting as well :)

@Ralith
Copy link
Contributor

Ralith commented Nov 6, 2019

But I don't mean to scare away others from experimenting as well :)

Fair enough, I can draft a follow-up once this is in.

@sfackler
Copy link
Contributor

sfackler commented Nov 7, 2019

WRT unininitialized buffers, I have a half-written thing discussing the path forward for std that I'm going to try to finish up this week: https://paper.dropbox.com/doc/IO-Buffer-Initialization--AoGf~cBSiAGi3mjGAJe9fW9pAQ-MvytTgjIOTNpJAS6Mvw38 - please add comments if you'd like!

It does seem to me like this is a nice and clean approach, but there's no way we can adjust Read in the same way. It may very well be the case that we shouldn't handicap AsyncRead to ensure it matches Read, but IMO it's something to think about (especially if we want to avoid huge amounts of churn when the AsyncRead/AsyncWrite traits land in std).

@carllerche carllerche changed the title Introduce new AsyncRead / AsyncWrite Proposing new AsyncRead / AsyncWrite traits Nov 7, 2019
@Ralith
Copy link
Contributor

Ralith commented Nov 7, 2019

The downsides of generic methods seems to be mainly additional complexity required to use a trait object (via secondary provided object-safe methods, or a secondary trait with a blanket impl). In exchange, they offer guaranteed static dispatch. I'm not sure either side is a clear win, particularly since virtual dispatch likely pales in comparison to the actual cost of doing I/O.

One case where the inlining afforded by monomorphization might be important is implementations of Buf that produce large numbers of small discontinuous slices. For example, imagine a serialization library that derives Buf implementations for PoD structs to be read as a series of slices, one field at a time. This feels a bit contrived, but I'm not sure it's beyond reason.

@seanmonstar
Copy link
Member Author

I can re-check the description later, but it might not be clear: this proposes passing &mut dyn Buf instead of generics, specifically to allow dyn AsyncRead and stuff to still work.

@Ralith
Copy link
Contributor

Ralith commented Nov 7, 2019

Yeah, just giving some initial thoughts on that tradeoff in response to @carllerche's request.

specifically to allow dyn AsyncRead and stuff to still work.

That could still be accomplished with an extra trait.

@jonhoo
Copy link
Contributor

jonhoo commented Nov 7, 2019

@seanmonstar

I can re-check the description later, but it might not be clear: this proposes passing &mut dyn Buf instead of generics, specifically to allow dyn AsyncRead and stuff to still work.

Yeah, I think the original text

genericizing the AsyncRead and AsyncWrite traits

Is what generated the confusion here, along with the various mentions of T: Buf and T: BufMut.

@Matthias247
Copy link
Contributor

I'm not sure either side is a clear win, particularly since virtual dispatch likely pales in comparison to the actual cost of doing I/O.
One case where the inlining afforded by monomorphization might be important is implementations of Buf that produce large numbers of small discontinuous slices.

I think there is something to this. I don't think virtual dispatches matter when reading from the real TCP socket. But often buffered readers are implemented on top of these - which are then used by deserialization libraries. When those read things in 1-8 byte chunks from the buffer the dispatch might have an impact.

But that should be benchmarkable. I guess there are libraries out there which are exactly doing this (h2?).

Is the ergonomics reason such a big thing in practice? I'm typically just doing write_all() on byte slices or slics of byte slices via extension traits - which also means I don't have to advance the cursor in the application code. If not using write_all I typically care about the amount of written bytes.

By using T: Buf as the argument, the decision of whether or not to use
vectored operations is left up to the leaf AsyncWrite type.
Intermediate layers only implement poll_write w/ T: Buf and pass it
along to the inner stream.

Is nesting and AsyncWrite a big thing? From my point of view it is mostly a leaf type which is "hand-implemented". Even if not implemented by a TCP socket but by a TLS or HTTP library, those will be leaves in the sense that the writes go into their internal buffers and are not directly forwarded. They have to implement support for vectored operations anyway.

For application code I think interacting with those things via Future adapters will likely get more the norm. Here people do not interact with the low-level AsyncRead/AsyncWrite methods anyway. E.g. we write things like:

fn do_with_stream(stream: &dyn AsyncWrite) {
   let rem = &my_data[..];
   stream.write_all(rem).await
}

@carllerche
Copy link
Member

@Matthias247

Is nesting and AsyncWrite a big thing?

Same as std... there are use cases (TLS, deflate / inflate, mocking,...) there are protocol specific impls, like HTTP bodies, websocket, etc...

In my personal experience, I definitely hit the issue where, in order to have the most efficient impl possible, I needed to implement both poll_read and poll_read_vectored separately to do different things, resulting in code dup. Most often than not, I ended up just skipping the specialized poll_read_vectored option.

For me, the biggest factor is the "safety" aspect and uninitialized memory. Being required to zero out memory is sub optimal and I don't think the current strategy for modeling safety & uninitialized memory is ideal. WDYT?

@carllerche
Copy link
Member

Also, to note, taking &mut dyn Buf will prevent passing in &[u8] as an argument. If the argument type is impl Buf, then &[u8] can be passed in. Though, odds are users will not call poll_read directly, instead going via the async fn read(&mut self) -> io::Result<usize> path.

Object safety can also be achieved by doing something like (i forgot the exact incantation):

trait AsyncRead {
    fn poll_read(&mut self, dst: impl BufMut) -> Poll<io::Result<usize>> where Self: Sized;

    #[doc(hidden)]
    fn poll_read_dyn(&mut self, dst: &mut dyn BufMut) -> Poll<io::Result<usize>> {
        self.poll_read(dst)
    }
}

impl AsyncRead for Box<dyn AsyncRead> {
    fn poll_read(&mut self, dst: impl BufMut) -> Poll<io::Result<usize>> {
        self.poll_read_dyn(dst)
    }
}

@Matthias247
Copy link
Contributor

Same as std... there are use cases (TLS, deflate / inflate, mocking,...) there are protocol specific impls,
like HTTP bodies, websocket, etc...

Right. But I would think a lot of those just go back into either being an buffered reader around another stream, or they have an internal buffer which actually doesn't benefit from vectored IO. Maybe @Nemo157 has some insight from the implementation of async-compression?

In my personal experience, I definitely hit the issue where, in order to have the most efficient impl possible, I needed to implement both poll_read and poll_read_vectored separately to do different things, resulting in code dup. Most often than not, I ended up just skipping the specialized poll_read_vectored option.

I'm guilty too :-) Although the only situation I really could recall where I could forward some vectored IO was in the case of some buffered reader/writer operation - when the buffer was exhausted and bypassed.

For me, the biggest factor is the "safety" aspect and uninitialized memory. Being required to zero out memory is sub optimal and I don't think the current strategy for modeling safety & uninitialized memory is ideal. WDYT?

I definitely agree that zeroing out memory is a bit annoying from a performance perspective, since it is typically not necessary for the correctness of the program. I don't yet understand what the impact of simply not doing it would be, since I don't have the historic context of APIs like prepare_uninitialized_buffer. A not uninitialized byte array is at least still a byte array with valid u8 values that can be read. So for any implementation it of an AsyncRead it should still be ok reading those values? Is this about tools like Miri not liking those interactions? I am wondering whether the differentiation between uninitialized memory and initialized ones must really be a property if the byte stream, or rather of the caller or memory allocator. Would you really ever do anything different in a Stream based on the knowledge that bytes have not been initialized?

Also, to note, taking &mut dyn Buf will prevent passing in &[u8] as an argument.

I think being able to pass &[u8] is important for discoverability and to allow newcomers to work with the API. It will likely be what most users are using, if they don't go for owned APIs like Bytes. I think a generic argument like Buf here can increase the barrier for newcomers. It reminds me of when I started boost asio long ago, and the function signatures like this one mention MutableBufferSequence and I first had no idea what to pass. I learned then quickly that I can pass buffer(ptr, length), but it took quite a bit of time to understand and appreciate the full flexibility of the API.
I therefore think that being able to pass a normal byte slice - potentially with implicit conversion - is important. As far as I understand that could also be in the async fn/Future wrapper around this. But maybe the situation is different for Rust, since users get in touch with generics a lot more often.

Object safety was one of the main reason why directly implementing IO as Future producing types was not found ideal in rust-lang/futures-rs#1365. I think the reasons are valid, and I have appreciated the ability to pass byte streams as dyn AsyncRead now in several occasions. Although I'm still a bit sad that I can't implement these types via composition and async fn.

@Nemo157
Copy link
Contributor

Nemo157 commented Nov 7, 2019

I'm in favor of changing the return types of poll_read and poll_write to io::Result<()> (from io::Result<usize>) since it's part of {Buf, BufMut}.

Being able to easily see how many bytes were written/read during the current call is very useful, at a minimum to be able to check for EOF, but also when performing other operations on the data in parallel to writing/reading it.

EDIT: It is possible to get this data without returning it, but it becomes a very painful pattern after the fifth time of writing it

let prior_len = buf.bytes().len();
writer.poll_write(&mut buf)?;
let written = buf.bytes().len() - prior_len;

and looking at BufMut I'm not certain whether remaining_mut is allowed to change dynamically as data is written, if it is I don't think it is possible to externally determine how many bytes were written during poll_read without wrapping the buffer.

@Nemo157
Copy link
Contributor

Nemo157 commented Nov 7, 2019

But I would think a lot of those just go back into either being an buffered reader around another stream, or they have an internal buffer which actually doesn't benefit from vectored IO. Maybe @Nemo157 has some insight from the implementation of async-compression?

Nope, I have been ignoring the vectored IO methods so far.

One complication is that I'm using AsyncBufRead and AsyncBufWrite for the underlying IO, which don't expose any vectored access (although maybe it doesn't matter at this interface, is vectored IO just so that the application can write disparate slices in one kernel operation, or does splitting a single large read up into multiple smaller slices give a performance benefit?).

If a user is passing vectored IO buffers into the encoder/decoder then there would be a performance advantage to having it loop over all the buffers within the context of a single operation, to avoid extra calls into the underlying IO per-buffer. This seems easy to implement, and I could do the inverse of the default implementation and just have poll_read call poll_read_vectored with a single buffer vector.

@MOZGIII
Copy link
Contributor

MOZGIII commented Nov 7, 2019

Are there any plans on having generic support for various underlying types? I.e. Buf<T: Copy>. I’m working with audio samples, and there are a lot of operations where it’d be useful to have the same abstractions that we have for u8 buffers, but for other types (f32, u16, i16 to name a few).
If not I think we should implement a separate (i.e. not std and non tokio) crate with such abstraction.

To be concrete, I have a practical use case example. I\m using the code that extends the AsyncRead/AsyncWrite trait concepts for generic types, for instance this: https://github.com/MOZGIII/netsound/blob/1137f77966a6a3bd053acbc2e06dee68b31af0ba/src/io/async_write_items.rs

@Shnatsel
Copy link

Shnatsel commented Nov 7, 2019

The "read to uninitialized memory" is a really thorny problem, and I'm glad to see it's being addressed! However, some important aspects of it are not clear to me from the description:

Switching the argument to T: BufMut solves this problem via the BufMut trait. First, BufMut provides low-level functions that return &mut [MaybeUninitialized<u8>].

Returning a single slice does not seem to be possible with the current release of bytes crate - the documentation on BufMut states that the underlying storage may or may not be in contiguous memory. I see this PR uses bytes crate from git, is there a requirement for the storage to be contiguous in the git version of bytes?

Also, BufMut writes are infallible, i.e. it will allocate more memory if it runs out of capacity. Does this still hold for the revised version of BufMut ?
If so, this is contrary to the current Read behavior that writes to a fixed-size slice. In many cases allocating an unbounded amount of memory is undesirable and will lead to a DoS attack; I fear providing an automatically reallocating interface alone will make people roll their own unsafe with uninitialized slices, just like it's already happening with Vec. Providing writes to a Vec-like buffer with a bounded capacity sounds like a simpler solution, since it could be used to safely implement writes to an unbounded buffer in turn.

@Shnatsel
Copy link

Shnatsel commented Nov 7, 2019

@sfackler I've been giving a lot of thought to encapsulating uninit buffers, incl. in Read trait. I'll leave comments on the doc in the next few days. You can also catch me in #wg-secure-code on Rust Zulip if you like.

@sfackler
Copy link
Contributor

sfackler commented Nov 7, 2019

Also, BufMut writes are infallible, i.e. it will allocate more memory if it runs out of capacity.

That's the behavior of Vec's implementation, but not the behavior of the implementations for other types like slices or Cursors. For those types, the capacity never increases.

Definitely interested in your thoughts on Read + uninit buffers!

@Ralith
Copy link
Contributor

Ralith commented Nov 7, 2019

Are there any plans on having generic support for various underlying types?

This is an interesting thought; AsyncRead/Write then become something like a batch-oriented version of Stream/Sink. Probably something to explore independent of this specific PR though, same as adding an associated error type.

@baloo
Copy link
Contributor

baloo commented Nov 8, 2019

Just wondering, given the amount of times AsyncRead/AsyncWrite is referenced to in tokio and its ecosystem, wouldn't all those dyn Buf or dyn BufMut (from a concrete type) affect compilation speed significantly? I feel like it's a lot more work for rustc to handle those.

Otherwise, the arguments sounds reasonable.

@hawkw
Copy link
Member

hawkw commented Nov 8, 2019

@baloo I believe a dyn Trait is going to be much easier on the compiler than a generic or impl Trait (no monomorphization!)

@DoumanAsh
Copy link
Contributor

@hawkw virtual dispatch has runtime overhead and it might not be desirable for high performance.

Using Buf* is certainly convenient as user can provide something more than concrete type, but we should note potential performance hit

@withoutboats
Copy link

My comment is divided into two sections: first, a relatively small proposal for how the organization in this API should change. Second, some commentary on the divergence between this API and the futures crate, and my opinion about the future of IO interfaces in std.


First, I think it would be advisable to move the Buf traits into a separate library from bytes. The bytes library has two key components: the Bytes/BytesMut types and the Buf/BufMut traits. By exposing these traits in the interface of tokio, the semver store of the bytes crate is linked intimately with the semver story of tokio. I think you should create a new low level crate for the trait definitions, while allowing the bytes crate the flexibility to make breaking changes more regularly.

It is much easier for users to navigate breakages of libraries that define types than libraries that define traits, because traits are a contract between multiple users (for example, between tokio and its users), and everyone must agree on the definition of the contract. In contrast, users can have multiple definitions of the Bytes types in their application without significant difficulties.


Second, I think that the standard library is pretty likely to adopt the AsyncRead/AsyncWrite traits from futures essentially as is, and I don't think its likely that it will adopt these changes. I want to explain why I have that view.

Before I do, I should say I think its fine for tokio to diverge: the bridge from std types to tokio will be cheap, because its just a virtualization of the argument, whereas the bridge from tokio to std will probably require a deep copy. But I expect the bridge to mainly work in the std -> tokio direction (i.e. types implementing std's traits used with interfaces expecting tokio's traits), so hopefully that's not such a big deal. My point is that I'm not trying to convince you to adopt futures' definitions in tokio, just explain why I think those will be the more likely choice for std.

Vectored operations

In my opinion, the solution exposed by futures is adequate: have an additional write_vectored method. This discussion has raised the fact that this method could be intercepted by a middleware layer which has failed to implement it properly. While its good to create APIs that discourage bugs, we can't eliminate all bad code, and I'm not really convinced this is a problem that can't be solved socially.

There's a funnel effect to encountering this bug: the user has to both want to perform a vectored operation and be using middleware which is implemented incorrectly. This makes the problem, in my opinion, quite niche, and the best solution is to send PRs to bad libraries that aren't supporting vectored operations.

In extremis, we could do exactly what we did with flush already: make the vectored ops required to implement, to encourage users to forward them correctly. I'm not convinced even this is necessary, but I do think it would be sufficient.

In contrast, I'm not enthusiastic with the solution which bypasses the vectored methods. For example, to get 1024 vectored buffers for your rope, the leaf AsyncWrite would have to allocate space on the stack for 1024 iovecs - several pages of space. It would have to do this unconditionally, meaning a huge stack frame (multiple pages) for every write. Or it would have to choose a smaller number, and then that rope is again not as optimal as it could be.

Though I don't know the whole backstory here, it sounds to me like someone hit a first mover problem: no one had been doing vectored IO in Rust, and so no one had written code to handle it properly. This resulted in frustrating performance problems. But that's a sign that our ecosystem is still maturing; once we have a robust ecosystem that has been battle tested in production by many users, these problems will be increasingly fleeting. I see no reason this problem can't be solved over time with PRs and issues in key crates.

Uninitialized Memory

This one is heavier. I have two opinions here; I expect they're a bit controversial, each with different groups of people. And maybe I'm wrong.

My first claim is that the cost of zeroing memory is insignificant for any well designed long running network application, which should have a buffer pool that amortizes the cost of initializing memory into insignificance over the lifetime of the application. I can see this being untrue if you have extremely spiky loads with sudden brief periods of intense activity, and you don't want to maintain resident memory enough to handle those spikes at all times, but I don't think that's normal.

My second claim is that we are eventually going to provide an API for LLVM freeze. This has a certain air of inevitability to me, despite the risks of using freeze. Yes, its true that if you freeze an uninitialized buffer it could leak secrets, but Read implementations that don't just write into the buffer are pathologically wrong and not being used in practice. Therefore, simply freezing an uninitialized buffer before passing it to an AsyncRead interface is sufficient, for people who can't afford to zero their memory, to solve the problem. No change to the trait interface needed.

Downsides of this API

I want to highlight downsides of the API that are reasons I would be uncomfortable implementing this API in std. These are not blockers, and with enough upside I could be convinced to override them, but as I've elaborated I think the arguments in favor of this API over the one in futures are weak.

  • I think divergence between the non-async and async traits should be as minimal as possible. I have a strong preference for diverging only as necessary to make them asynchronous, and not for any other reason. Its really great if a user looking at the divergence can get an idea of the underlying futures model; other changes are a distraction that will confuse people about the differences between async and blocking IO (e.g. "are vectored operations somehow special for async IO?")
  • Trying to define an abstract interface for all buffers, which covers not only slices with different ownership semantics but also ropes and so forth, seems really fraught with stability issues, and I have a hard time imagining adding an interface like that to std.
  • The virtual call is not free. It's not that expensive either, I expect, but it does need justification.

Conclusion

Pulling all of that together, I'm more optimistic about the API in futures than one which uses a different buffer type argument. In general, I'd say I'm very bullish that the API we have in futures now is the one we will add to std.

(I'm also not enthusiastic about other changes people are considering from the futures trait, like changes to make it possible to implement read/write with an async fn, or changes to share buffers with completion based IO, but I've left those out of this post since they aren't the proposal here.)

@seanmonstar
Copy link
Member Author

Thanks @withoutboats for the well-thought post!

I think it would be advisable to move the Buf traits into a separate library from bytes.

We've considered this, and it very well could happen! If worth considering more, we should open a separate issue to discuss the trade-offs.

it sounds to me like someone hit a first mover problem

I can offer my own experience: even though I have a very deep understanding of how the vectored APIs work, I have been bitten by this problem, and thus I'm motivated to fix the API, since it currently is too easy for a veteran to mess up, thus I must assume others would also have this problem.

I've written a lot of middle layers that implement AsyncRead/AsyncWrite: some as simple new types, some to add in metrics, some as an enum of multiple possible IO types. I've at times forgotten to forward these methods (though once being bitten a few times, I think I always remember now). I've also forgotten to notice in code reviews when someone else doesn't forward.

For example, to get 1024 vectored buffers for your rope, the leaf AsyncWrite would have to allocate space on the stack for 1024 iovecs - several pages of space.

We could add Buf::vectored_hint() -> Option<usize> or similar to allow AsyncWrites to not need big stack buffers unconditionally, as one solution. This doesn't seem like much of an issue in my experience.

My first claim is that the cost of zeroing memory is insignificant for any well designed long running network application [..]

My experience leads to me to believe differently.

[..] which should have a buffer pool that amortizes the cost of initializing memory into insignificance over the lifetime of the application.

It seems in many cases, using something like jemalloc provides a much better optimized "pool" than one an application may write. But certainly, there are cases where small local pools are still better.

I have a strong preference for diverging only as necessary to make them asynchronous, and not for any other reason.

I get that, it makes plenty of sense!

I lean more towards picking the best design for the most common cases, even if std has a different one. I don't think the issues I've outlined in this proposal are unique to futures-io; I feel std::io::{Read, Write} have the same shortcomings.


I think my personal preference is to make sure there is a design that allows addressing these issues, even if std didn't want them. For those that want the improved ergonomics and performance, it's within reach. And those that want compatibility should still be able to plug in.

@carllerche
Copy link
Member

@withoutboats Thanks for the thoughtful response. I will take more time to digest it (lot going on today).

I do want to add a quick note for your consideration re: vectored operations. While "forgetting" to implement the vectored functions are a problem, I believe the bigger problem is that in cases where the implementation cannot implement the function (no vectored capabilities), simply forwarding the call to the non vectored function is not sufficient. I go in more detail in this comment.

@abonander
Copy link
Contributor

abonander commented Nov 26, 2019

As far as forgetting to forward methods goes, I still think a warn-by-default lint would be a fine solution. I would prefer it to go directly into rustc though, otherwise you just move the problem to remembering to set up Clippy. The other reason I think it belongs in rustc is so it can be applied to, e.g. Iterator::size_hint() as well.

I'd gladly take charge on writing and implementing an RFC if we decide that's a good way to go.

@erincandescent
Copy link

My first claim is that the cost of zeroing memory is insignificant for any well designed long running network application, which should have a buffer pool that amortizes the cost of initializing memory into insignificance over the lifetime of the application. I can see this being untrue if you have extremely spiky loads with sudden brief periods of intense activity, and you don't want to maintain resident memory enough to handle those spikes at all times, but I don't think that's normal.

I want to vehemently disagree with this approach for a few reasons:

  • As has been noted, Jemalloc often offers higher performance than any ad-hoc caches implemented by the application
  • You are introducing additional layers of memory pooling, which increases system wide memory pressure and reduces the ability of applications to return unused memory to the system
  • In doing so, you are throwing away the ability of the memory allocator to detect and mitigate certain classes of bugs. While these are mostly the kinds of bugs that safe Rust also defends against, defense in depth is always a useful property of a system

That is to say, while accessing the contents of a reused buffer may be safe in a Rust-the-language sense (i.e. does not exhibit or permit any memory unsafety violations), it is not safe in a general systems sense. After all, famously this is the cause of both Heartbleed and Cloudbleed

Note also in particular Heartbleed went undetected for so long because OpenSSL implemented its own buffer pool. I suspect that the full degree of Heartbleed (i.e. leaking private keys) is near impossible in Rust, as it seems unlikely that anyone would be loading those via such a buffer pool, but it still offers lots of ability to leak other private data.

I feel APIs should be designed to be misuse resistant. Ideally, the route of least resistance is not just memory-safe, but also makes it as difficult as possible to accidentally cause other kinds of security vulnerabilities.

It worries me that with the futures-rs approach and buffer pools, it only takes a corner case bug in some read filter (e.g. a deframer or a decompressor) to cause a catastrophic data leak. Even with use of buffer pools, the approach Tokio is looking at implementing still resists misuse in a way that would leak secret data.

@DoumanAsh
Copy link
Contributor

DoumanAsh commented Nov 27, 2019

I feel APIs should be designed to be misuse resistant.

There is no need to provide care for cases when user intentionally passes unitialized memory though.
It is user responsibility to deal with UB, not the interface.

@RalfJung
Copy link

My second claim is that we are eventually going to provide an API for LLVM freeze. This has a certain air of inevitability to me, despite the risks of using freeze. Yes, its true that if you freeze an uninitialized buffer it could leak secrets, but Read implementations that don't just write into the buffer are pathologically wrong and not being used in practice. Therefore, simply freezing an uninitialized buffer before passing it to an AsyncRead interface is sufficient, for people who can't afford to zero their memory, to solve the problem. No change to the trait interface needed.

Note that, as far as memory-backed buffers go, this is not just a matter of providing an API to something LLVM already has. The freeze that recently landed in LLVM freezes a single SSA value, not a buffer in memory.

The issue with leaking secrets is less the Read trait in particular and more the fact that once we let this cat out of the bag, freeze becomes a primitive that will be used in other places in Rust, with less control and less consideration and thus a higher risk of someone getting it wrong. That is certainly not a blocker -- some people possibly misuing an API is not a good enough argument to prevent other people from using it properly. But I don't think your argument really helps to alleviate the main concern about freeze.

I am a bit worried about implicitly deciding to adopt freeze as a side-effect of API design, as opposed to consciously making that decision.

@withoutboats
Copy link

I probably won't be able to engage much more over the next few weeks, but I want to emphasize that I think its fine for tokio to have traits that implement a different interface which you think better suits your users. My arguments mainly have to do with std - and there, I think the symmetry with the std::io::{Read, Write} is really paramount. I feel we'd need an overwhelming consensus that the interface those traits expose is inadequate, and a plan for how to resolve the problem for blocking IO as well, in order to give up the symmetry between the two.

So I'd mainly encourage you with that in mind to think about the costs of the conversion between the two APIs and plan for the possibility that users will have to do that conversion, hopefully as cheaply as possible.

@Nemo157
Copy link
Contributor

Nemo157 commented Nov 27, 2019

Before I do, I should say I think its fine for tokio to diverge: the bridge from std types to tokio will be cheap, because its just a virtualization of the argument, whereas the bridge from tokio to std will probably require a deep copy. But I expect the bridge to mainly work in the std -> tokio direction (i.e. types implementing std's traits used with interfaces expecting tokio's traits), so hopefully that's not such a big deal.

I would personally expect the opposite. General middleware libraries (e.g. compression, parsers, etc.) would be defined generically over std's traits, then used in applications that pass them Tokio's implementations. But that also meshes well with your expectation that this direction is the cheap conversion, so maybe I'm somehow reading your parenthetical expansion backwards?

@erincandescent
Copy link

I feel APIs should be designed to be misuse resistant.

There is no need to provide care for cases when user intentionally passes unitialized memory though.
It is user responsibility to deal with UB, not the interface.

In the Read interface, the caller provides a buffer and expects (at least as far as is defined by the output/return type) the callee to fill it. That is to say, part of the expected contract is that the callee overwrites the entire buffer (or, at least, buf[0..n] where n is the return value/etc)

This isn't actually about uninitialised memory as such, but rather about arbitrarily initialised memory which potentially contains secrets which need to be overwritten.

@withoutboats
Copy link

I would personally expect the opposite. General middleware libraries (e.g. compression, parsers, etc.) would be defined generically over std's traits, then used in applications that pass them Tokio's implementations. But that also meshes well with your expectation that this direction is the cheap conversion, so maybe I'm somehow reading your parenthetical expansion backwards?

More like I wrote it backwards! This gets confusing 😅

@kabergstrom
Copy link

kabergstrom commented Dec 2, 2019

The proposed API necessitates a redundant copy of all IO bytes because it cannot guarantee that the user-provided buffers are page-aligned and locked to physical memory, as is required for DMA transfers. For maximum performance, either the IO layer should be responsible for allocating and managing buffers according to its requirements for DMA or these requirements should be communicated to the user so that it can be properly allocated, and perhaps initialized by the IO layer.

If this proposal aims to maximize possible performance, something like this would be more appropriate:
fn poll_read(&mut self) -> Poll<io::Result<IOBuffers>> where Self: Sized;

The IOBuffers in this case would be a number of buffers (probably a linked list) allocated by the IO layer's internal buffer pool, and returned to that pool on drop, maybe using a form of indirect call for the drop. Preferably it would be !Send.

WinSock Registered I/O provides an API for "pre-registering" buffers and thus reducing overhead. linux AIO has certain alignment requirements and aims to perform DMA transfers directly with provided buffers. io_uring also has similar options.

@najamelan
Copy link
Contributor

the IO layer should be responsible for allocating and managing buffers

I've also been thinking about this. I'm afraid the general reaction will be that this completely inverses the very much adopted pattern in rust that producers always take a reference to a buffer. It's a consumer driven pattern where the consumer can chose to reuse that buffer once done with the data.

I have been thinking about this when intermediate layers change the type of the data. A websocket message in tungstenite is an enum with a variant (for binary data):
Message::Binary(Vec<u8>). Now implementing an AsyncRead/Write over websocket implies a needless copy in the AsyncRead impl to put the data from the vector in the buffer passed into poll_read.

The current design of std::io::Read and Write is described here: https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md#revising-reader-and-writer
I couldn't find any discussion of possible producer allocated buffers.

@Nemo157
Copy link
Contributor

Nemo157 commented Dec 5, 2019

There is a trait available for IO-layer allocated buffers: AsyncBufRead, for the write side I have been using a similar private AsyncBufWrite trait (though only with one concrete implementation of a BufWriter adaptor wrapping a non-buffered provided writer).

Has there been any investigation into whether these changes should also result in changes to the AsyncBufRead API? It seems like that could also maybe use something like fn poll_fill_buf(...) -> Poll<io::Result<&dyn Buf>>.

@Ralith
Copy link
Contributor

Ralith commented Dec 5, 2019

Is there the same need for heterogeneous buffer types in I/O-layer-allocation schemes?

@MOZGIII
Copy link
Contributor

MOZGIII commented Dec 18, 2019

Why can’t we just split the vectored API into another set of traits? Like AsyncReadVectored and AsyncWriteVectored. This would be good cause we’d be able to explicitly manage our requirements on the vectored IO support at the type system level.

@MOZGIII
Copy link
Contributor

MOZGIII commented Dec 18, 2019

Given rust aims to be a system programming language, it’s always hard for see API like the current AsycRead and AsyncWrite as acceptable - it’s impossible to manage the expectations on the presence of the real vectored IO support. Same applies to the the stabilized vectored IO for Read Write. Is there a rationale for why was it concluded that having a fallback logic for vectored IO works for everyone, and that should be the default (built into the trait itself)?

@cdbattags
Copy link

cdbattags commented Jan 29, 2020

Any updates on this? And where do we currently stand on using tokio and async_std together?

See actix/examples#228 (comment) and https://github.com/actix/examples/pull/228/files#diff-a8ac72c878c236b5b5236b768a9e9c1aR17

@carllerche
Copy link
Member

Thanks for all the discussion. Things have changed since this PR so I am going to close it and work on opening a new proposal factoring in this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.