API Proposal: Adding ArraySegment-based methods to the System.IO namespace #16635

jamesqo · 2016-03-08T01:17:09Z

Background

To avoid unnecessary allocations, lots of code that works with Streams and I/O passes around a 'portion' of an array containing the relevant data instead of copying it to a new one. Since .NET does not yet support multiple return values, this is commonly represented as an ArraySegment<T>, where T is the type of the element contained in the array. Unfortunately, the System.IO namespace does not have overly sophisticated support for ArraySegments (aside from the recently-added TryGetBuffer for MemoryStream), forcing people who use them to 'unravel' them like this:

stream.Write(segment.Array, segment.Offset, segment.Count);

I propose adding convenience methods to the existing System.IO APIs that will make this less verbose and prevent accidents such as the programmer mixing up Offset and Count.

Proposed API

namespace System.IO
{
    public class BinaryReader : IDisposable
    {
        public int Read(ArraySegment<byte> buffer);
        public int Read(ArraySegment<char> buffer);
    }

    public class BinaryWriter : IDisposable
    {
        public void Write(ArraySegment<byte> buffer);
        public void Write(ArraySegment<char> buffer);
    }

    public class MemoryStream : Stream
    {
        public MemoryStream(ArraySegment<byte> buffer);
        public MemoryStream(ArraySegment<byte> buffer, bool writable);
        public MemoryStream(ArraySegment<byte> buffer, bool writable, bool publiclyVisible);
    }

    public abstract class Stream : IDisposable
    {
        public int Read(ArraySegment<byte> buffer);
        public Task<int> ReadAsync(ArraySegment<byte> buffer);
        public Task<int> ReadAsync(ArraySegment<byte> buffer, CancellationToken cancellationToken);
        public void Write(ArraySegment<byte> buffer);
        public Task WriteAsync(ArraySegment<byte> buffer);
        public Task WriteAsync(ArraySegment<byte> buffer, CancellationToken cancellationToken);
    }

    public abstract class TextReader : IDisposable
    {
        public int Read(ArraySegment<char> buffer);
        public int ReadBlock(ArraySegment<char> buffer);
        public Task<int> ReadBlockAsync(ArraySegment<char> buffer);
    }

    public abstract class TextWriter : IDisposable
    {
        public void Write(ArraySegment<char> buffer);
        public Task WriteAsync(ArraySegment<char> buffer);
        public void WriteLine(ArraySegment<char> buffer);
        public Task WriteLineAsync(ArraySegment<char> buffer);
    }
}
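Until such overloads exist, most of the Stream members above can be approximated with extension methods that simply unravel the segment. A minimal sketch, assuming hypothetical names (WriteSegment, ReadSegment, WriteSegmentAsync) chosen so they do not collide with existing Stream overloads; it only forwards to the existing (byte[], int, int) methods.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers approximating the proposed Stream overloads; not part of System.IO.
public static class StreamSegmentExtensions
{
    public static int ReadSegment(this Stream stream, ArraySegment<byte> buffer)
    {
        ThrowIfInvalid(stream, buffer);
        // Offset and Count come from the segment, so the caller cannot swap them.
        return stream.Read(buffer.Array, buffer.Offset, buffer.Count);
    }

    public static void WriteSegment(this Stream stream, ArraySegment<byte> buffer)
    {
        ThrowIfInvalid(stream, buffer);
        stream.Write(buffer.Array, buffer.Offset, buffer.Count);
    }

    public static Task WriteSegmentAsync(this Stream stream, ArraySegment<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        ThrowIfInvalid(stream, buffer);
        return stream.WriteAsync(buffer.Array, buffer.Offset, buffer.Count, cancellationToken);
    }

    private static void ThrowIfInvalid(Stream stream, ArraySegment<byte> buffer)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        // default(ArraySegment<byte>) has no backing array.
        if (buffer.Array == null) throw new ArgumentException("The segment has no backing array.", nameof(buffer));
    }
}
```

Usage: `stream.WriteSegment(segment);` replaces `stream.Write(segment.Array, segment.Offset, segment.Count);`.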


justinvp · 2016-03-08T03:57:01Z

Before adding a bunch of additional ArraySegment<T> helper overloads in System.IO, it might be prudent to wait until Span<T>/ReadOnlySpan<T> lands in CoreFX, and then adopt that in System.IO along these lines.

dotnet/roslyn#120
https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/Span.cs
https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/ReadOnlySpan.cs

KrzysztofCwalina · 2016-03-09T01:27:57Z

I think it's fair to say that we will add support for either slices or ArraySegment. Let's hope it's slices :-)

jamesqo · 2016-03-10T01:25:01Z

@justinvp @KrzysztofCwalina How would that work in terms of the existing API though? As far as I can see, Span<T> does not expose the underlying array it's based on.

KrzysztofCwalina · 2016-03-10T01:48:46Z

I think we will need unsafe API on Span to access T*

benaadams · 2016-09-05T00:00:58Z

Return ValueTask<int> rather than Task<int>: take the opportunity, since the return types of the current methods can't be changed.

jamesqo · 2016-09-05T01:08:06Z

@benaadams These are just wrappers over the existing methods.

benaadams · 2016-09-05T01:18:10Z

@jamesqo Task<int> will convert to ValueTask<int> so they can still be a wrapper; however if they are virtual then derived classes can override them to return actual ValueTask<int>
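A sketch of the pattern described here, using a hypothetical MemoryStream subclass as a stand-in for Stream (the class and member names are illustrative only): the base overload wraps the existing Task<int> method, while a derived type overrides it to complete synchronously without allocating a Task<int>. It assumes ValueTask<T> is available (.NET Core 2.0+, or the System.Threading.Tasks.Extensions package).

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for Stream: the segment overload is a virtual wrapper returning ValueTask<int>.
public class SegmentReadStream : MemoryStream
{
    public SegmentReadStream(byte[] data) : base(data) { }

    public virtual ValueTask<int> ReadSegmentAsync(ArraySegment<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        if (buffer.Array == null) throw new ArgumentException("The segment has no backing array.", nameof(buffer));
        // Default behaviour: wrap the existing Task<int>-returning method.
        Task<int> task = ReadAsync(buffer.Array, buffer.Offset, buffer.Count, cancellationToken);
        return new ValueTask<int>(task);
    }
}

// A derived type can override the overload and return the result directly, with no Task<int> allocation.
public sealed class SyncSegmentReadStream : SegmentReadStream
{
    public SyncSegmentReadStream(byte[] data) : base(data) { }

    public override ValueTask<int> ReadSegmentAsync(ArraySegment<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        if (cancellationToken.IsCancellationRequested)
            return new ValueTask<int>(Task.FromCanceled<int>(cancellationToken));
        if (buffer.Array == null) throw new ArgumentException("The segment has no backing array.", nameof(buffer));
        return new ValueTask<int>(Read(buffer.Array, buffer.Offset, buffer.Count));
    }
}
```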

karelz · 2016-10-11T21:44:32Z

UPDATE (old 'obsolete' ask/suggestion deleted):
Please wait for the broader design discussion around Span<T> and ArraySegment to happen first.

jamesqo · 2016-10-12T01:45:13Z

@karelz, is it possible to convert a Span<T> efficiently to a (T[] array, int offset, int count)? If not, it may not be possible to update this proposal without making all of the methods virtual and copying by default, as is done in Encoding, which would be rather unfortunate...

@jkotas
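Span<T> itself does not expose a backing array, but Memory<T>/ReadOnlyMemory<T> (which later shipped alongside Span<T>) can be queried with MemoryMarshal.TryGetArray. A minimal sketch of the copy-by-default fallback, assuming a hypothetical helper name: array-backed memory is forwarded as (array, offset, count) with no copy, and anything else is copied through a buffer rented from ArrayPool<byte>.Shared.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper; not part of System.IO.
public static class MemoryWriteHelper
{
    public static void WriteMemory(Stream stream, ReadOnlyMemory<byte> buffer)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));

        if (MemoryMarshal.TryGetArray(buffer, out ArraySegment<byte> segment))
        {
            // Array-backed: pass (array, offset, count) straight through, no copy.
            stream.Write(segment.Array, segment.Offset, segment.Count);
            return;
        }

        // Not array-backed (e.g. native memory): copy through a temporary pooled array.
        byte[] rented = ArrayPool<byte>.Shared.Rent(buffer.Length);
        try
        {
            buffer.Span.CopyTo(rented);
            stream.Write(rented, 0, buffer.Length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```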

karelz · 2016-10-13T18:44:58Z

@KrzysztofCwalina will kick off much broader public discussion around design and usage of Span<T>, ArraySegment, etc. early next week. That design discussion will probably go on for a while.
Until there is consensus from that discussion we will put on hold all API additions which are likely to be impacted by the outcome of that discussion (like this one). Please stay tuned.

davidfowl · 2017-03-20T06:10:04Z

Span may not work well for some of these APIs because it is stack-only (specifically the async APIs).
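To illustrate the restriction: a Span<byte> cannot be kept alive across an await, so a Span-typed parameter is rejected on an async method, whereas Memory<byte> is allowed. A minimal sketch using the Stream.ReadAsync(Memory<byte>, CancellationToken) overload from .NET Core 2.1/.NET Standard 2.1; ReadFullyAsync is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncReadHelper
{
    // `async Task<int> ReadFullyAsync(Stream s, Span<byte> buffer)` would not compile:
    // Span<T> is a ref struct and cannot be captured by the async state machine.
    public static async Task<int> ReadFullyAsync(Stream stream, Memory<byte> buffer,
        CancellationToken cancellationToken = default)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            // A single read may return fewer bytes than requested; keep going until full or end of stream.
            int read = await stream.ReadAsync(buffer.Slice(total), cancellationToken).ConfigureAwait(false);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }
}
```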

ayende · 2017-04-23T07:10:05Z

@davidfowl But if I have some unmanaged memory, I really want to just pass a span over it rather than copy it to a managed array. That memory isn't allocated on the stack.

stephentoub · 2017-06-22T18:18:45Z

I'm going to close this in favor of dotnet/corefx#21281. @jamesqo, if anything would be covered by this issue that's not covered functionally by the other, please comment over there. Thanks!

ddotlic · 2019-11-28T08:59:08Z

@stephentoub Let me try to convince you that the item you linked to and all the - very useful! - changes made to the .NET APIs WRT Span and friends (Memory etc.) have almost nothing to do with this proposal 😉

An example: I am loading bytes from a Stream (it's a file, but that doesn't matter) and processing them (decryption and decompression); this is a .NET Standard 2.1 class library. I am trying to minimize allocations. This is somewhat hard because lots of FX code relies on the Stream API, though the changes in dotnet/corefx#21281 help quite a bit.

Ultimately, everything works OK-ish, except that the MemoryStream I am writing into (I will come back to this) allocates far too much: when it runs out of space, it doubles its buffer and copies the contents.

Sample code:

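using System.IO;
using System.IO.Compression;

private static byte[] Decompress(byte[] data) {
    using var output = new MemoryStream();
    using var input = new MemoryStream(data);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);

    // Decompress everything into the growable output stream.
    gzip.CopyTo(output);
    output.Flush();
    // ToArray copies the written bytes into a new array.
    return output.ToArray();
}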

In practice, I don't need to write into this MemoryStream; the APIs kinda force my hand, and that's the only reason. In fact, both the byte[] input and the byte[] output of this method are wasteful.
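One way to at least avoid the final ToArray() copy with today's APIs is MemoryStream.TryGetBuffer, which returns the stream's internal buffer as an ArraySegment<byte>. A minimal sketch, assuming the caller can work with an ArraySegment<byte> result and optionally knows a rough size estimate (capacityHint is a hypothetical parameter) to reduce the doubling-and-copying described above.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper; returns a view over the MemoryStream's buffer instead of a copy.
public static class DecompressHelper
{
    public static ArraySegment<byte> Decompress(byte[] data, int capacityHint = 0)
    {
        // Pre-sizing reduces the number of grow-and-copy steps; not wrapped in `using`
        // because the returned segment refers to this stream's buffer.
        var output = new MemoryStream(capacityHint);
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        {
            gzip.CopyTo(output);
        }

        // MemoryStream(int) creates an exposable buffer, so TryGetBuffer succeeds here.
        if (!output.TryGetBuffer(out ArraySegment<byte> segment))
            throw new InvalidOperationException("The MemoryStream buffer is not exposable.");
        return segment;
    }
}
```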

I could do things differently: the byte[] passed to this method was read from a Stream using the "classic" Read(byte[], int, int) overload. I could instead have called Read(Span<byte>) multiple times (I don't necessarily know the length of the source stream) and ended up with a list of Span<byte> (or rather a list of Memory<byte>, since a Span<byte> cannot be stored in a list and I'd use MemoryPool<byte>.Shared.Rent).

In exactly the same way, in the middle of the method above, instead of CopyTo I could have called Read(Span<byte>) on gzip in a loop until everything was read.

In both cases, I now have blocks of memory which logically represent a single contiguous array of bytes. I could create a single block and copy all the bytes, but that's just wasteful.
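The "list of Memory<byte>" approach can be sketched as follows, assuming MemoryPool<byte>.Shared and Stream.Read(Span<byte>) (.NET Core 2.1/.NET Standard 2.1). ReadAllChunks is a hypothetical helper; the caller owns the returned IMemoryOwner<byte> instances and must dispose them to return the buffers to the pool.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reads a stream to the end into pooled, non-contiguous chunks.
public static class ChunkedReader
{
    public static List<(IMemoryOwner<byte> Owner, Memory<byte> Data)> ReadAllChunks(Stream source, int chunkSize = 81920)
    {
        var chunks = new List<(IMemoryOwner<byte> Owner, Memory<byte> Data)>();
        while (true)
        {
            IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(chunkSize);
            Memory<byte> memory = owner.Memory; // may be larger than chunkSize
            int filled = 0;
            int read;

            // Fill this chunk completely; Read may return fewer bytes than requested.
            while (filled < memory.Length && (read = source.Read(memory.Span.Slice(filled))) > 0)
                filled += read;

            if (filled == 0) { owner.Dispose(); break; } // nothing left: return the unused chunk
            chunks.Add((owner, memory.Slice(0, filled)));
            if (filled < memory.Length) break;         // partial chunk means end of stream
        }
        return chunks;
    }
}
```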

Since most FX APIs still only work with Streams, it would be ideal if I could "wrap" this list of Memory<byte> with a MemoryStream, very much like the original proposal in this item. Then the rest of the app reading from this MemoryStream would incur close to zero further allocations (and when MemoryStream is disposed, I could return the rented arrays to the pool), assuming the reading code also used Read(Span<byte>) overload.

I've looked almost everywhere, and have seen a few things which could help me, but not quite:

System.IO.Pipelines seems useful, but I'd need adapters (not available?) to wrap a PipeReader in a Stream; additionally, it looks like a lot of code for something that could be simpler.
Along the same lines, it looks like ReadOnlySequence and BufferSegment could be used to partially model what I have above, but only in isolation: there are no classes "virtualizing" a list of BufferSegment or a single ReadOnlySequence (it models several segments, but its API is frankly very difficult to understand) such that I could hand a virtual Span<byte> or Memory<byte> to other .NET APIs that expect a more or less flat array of bytes (if I could do that, I wouldn't need a new version of MemoryStream; the existing one would work fine).
I looked at a lot of extension methods and at the FX and CoreCLR sources, and cannot find anything.

So... am I missing something? It seems to me that all I need is an implementation of MemoryStream (a read-only variant) that accepts a list of Memory<byte> or a list of BufferSegment in its constructor. That could cover a lot of scenarios when combined with existing APIs. Or something that could present a list of Memory<byte> as a single Memory<byte>, which could then be used with a normal MemoryStream.
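A read-only, forward-only Stream over a list of ReadOnlyMemory<byte> chunks is fairly small. A minimal sketch, assuming a hypothetical type name (ChunkedReadOnlyStream) and deliberately omitting seeking and returning pooled buffers on Dispose; each Read copies only into the caller's buffer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical type: presents non-contiguous chunks as one forward-only, read-only Stream.
public sealed class ChunkedReadOnlyStream : Stream
{
    private readonly IReadOnlyList<ReadOnlyMemory<byte>> _chunks;
    private readonly long _length;
    private int _chunkIndex;    // chunk currently being read
    private int _offsetInChunk; // read position inside that chunk
    private long _position;

    public ChunkedReadOnlyStream(IReadOnlyList<ReadOnlyMemory<byte>> chunks)
    {
        _chunks = chunks ?? throw new ArgumentNullException(nameof(chunks));
        foreach (ReadOnlyMemory<byte> chunk in chunks) _length += chunk.Length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => _length;
    public override long Position
    {
        get => _position;
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        // The Span constructor validates offset and count against the array.
        return Read(new Span<byte>(buffer, offset, count));
    }

    public override int Read(Span<byte> destination)
    {
        int total = 0;
        while (destination.Length > 0 && _chunkIndex < _chunks.Count)
        {
            ReadOnlySpan<byte> remaining = _chunks[_chunkIndex].Span.Slice(_offsetInChunk);
            if (remaining.IsEmpty) { _chunkIndex++; _offsetInChunk = 0; continue; } // move to next chunk

            int n = Math.Min(remaining.Length, destination.Length);
            remaining.Slice(0, n).CopyTo(destination);
            destination = destination.Slice(n);
            _offsetInChunk += n;
            total += n;
        }
        _position += total;
        return total;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Reading code that calls Read(Span<byte>) on this stream avoids intermediate arrays entirely; code using Read(byte[], int, int) still works through the forwarding overload.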

Is there something in FX that I have missed? I'm sure I have, please advise 😊

Thanks for reading and thanks for the great work everyone is doing in .NET Core!

ddotlic · 2019-11-28T11:37:36Z

@stephentoub Please disregard all references to BufferSegment: it turns out it was only used as an example in @davidfowl's article 😕 and it's not an FX type...

davidfowl · 2019-11-30T18:24:05Z

I think a better solution would be to expose the innards of those Stream implementations as flat APIs (we've discussed this internally for a while now).

public static class GZip
{
    public static OperationStatus TryDecompress(ReadOnlySpan<byte> input, Span<byte> output, out int bytesWritten);
}

Something like the above; I haven't thought about it deeply, though (it might need to be non-static since zlib is stateful).
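For comparison, the Brotli codec in System.IO.Compression (.NET Core 2.1+) already exposes a flat, span-based API of roughly this shape, and BrotliDecoder is an instance type because decompression is stateful. A minimal one-shot sketch, assuming the whole compressed payload is available and the output buffer is large enough; FlatBrotli.DecompressBrotli is a hypothetical helper name, not a proposed GZip API.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;

// Hypothetical helper showing a Stream-free, span-based decompression call.
public static class FlatBrotli
{
    public static int DecompressBrotli(ReadOnlySpan<byte> input, Span<byte> output)
    {
        // BrotliDecoder holds native decoder state and must be disposed.
        var decoder = new BrotliDecoder();
        try
        {
            OperationStatus status = decoder.Decompress(input, output, out _, out int bytesWritten);
            switch (status)
            {
                case OperationStatus.Done:
                    return bytesWritten;
                case OperationStatus.DestinationTooSmall:
                    throw new ArgumentException("Output buffer is too small.", nameof(output));
                case OperationStatus.NeedMoreData:
                    throw new InvalidDataException("Compressed input is truncated.");
                default:
                    throw new InvalidDataException("Compressed input is invalid.");
            }
        }
        finally
        {
            decoder.Dispose();
        }
    }
}
```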

stephentoub · 2019-12-01T17:57:46Z

Let me try to convince you

Thanks. Almost all of the APIs proposed in this thread are addressed: they're all Read/Write methods that take ArraySegment<T>, which is effectively superseded by Memory<T>, and they are covered, as mentioned, by https://github.com/dotnet/corefx/issues/21281. The one set you're commenting on are the MemoryStream constructors, which are actually also mentioned by that issue (see the section titled "System.IO.BufferStream and System.IO.ReadOnlyBufferStream"; that issue was created when Memory<T> was called Buffer<T>). That portion of the issue was split out into https://github.com/dotnet/corefx/issues/22404, which is still open; feel free to provide feedback on it there. There is also https://github.com/dotnet/corefx/issues/21380 about array pooling with MemoryStream.
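Because ArraySegment<T> converts implicitly to Span<T>, ReadOnlySpan<T>, Memory<T> and ReadOnlyMemory<T>, code that already holds a segment can call the span/memory-based Stream overloads directly on .NET Core 2.1/.NET Standard 2.1 and later. A minimal sketch; SegmentOverloadDemo is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical demo: ArraySegment<byte> arguments bind to the span/memory-based overloads.
public static class SegmentOverloadDemo
{
    public static async Task<byte[]> WriteTwiceAsync(ArraySegment<byte> segment)
    {
        var stream = new MemoryStream();
        stream.Write(segment);            // binds to Stream.Write(ReadOnlySpan<byte>)
        await stream.WriteAsync(segment); // binds to Stream.WriteAsync(ReadOnlyMemory<byte>, CancellationToken)
        return stream.ToArray();
    }

    public static int ReadInto(Stream source, ArraySegment<byte> destination)
        => source.Read(destination);      // binds to Stream.Read(Span<byte>)
}
```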

ddotlic · 2019-12-02T08:28:05Z

@davidfowl Of course, having a more "modern" API would be much better than struggling with the aging Stream (we are all well aware of its shortcomings) but the reality is that a crapload of code, including FX, still works "best" (meaning most functionality provided out-of-the-box) with the Stream, hence my comments. It's good to know that folks inside Microsoft are aware of this (tiny) aspect of the FX and are looking into ways to improve it. You say you've discussed internally, are there any "in the open" discussions the community at large may contribute to?

@stephentoub Yes, after I wrote all that I stumbled upon dotnet/corefx#21380, which is much more fitting for what I'm trying to do here, in the absence of pervasive changes to the FX along the lines of @davidfowl's proposal above. Sorry for not finding that other item sooner. In fact, when I think about it a bit more, dotnet/corefx#22404 looks exactly like what I would expect to have one day: one stream to read values from and another to write into; in practice, most of my memory streams are either read-only or write-only. I do share @KrzysztofCwalina's sentiment, though: how the hell are we supposed to name this thing? 😉 I actually have a working implementation of portions of both items you linked to; I have some concerns, which I will raise there.

Thank you both for your comments, clearly made in your own time over the weekend; it's much appreciated!

stephentoub assigned KrzysztofCwalina Mar 8, 2016

karelz unassigned KrzysztofCwalina Oct 11, 2016

stephentoub closed this as completed Jun 22, 2017

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.1.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Jan 2, 2021
