Add initial Span/Buffer-based APIs across corefx #22387
Comments
Not sure if it should be considered as part of this effort or tracked separately, but it'd be nice if there was a non-allocating way to get an I suggest two new methods that match the naming/style of other methods on
namespace System.Text
{
    public class Encoding
    {
        public virtual int GetPreambleByteCount();
        public virtual int GetPreambleBytes(Span<byte> bytes);
    }
}
We could provide a default implementation based on the existing |
You can have just one non-allocating |
Nice. But what would we call it? We can't overload on return type. Edit: Maybe a property? |
What about
I see
These have What about |
Seems worthwhile. I've updated the proposal with it. Presumably the base implementation would just be:
public virtual ReadOnlySpan<byte> Preamble => GetPreamble();
and then derived implementations would do something like (e.g. for UTF8):
private static readonly byte[] s_preamble = new byte[] { 0xEF, 0xBB, 0xBF };
...
public override ReadOnlySpan<byte> Preamble => _emitUTF8Identifier ?
    new ReadOnlySpan<byte>(s_preamble) :
    ReadOnlySpan<byte>.Empty;
But that means that a misbehaving consumer could get a reference to the contents via
Yes, I meant to have both Parse and TryParse on these, but apparently didn't transfer them over properly from my notes. Fixed.
@justinvp, why do you ask about it? You think it's critical to have across the board? I'd left it out intentionally as I've not seen it used frequently or in hot path situations, and it could always be added subsequently, but maybe I'm misinformed?
The implementations today are already allocation-heavy. If/when we can improve the implementations to be better, and if they're used on paths where it matters, then I could see adding them. But right now it doesn't seem like cost of getting a string from a |
Yes, that's why the method has the Dangerous prefix. |
Really not a different situation than
Right. |
Ok. It just feels different than some of the other cases we've discussed, as it's not just allowing someone to mutate local state that was supposed to be read-only, but global state that's shared by others. But I'm ok with it as long as we understand the consequences. And I realize it's just an extension of some of the other cases we've discussed.
That requires unsafe code and pointers, at which point all bets are off. This doesn't. |
Are there scenarios where all bets aren't off when you use APIs with |
There's a difference between a naming convention and having unfettered ability to write to any arbitrary location in memory. I'm not arguing we should do something differently, just pointing out there is a meaningful difference between the two, at least to me. |
The unsafe keyword just prevents you from using unsafe "all bets are off" language features. It does not prevent you from using equivalent library APIs. It has been the case since .NET Framework 1.0. dotnet/roslyn-analyzers#972 is a suggestion on how to fix this hole.
Marshal APIs (and many other more subtle APIs) let you do that without unsafe keyword since .NET Framework 1.0.
Yes, it has been the convention that we have been using recently. It is not new. It has been used (inconsistently) in the past - e.g. for APIs like |
Yeah, good point. |
The reason I asked about
I don't disagree. The reason I asked was for consistency with the other |
@KrzysztofCwalina, @terrajobst, how are we thinking about EDIT: I see we have some extensions related to this in System.Memory in corefx. We'll likely need some of those extensions moved to corelib to enable implementations there, in particular support for faster copying and comparing. |
(I edited the initial post to tweak some of the signatures based on offline feedback from @KrzysztofCwalina , related to my open issues question around status reporting. It'd be great if folks could look at the specifics of the APIs and whether they meet needs regarding that.) |
We also need to add methods to convert from primitive types to
var lengthString = payload.Length.ToString(CultureInfo.InvariantCulture);
var utf8LengthBytes = Encoding.UTF8.GetBytes(lengthString);
Instead, it would be great if we could do something like this:
var buffer = new byte[10]; // max int
int written = payload.Length.CopyTo(buffer, CultureInfo.InvariantCulture, Encoding.UTF8);
Format directly into a
public struct Int32
{
    public int CopyTo(Span<byte> bytes, CultureInfo cultureInfo, Encoding encoding);
}
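For comparison, here is a minimal sketch of formatting an integer directly into UTF-8 bytes with no intermediate string or byte[]. It assumes a runtime that provides Utf8Formatter in System.Buffers.Text, which is not the CopyTo shape suggested in this comment; the helper name WriteLengthUtf8 is hypothetical.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

// Hypothetical helper: writes the decimal digits of 'length' as UTF-8 into 'destination'.
static int WriteLengthUtf8(int length, Span<byte> destination)
{
    // TryFormat returns false when the destination is too small.
    if (!Utf8Formatter.TryFormat(length, destination, out int bytesWritten))
        throw new ArgumentException("Destination too small.", nameof(destination));
    return bytesWritten;
}

Span<byte> buffer = stackalloc byte[11]; // room for "-2147483648"
int written = WriteLengthUtf8(12345, buffer);
Console.WriteLine(Encoding.UTF8.GetString(buffer.Slice(0, written)));
```

The formatter always writes invariant-culture text, so this covers only the CultureInfo.InvariantCulture plus UTF-8 case from the comment, not arbitrary cultures or encodings.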
Thanks, @davidfowl. I agree that was an omission from my proposal and we should definitely include such core formatting APIs. A few thoughts/questions:
|
It's definitely more composable but less efficient when I just want encoded bytes (not chars) without the intermediate step. Maybe the UTF8 focused formatters in corefxlab work better for this scenario since they are particularly optimized for UTF8 bytes (the web pretty much decided on UTF8 http://utf8everywhere.org/) |
Yeah, I think that's probably the case. I'm inclined to say the core types in coreclr/corefx should start with the constituent operations, and we start with the more efficient/combined ones in the more-advanced-optimized-for-UTF8-and-every-last-ounce-of-perf-specialized-parsing/formatting library. @KrzysztofCwalina, opinion? |
Will there be an easy way to parse to types based on a length instead of number of expected bytes based on types? Would it be possible to overload with a "length" or "numberOfBytes" parameter to use when parsing similar to the Invariant types (e.g. InvariantUtf8) CoreFxLab? |
@shaggygi, I'm unclear what you're asking for. Given, e.g.
public struct Int16
{
    public bool TryParse(ReadOnlySpan<char> chars, out short result, NumberStyles style = NumberStyles.Integer, NumberFormatInfo info = null);
    …
}
what would the value of an overload that takes a length be? Why wouldn't you just slice the chars? Or are you specifically talking about skipping chars entirely and parsing from bytes to types based on a specific encoding? For that I'm inclined to say that such use is the domain of the corefxlab libs, at least for this first go-around with getting a core set of APIs added to the core types.
@stephentoub I might have overlooked the Invariant APIs in CoreFxLab. There are cases today where I have a byte[] and need to read a number of bytes into a certain type (e.g. uint). Being 4 bytes for this type, it seems like I needed 4 bytes to convert to/from the type and byte[]. I was thinking the Invariant APIs included a "length" parameter so you could specifically tell how many bytes to use during the conversions. For example, there is a protocol I work with that includes number of bytes that are included in the packet further downstream to process to read a particular type. Since I know how many bytes, I could pass that into the method to parse into a particular type. I guess I could cast and use a ushort if I needed to use 2 bytes instead of 4 in this example. So if my assumption was correct on the Invariant APIs, I was suggesting to add an overloading method to this topic for the TryParse and ToXXX methods. |
@shaggygi, sorry, I'm still not clear on what you're suggesting. Can you propose a complete API signature and an example usage of that so that I can better understand the scenario? Sorry for being dense. |
FWIW, it's not just the web. Almost all of the game- and graphics-related native libraries also use UTF8 exclusively, even on Windows. OpenGL, Vulkan, SDL2, Assimp, imgui are some examples. |
I'm walking a potentially fine line here. We've got all of these formatters and parsers in the works in corefxlab, and they're focused on and tuned for these UTF8 scenarios, while also employing powerful but more complex APIs. I could see us taking a few routes:
In this issue I've taken the approach of (2), which essentially boiled down to looking at existing APIs and creating overloads that created the smallest meaningful mapping between arrays/strings/pointers and spans/buffers. We could of course add additional UTF8-focused overloads, but I do think they would be additive rather than replacements, because we do want those "easy" situations like I mentioned to remain easy, and then as we add these we very quickly end up with (3). Maybe that's where we'll eventually end up, but I'm very hesitant to start there. Reasonable? |
The original memory of the string object.
You wrote "in order gain value from all these overloads one would require to allocate the Span backed memory region from a pool of heap memory", which is explicitly stating that these APIs are tied to a specific Span use-case, and I'm saying that's not true. For example, to pick an example at random (pun intended), if I want 100 random bytes, today I might write:
Random r = ...;
var bytes = new byte[100];
r.NextBytes(bytes);
but with the exposed API, I can instead write:
Random r = ...;
byte* p = stackalloc byte[100];
var bytes = new Span<byte>(p, 100);
r.NextBytes(bytes);
and now I have my hundred bytes, but I didn't allocate any memory, and I didn't use a heap-based pool.
@stephentoub, is |
If you want a smaller set, you don't even have to stackalloc and can take the address of a local var/struct, e.g. for 8 random bytes:
Random r = ...;
ulong l;
var bytes = new Span<byte>(&l, sizeof(ulong));
r.NextBytes(bytes);
Yes, the name keeps going back and forth: |
Thanks.
Those indeed show a clear use case... I suggest including within the text describing the motivation.
I would also suggest to state this as one of the motivations.
So I assume constructing a String with this ctor will allocate a new String object which internally points to the Span backed characters? |
Actually, it would be GREAT if you could include snippets like these along each API :) |
It will allocate a new string and copy the data from the span: |
Then it's twice important to mention that a Span which is a result of a sliced String points to the original String memory, otherwise the use-case is not clear. |
@am11 Consider subscribing to dotnet/corefxlab#1653 to get notified when the spec gets updated/finalized. |
@clrjunkie The design on Span has been full of examples and I'm not sure what you're saying is lacking. This particular thread? This thread is not to document or explain Span but to track the work to be done adding Span to existing BCL APIs. |
@jnm2 I don't know what documentation you are referring to but I'm only interested in the spec which is referenced in the issue I linked to. If you had made the effort to read it you would have noticed that it is work in progress and somewhat problematic especially on the examples side. So given information about Span/Buffer is scattered all over and this particular issue is not a gantt chart but rather contains useful information about the motivations for incorporating Span within key CoreFx api's, I think what I'm saying only serves to benefit other community members who read it as many other issues link to it. |
Yes, I did, and the spec already answers every question you've brought up here, had you made the effort to read it. I'm sorry to put it like this, but this is what it looks like from the outside:
Spec: Second sentence of https://github.com/dotnet/corefxlab/blob/master/docs/specs/span.md#introduction
Spec: https://github.com/dotnet/corefxlab/blob/master/docs/specs/span.md#non-allocating-substring
Spec: allows creation from pointer https://github.com/dotnet/corefxlab/blob/master/docs/specs/span.md#api-surface
Spec: https://github.com/dotnet/corefxlab/blob/master/docs/specs/span.md#non-allocating-substring Etc. |
I'm all for examples of use cases though, so please suggest away. |
@jnm2 Sorry, I'm not a native English speaker, was that supposed to be some kind of insult? |
I'm sorry, not at all! "Please suggest away" means please make all the suggestions you can come up with. Let's collect all the major use cases! Other English idioms that mean the same thing, but some could sound like opposites 😆: "go to town," "have at it," "knock yourself out." Languages are funny things! (And thanks for confirming what I meant rather than assuming. It can be tough and it really shows a good level of maturity not always seen! Much appreciated.) |
@jnm2 Ah :)) No worries! But let's first see if the ones I suggested so far get in. They would really get a reader's notice if they are presented in the context of the motivation.
In this thread I saw question regarding |
Why not just cast the enum to/from its underlying integral type and use the corresponding BitConverter or similar methods with that? |
True, that will work. |
@stephentoub May I ask, why |
Interfaces can't be changed; adding new methods to an interface is a breaking change.
You're welcome to open an issue with a proposal. cc: @bartonjs |
@stephentoub or others, sorry in advance if the question was already asked; I searched for quite some time but couldn't find it. I would like to know if there's an API now to initialize a
Many thanks
@nockawa, not currently publicly exposed, though you can see/copy the implementation of one here: |
@stephentoub thanks, but why is it a ReadOnly version? Nothing should prevent us from dealing with a writable
Because for the internal purposes we needed this for, it would always be a ReadOnlyMemory. |
You could certainly wrap a writable stream around a |
With the advent of Span<T> and Buffer<T>, there are a multitude of improvements we’ll want to make to types across coreclr/corefx. Some of these changes include exposing Span<T>/Buffer<T>/ReadOnlySpan<T>/ReadOnlyBuffer<T> themselves and the associated operations on them. Other changes involve using those types internally to improve memory usage of existing code. This issue isn’t about either of those. Rather, this issue is about what new APIs we want to expose across corefx (with some implementations in coreclr/corert).
A good approximation of a starting point is looking at existing APIs that work with arrays or pointers (and to some extent strings), and determining which of these should have Span/Buffer-based overloads (there will almost certainly be “new” APIs we’ll want to add that don’t currently have overloads, but for the most part those can be dealt with separately and one-off). There are ~3000 such methods that exist today in the corefx reference assemblies. We’re obviously not going to add ~3000 new overloads that work with Span/Buffer, nor should we. But many of these aren’t relevant for one reason or another, e.g. they’re in components considered to be legacy, they’re very unlikely to be used on hot paths where a span/buffer->array conversion would matter at all, etc.
I’ve gone through the framework and identified a relatively small set of methods I believe we should start with and add together for the next release. This is dialable, of course, but I believe we need a solid set of these to represent enough mass that span/buffer permeate the stack and make sense to use in an application. All of these are cases where using a span or a buffer instead of an array makes a quantifiable difference in allocation, and thus can have a measurable impact on the overall memory profile of a consuming application, contributing to an overall improvement in performance.
System.BitConverter
BitConverter is used to convert between primitive types and bytes, but the current APIs force an unfortunate amount of allocation due to working with byte[]s. We can help to avoid much of that by adding overloads that work with Span<byte>, adding CopyBytes methods instead of GetBytes methods, and adding To* overloads that accept ReadOnlySpan<byte> instead of accepting byte[].
EDIT 7/18/2017: Updated based on API review.
Separated out into https://github.com/dotnet/corefx/issues/22355 for implementation.
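As a rough illustration of the allocation-free round trip described above, the sketch below assumes a runtime where BitConverter exposes TryWriteBytes(Span<byte>, int) and ToInt32(ReadOnlySpan<byte>). Those are the names available in current .NET, not necessarily the CopyBytes naming used in this proposal text.

```csharp
using System;

// Four bytes of stack memory instead of a heap-allocated byte[].
Span<byte> scratch = stackalloc byte[sizeof(int)];

// TryWriteBytes reports failure with false rather than throwing when the span is too small.
bool ok = BitConverter.TryWriteBytes(scratch, 0x12345678);

// Read the value back through the ReadOnlySpan<byte> overload.
int roundTripped = BitConverter.ToInt32(scratch);

Console.WriteLine($"{ok} {roundTripped:X8}");
```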
System.Convert
As with BitConverter, the Convert class is also used to convert arrays. Most of the members are about converting from individual primitives to other individual primitives, but several work with arrays, in particular those for working with Base64 data. We should add the following methods:
Separated out as https://github.com/dotnet/corefx/issues/22417 for implementation.
System.Random
The Random class provides a NextBytes method that takes a byte[]. In many situations, that’s fine, but in some you’d like to be able to get an arbitrary amount of random data without having to allocate such an array, and we can do that with spans. For example, here’s a case where we’re getting some random data only to then want a Base64 string from it:
https://referencesource.microsoft.com/#System/net/System/Net/WebSockets/WebSocketHelpers.cs,366
Separated out into https://github.com/dotnet/corefx/issues/22356 for implementation.
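A minimal sketch of the "random data to Base64 string" case, assuming a runtime with Random.NextBytes(Span<byte>) and a Convert.ToBase64String overload that accepts ReadOnlySpan<byte>. The helper name NewBase64Nonce and the 16-byte size are illustrative choices, not taken from the linked code.

```csharp
using System;

// Hypothetical helper: 16 random bytes rendered as Base64, with no temporary byte[].
static string NewBase64Nonce(Random random)
{
    Span<byte> bytes = stackalloc byte[16];
    random.NextBytes(bytes);
    // The only allocation left is the resulting string itself.
    return Convert.ToBase64String(bytes);
}

Console.WriteLine(NewBase64Nonce(new Random()));
```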
Primitive Parse methods
Related to BitConverter and Convert, it’s very common to want to parse primitive values out of strings. Today, that unfortunately often involves creating substrings representing the exact piece of text, and then passing it to a string-based TryParse method. Similarly, it's common to convert primitives to strings via a method like ToString. Instead, I suggest we add the following:
Separated out as https://github.com/dotnet/corefx/issues/22403 for implementation.
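To illustrate parsing without a substring, here is a sketch that assumes the int.TryParse(ReadOnlySpan<char>, NumberStyles, IFormatProvider, out int) overload found in current .NET. The parameter order differs from the signatures discussed in this thread, and TryParsePort is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses the digits after the last ':' without creating a substring.
static bool TryParsePort(ReadOnlySpan<char> hostAndPort, out int port)
{
    port = 0;
    int colon = hostAndPort.LastIndexOf(':');
    if (colon < 0) return false;

    // Slice is just a view over the original characters; nothing is copied.
    return int.TryParse(hostAndPort.Slice(colon + 1), NumberStyles.None,
                        CultureInfo.InvariantCulture, out port);
}

Console.WriteLine(TryParsePort("example.com:8080".AsSpan(), out int p) ? p : -1);
```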
Then we should also support parsing a few common types out of ReadOnlySpan<char>:
EDIT 7/18/2017: Updated per API review
DateTime{Offset} separated out as https://github.com/dotnet/corefx/issues/22358 for implementation.
TimeSpan separated out as https://github.com/dotnet/corefx/issues/22375 for implementation.
Version separated out as https://github.com/dotnet/corefx/issues/22376 for implementation.
System.Guid
Guids are often constructed from byte[]s, and these byte[]s are often new’d up, filled in, and then provided to the Guid, e.g.
https://referencesource.microsoft.com/#mscorlib/system/guid.cs,b622ef5f6b76c10a,references
We can avoid such allocations with a Span ctor, with the call sites instead creating a Span from 16 bytes of stack memory. Guids are also frequently converted to byte[]s to be output, e.g.
https://referencesource.microsoft.com/#mscorlib/system/guid.cs,94f5d8dabbf0dbcc,references
and we can again avoid those temporary allocations by supporting copying the Guid’s data to a span:
EDIT 7/18/2017: Updated per API review
Separated out as https://github.com/dotnet/corefx/issues/22377 for implementation.
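A small sketch of the Guid round trip through stack memory, assuming a runtime that offers a Guid(ReadOnlySpan<byte>) constructor and Guid.TryWriteBytes(Span<byte>), which are the shapes available in current .NET.

```csharp
using System;

Guid original = Guid.NewGuid();

// 16 bytes on the stack stand in for the temporary byte[] of the array-based APIs.
Span<byte> bytes = stackalloc byte[16];
bool written = original.TryWriteBytes(bytes);

// Rebuild the Guid from the same span.
Guid copy = new Guid(bytes);

Console.WriteLine(written && copy == original);
```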
System.String
It’ll be very common to create strings from spans. We should have a ctor for doing so:
Separated out as https://github.com/dotnet/corefx/issues/22378 for implementation.
In addition to the new ctor on String, we should also expose an additional Create method (not a ctor so as to support a generic method argument). One of the difficulties today with String being immutable is it’s more expensive for developers to create Strings with custom logic, e.g. filling a char[] and then creating a String from that. To work around that expense, some developers have taken to mutating strings, which is very much frowned upon from a BCL perspective, e.g. by using the String(char, int) ctor to create the string object, then using unsafe code to mutate it, and then handing that back. We could handle such patterns by adding a method like this:
Separated out as https://github.com/dotnet/corefx/issues/22380 for implementation.
The value of overloads like this is amplified when you start using them together. For example, here’s an example of creating a random string:
https://referencesource.microsoft.com/#System.Web.Mobile/UI/MobileControls/Adapters/ChtmlTextBoxAdapter.cs,62
This incurs the allocation of a byte[] to pass to Random, a char[] as a temporary from which to create the string, and then creating the string from that char[]; unnecessary copies and allocations.
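As a sketch of how the random-string case could avoid both temporaries, the following assumes the string.Create<TState>(int, TState, SpanAction<char, TState>) method available in current .NET. RandomLowercase and its a-z alphabet are illustrative assumptions, not the logic of the linked code.

```csharp
using System;

// Hypothetical helper: builds a random lowercase string in place.
static string RandomLowercase(Random random, int length)
{
    // The callback writes directly into the new string's buffer before it is returned,
    // so no byte[] or char[] temporary is needed.
    return string.Create(length, random, (chars, rng) =>
    {
        for (int i = 0; i < chars.Length; i++)
            chars[i] = (char)('a' + rng.Next(26));
    });
}

Console.WriteLine(RandomLowercase(new Random(), 12));
```

The Random instance is passed as the state argument so that the callback does not need to capture anything.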
Finally, we may also want to provide string.Format support for ReadOnlyBuffer<char> as an argument; although passing it as an object will box it, string.Format could write it to the generated string without needing an intermediary string created, so a smaller allocation and avoiding an extra copy.
System.IO.Stream
EDIT 7/18/2017: Updated per API review.
Separated out as https://github.com/dotnet/corefx/issues/22381 for implementation.
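The ArrayPool/delegation fallback for a span-based read, which the following paragraphs describe, could look roughly like this. It is a sketch under assumptions: ReadViaPooledArray is a hypothetical free-standing helper, whereas the real base implementation would be a virtual method on Stream.

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical fallback: satisfy a span-based read using only the array-based Stream.Read.
static int ReadViaPooledArray(Stream stream, Span<byte> destination)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(destination.Length);
    try
    {
        // Delegate to the existing overload, then copy what was read into the caller's span.
        int bytesRead = stream.Read(rented, 0, destination.Length);
        new ReadOnlySpan<byte>(rented, 0, bytesRead).CopyTo(destination);
        return bytesRead;
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented);
    }
}

using var source = new MemoryStream(new byte[] { 1, 2, 3 });
Span<byte> target = stackalloc byte[8];
Console.WriteLine(ReadViaPooledArray(source, target));
```

The extra copy is the cost of the fallback, which is why derived streams would want to override the span-based method directly.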
The base {ReadOnly}Buffer-accepting methods can use TryGetArray to access a wrapped array if there is one, then delegating to the existing array-based overloads. If the buffer doesn’t wrap an array, then it can get a temporary array from ArrayPool, delegate and copy (for reads) or copy and delegate (for writes).
The base Span-accepting methods can do the ArrayPool/delegation approach in all cases.
We’ll then need to override these methods on all relevant streams: FileStream, NetworkStream, SslStream, DeflateStream, CryptoStream, MemoryStream, UnmanagedMemoryStream, PipeStream, NullStream, etc. to provide more efficient Span/Buffer-based implementations, which they should all be able to do. In a few corner cases, there are streams where the synchronous methods are actually implemented as wrappers for the asynchronous ones; in such cases, we may be forced to live with the ArrayPool/copy solution. However, in such cases, the synchronous implementation is already relatively poor, incurring additional costs (e.g. allocations, blocking a thread, etc.), and they’re that way in part because synchronous usage of such streams is discouraged and thus these haven’t been very optimized, so the extra overhead in these cases is minimal.
Once these additional methods exist, we’ll want to use them in a variety of places in implementation around corefx, e.g.
https://github.com/dotnet/corefx/blob/d6b11250b5113664dd3701c25bdf9addfacae9cc/src/Common/src/System/Net/WebSockets/ManagedWebSocket.cs#L1140
as various pieces of code can benefit not only from the use of Span/Buffer, but also from the async methods returning ValueTask<T> instead of Task<T>, and the corresponding reduced allocations for ReadAsync calls that complete synchronously.
System.IO.BufferStream and System.IO.ReadOnlyBufferStream
Just as we have a MemoryStream that works with byte[] and an UnmanagedMemoryStream that works with a byte* and a length, we need to support treating Buffer<byte> and ReadOnlyBuffer<byte> as streams. It’s possible we could get away with reimplementing MemoryStream on top of Buffer<byte>, but more than likely this would introduce regressions for at least some existing use cases. Unless demonstrated otherwise, we will likely want to have two new stream types for these specific types:
The name of BufferStream is unfortunately close to that of BufferedStream, and they mean very different things, but I’m not sure that’s important enough to consider a less meaningful name.
Separated out as https://github.com/dotnet/corefx/issues/22404 for discussion and implementation.
Optional:
It’s also unfortunate that all of these various memory-based streams don’t share a common base type or interface; we may want to add such a thing, e.g. an interface that anything which wraps a Buffer<T> can implement… then for example code that uses Streams and has optimizations when working directly with the underlying data can query for the interface and special-case when the underlying Buffer<T> can be accessed, e.g.
We could implement this not only on BufferStream and ReadOnlyBufferStream, but also on MemoryStream, UnmanagedMemoryStream, and even on non-streams, basically anything that can hand out a representation of its internals as buffers.
Separated out as https://github.com/dotnet/corefx/issues/22404 for discussion and implementation.
System.IO.TextReader and System.IO.TextWriter
As with streams, we should add the relevant base virtuals for working with spans/buffers:
We’ll then also need to override these appropriately on our derived types, e.g. StreamReader, StreamWriter, StringReader, etc.
Separated out as https://github.com/dotnet/corefx/issues/22406 for implementation.
System.IO.BinaryReader and System.IO.BinaryWriter
As with TextReader/Writer, we also want to enable the existing BinaryReader/Writer to work with spans:
The base implementations of these can work with the corresponding new methods on Stream.
Separated out as https://github.com/dotnet/corefx/issues/22428 and https://github.com/dotnet/corefx/issues/22429 for implementation.
System.IO.File
EDIT 6/26/2017: For now I've removed these File methods, as it's unclear to me at this point that they actually meet the core need. If you're repeatedly reading and pass a span that's too small, you're going to need to read again after handling the read data, but with these APIs as defined, each read results in needing to open and then close the file again; similarly if you're using the write APIs to write a file in pieces. At that point you're better off just using FileStream and reading/writing spans and buffers. We should look to add helpers here when we can figure out the right helpers at the right level of abstraction, e.g. should we expose the ability to just create a SafeFileHandle and then have these helpers operate on SafeFileHandle rather than on a string path?
The File class provides helpers for reading/writing data from/to files. Today the various “read/write data as bytes” functions work with byte[]s, which means allocating potentially very large byte[]s to store the data to be written or that’s read. This can be made much more efficient with spans, allowing the buffers to be pooled (in particular for file reads).
System.Text.StringBuilder
StringBuilder is at the core of lots of manipulation of chars, and so it’s a natural place to want to work with Span<char> and ReadOnlySpan<char>. At a minimum we should add the following APIs to make it easy to get data in and out of a StringBuilder without unnecessary allocation:
Separated out as https://github.com/dotnet/corefx/issues/22430.
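As an illustration of getting data in and out of a StringBuilder through spans, this sketch assumes the Append(ReadOnlySpan<char>) and CopyTo(int, Span<char>, int) overloads that exist in current .NET. The API list from the original proposal is not reproduced here, so the exact shapes it proposed may differ.

```csharp
using System;
using System.Text;

var builder = new StringBuilder();
ReadOnlySpan<char> line = "key=value".AsSpan();

// Append slices of the input without creating intermediate substrings.
builder.Append(line.Slice(0, 3));
builder.Append(':');
builder.Append(line.Slice(4));

// Copy the contents out into stack memory instead of calling ToString() first.
Span<char> output = stackalloc char[builder.Length];
builder.CopyTo(0, output, builder.Length);

Console.WriteLine(output.ToString());
```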
In addition to these, we’ll also want to consider a mechanism for exposing the one or more buffers StringBuilder internally maintains, allowing them to be accessed as ReadOnlySpan<char>. This is covered separately by #22371.
System.Text.Encoding
The Encoding class already exposes a lot of methods for converting between chars/strings and bytes, and it includes overloads for both arrays and pointers. While it might seem onerous to add an additional set for spans, the number of new overloads needed is actually fairly small, and we can provide reasonable default base implementations in terms of the existing pointer-based overloads. These methods could potentially have optimized derived overrides, but more generally they will help the platform to have a consistent feel, avoiding the need for devs with spans to use unsafe code to access the pointer-based methods.
Separated out as https://github.com/dotnet/corefx/issues/22431.
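A brief sketch of encoding and decoding through spans without unsafe code, assuming the GetBytes(ReadOnlySpan<char>, Span<byte>) and GetChars(ReadOnlySpan<byte>, Span<char>) overloads and the Preamble property that current .NET provides on Encoding.

```csharp
using System;
using System.Text;

ReadOnlySpan<char> text = "h\u00E9llo".AsSpan(); // 'é' takes two bytes in UTF-8

// Encode into stack memory sized for the worst case.
Span<byte> utf8 = stackalloc byte[Encoding.UTF8.GetMaxByteCount(text.Length)];
int byteCount = Encoding.UTF8.GetBytes(text, utf8);

// Decode back into another stack buffer.
Span<char> decoded = stackalloc char[text.Length];
int charCount = Encoding.UTF8.GetChars(utf8.Slice(0, byteCount), decoded);

// The preamble is exposed as a span, so no byte[] is handed out per call.
int preambleLength = Encoding.UTF8.Preamble.Length;

Console.WriteLine($"{byteCount} bytes, {charCount} chars, preamble {preambleLength} bytes");
```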
System.Numerics
Separated out in https://github.com/dotnet/corefx/issues/22401 for implementation.
Even with BigInteger being a less-commonly-used type, there are still places even in our own code where this would be helpful:
https://source.dot.net/#System.Security.Cryptography.X509Certificates/Common/System/Security/Cryptography/DerSequenceReader.cs,206
In fact, BigInteger’s implementation itself could benefit from this, such as with parsing:
https://referencesource.microsoft.com/#System.Numerics/System/Numerics/BigNumber.cs,411
System.Net.IPAddress
It’s very common to create IPAddresses from data gotten off the network, and in high-volume scenarios. Internally we try to avoid creating byte[]s to pass to IPAddress when constructing them, such as by using an internal pointer-based ctor, but outside code doesn’t have that luxury. We should make it possible to go to and from IPAddress without additional allocation:
Even internally this will be useful in cases like these:
https://source.dot.net/#System.Net.NetworkInformation/Common/System/Net/SocketAddress.cs,135
https://source.dot.net/#System.Net.NameResolution/Common/System/Net/Internals/IPAddressExtensions.cs,21
EDIT 7/25/2017: Updated based on API review
Separated out as https://github.com/dotnet/corefx/issues/22607 for implementation.
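A minimal sketch of going to and from IPAddress without a byte[], assuming the IPAddress(ReadOnlySpan<byte>) constructor and TryWriteBytes(Span<byte>, out int) method available in current .NET. The address used is a documentation-range example.

```csharp
using System;
using System.Net;

// Four address bytes as they might arrive off the wire, held on the stack.
ReadOnlySpan<byte> wire = stackalloc byte[] { 192, 0, 2, 1 };
var address = new IPAddress(wire);

// Write the address back out; 16 bytes is enough for IPv4 or IPv6.
Span<byte> roundTrip = stackalloc byte[16];
bool ok = address.TryWriteBytes(roundTrip, out int bytesWritten);

Console.WriteLine($"{address} {ok} {bytesWritten}");
```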
System.Net.Sockets
This is one of the more impactful areas for spans and buffers, and having these methods will be a welcome addition to the Socket family.
Today, there are lots of existing methods on sockets, e.g. an overload for sync vs async with the APM pattern vs async with Task vs async with SocketAsyncEventArgs, an overload for Send vs Receive vs SendTo vs ReceiveMessageFrom vs… etc. I do not think we should add an entire new set of span/buffer-based methods, and instead we should start with the most impactful. To me, that means adding the following two synchronous and two Task-based asynchronous overloads for sending and receiving:
Note that I’ve put the ReceiveAsync and SendAsync overloads on the SocketTaskExtensions class as that’s where the existing overloads live. We could choose to instead make these new overloads instance methods on Socket.
In addition, the highest-performing set of APIs with sockets are based on SocketAsyncEventArgs. To support those, we should add the following:
The implementation currently stores a byte[] buffer. We can change that to store a Buffer<byte> instead, and just have the existing SetBuffer(byte[]) overload wrap the byte[] in a Buffer<byte>. There is an existing Buffer { get; } property as well. We can change that to just use TryGetArray on the internal buffer and return it if it exists, or else null. The GetBuffer method is then there to support getting the set Buffer<byte>, in case the supplied buffer wrapped something other than an array.
EDIT 7/25/2017: Updated based on API review
Separated out as https://github.com/dotnet/corefx/issues/22608 for implementation
System.Net.WebSockets.WebSocket
Similar to Socket is WebSocket, in that it’s desirable to be able to use these APIs with finer-grain control over allocations than is currently easy or possible, and in high-throughput situations. We should add the following members:
Note that ReceiveAsync is returning a ValueTask rather than a Task (to avoid that allocation in the case where the operation can complete synchronously), and it’s wrapping a ValueWebSocketReceiveResult instead of a WebSocketReceiveResult. The latter is the type that currently exists, but is a class. I’m suggesting we also add the following struct-based version, so that receives can be made to be entirely allocation-free in the case of synchronous completion, and significantly less allocating even for async completion.
EDIT 7/25/2017: Updated per API review
Separated out as https://github.com/dotnet/corefx/issues/22610 for implementation
System.Net.Http
This area still needs more thought. My current thinking is at a minimum we add the following:
The GetBytesAsync overload avoids the need for allocating a potentially large byte[] to store the result, and the ReadOnlyBufferContent makes it easy to upload ReadOnlyBuffer<byte>s rather than just byte[]s.
There is potentially more that can be done here, but I suggest we hold off on that until we investigate more deeply to understand what we’d need to plumb throughout the system. HttpClient is itself a relatively small wrapper on top of HttpClientHandler, of which there are multiple implementations (ours and others’), and to plumb a buffer all the way down through would involve new APIs and implementation in HttpClientHandler implementations. Definitely something to investigate.
Separated out as https://github.com/dotnet/corefx/issues/22612 for implementation (ReadOnlyBufferContent... we decided against the GetBytesAsync helper for now in 7/25/2017 API review)
System.Security.Cryptography namespace
Several hashing-related types in System.Security.Cryptography would benefit from being able to work with data without allocating potentially large byte[]s. One is HashAlgorithm, which has several methods for processing an input byte[] and allocating/filling an output byte[]:
EDIT 7/25/2017: Updated per API review
Separated out as https://github.com/dotnet/corefx/issues/22613 for implementation
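To show what span-based hashing looks like in practice, this sketch assumes the HashAlgorithm.TryComputeHash(ReadOnlySpan<byte>, Span<byte>, out int) method available in current .NET. The input "abc" is just a well-known test vector.

```csharp
using System;
using System.Security.Cryptography;

using var sha256 = SHA256.Create();

// Input and output both live on the stack; no byte[] is allocated for either.
ReadOnlySpan<byte> input = stackalloc byte[] { (byte)'a', (byte)'b', (byte)'c' };
Span<byte> digest = stackalloc byte[32]; // SHA-256 produces 32 bytes

bool hashed = sha256.TryComputeHash(input, digest, out int bytesWritten);

Console.WriteLine($"{hashed} {bytesWritten}");
```

TryComputeHash returns false instead of throwing when the destination is too small, which lets callers size the buffer without exception handling.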
Similarly, several overloads would be beneficial on IncrementalHash:
EDIT 7/25/2017: Updated per API review
Separated out as https://github.com/dotnet/corefx/issues/22614 for implementation
Similarly, several encryption/signing related types would benefit from avoiding such byte[] allocations:
EDIT 7/25/2017: Update per API review
Separated out as https://github.com/dotnet/corefx/issues/22615 for implementation
One other place where we’d want to be able to use span is ICryptoTransform. Lots of APIs, mainly the CreateEncryptor and CreateDecryptor methods, return ICryptoTransform, and it has a TransformBlock and a TransformFinalBlock method that work with byte[]. We’d really like support in terms of ReadOnlySpan<byte> and Span<byte>. I see four options here:
ISpanCryptoTransform interface, and a second set of differently-named CreateEncryptor/Decryptor methods for creating instances of these.
ISpanCryptoTransform interface that’s also implemented by our implementations, and have consuming code query for the interface and use it if it exists.
My strong preference is for (4), but that’s based on a feature that doesn’t yet exist. As such, my suggestion is to hold off on this until it’s clear whether such a feature will exist: if it will, do (4) once it does, otherwise do (3).
EDIT 7/25/2017: Decision: 4 if/when the feature exists, and 1 until then.
Other namespaces
There are some other namespaces that could likely benefit from Span/Buffer, but we should probably handle separately from this initial push:
Span<T>? Callback-based methods for exposing the internal storage via a ReadOnlySpan<T> (if that makes sense for the relevant collection)? This needs more thought.
T[] has an implicit operator to a Span<T>, we should consider adding an ImmutableArray<T> operator for converting to a ReadOnlySpan<T>.
Known Open Issues
There are several cross-cutting open issues that apply to many of these APIs:
ReadOnlySpan<T> and Span<T>, and “buffer” for ReadOnlyBuffer<T> and Buffer<T>? What if the existing overload uses a different name, like “input” or “b”? What about cases where currently there’s a single argument (e.g. “input”) and then returns an array, and the new overload will have two span args, one input and one output?
Next Steps