-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Span<T> Binary Reader/Writer APIs #23601
Comments
Api review feedback:
TryReadBigEndiann -> TryReadBigEndian
|
Additional questions:
|
I'm concerned that a common mistake will be to try to use this API to write multiple values to a single var span = stackalloc byte[8];
span.Write(Int32.MinValue);
span.Write(Int32.MaxValue); Similar code works for other Also, how should that code be written correctly? Do I have to manually use var span = stackalloc byte[8];
span.Slice(0).Write(Int32.MinValue);
span.Slice(4).Write(Int32.MaxValue); (The first |
Why System.Buffers and not System.Memory where Span and friends live?
+1 |
This was brought up in the review. The rationale is that we only have the span/memory primitive type and APIs/extension methods closely related to them in System.Memory (IndexOf, AsBytes, etc). Operations like Read/Write, transformation APIs like text encode, and text parse/format would live in System.Buffers.
Yes.
Yes. Are there any concerns with that?
One approach could be to remain consistent with TryRead and since we don't have TryReadBytes, change TryWriteBytes elsewhere to TryWrite and keep this as TryWrite as well. |
Well, it is more complex than for .NET Core: Span/Memory primitive types are in System.Runtime, SpanExtension are in System.Memory, and now you are proposing to add extension methods to read/write bytes from Span to System.Buffers that has nothing to do with Span so far. It feels like we are trying to overthink the layering again. It has been proven that we do not know how to do that well. We should preffer larger assemblies that have all related stuff lumped together. |
I think we should remove the generics and instead have the type being written in the method name e.g.
It's too easy to mess up the generic method. Here's some MQTT parser code using these APIs: |
We could change them so they are no longer extension methods on Span (and take Span as a parameter, similar to the other System.Text.Primitive APIs for example var span = stackalloc byte[4];
Binary.Write(span, Int32.MaxValue);
var value = Binary.Read<int>(span); |
Also this problem that @svick brings up is another big usability issue with these APIs and span:
Since the early days of pipelines and span @benaadams, @Drawaes, @mgravell and I have been discussing this scenario. We should also think about a mutable ref like struct that moves forward as the data is written. var writer = new BinaryWriter(span);
writer.WriteInt32(10); // writes and moves to span + sizeof(int32)
writer.WriteInt64(20); |
Funny over the last couple of days I have been writing a fast PE/IL rewriter and while the code is dogs breakfast due to me deep diving the topic I have had to write exactly those methods again. A struct wrapper for a span. And then extension methods that read from (to be replaced with your methods of course mine are a hack) the span. Then slice the span and return it. Then the struct wrapper uses this to keep its internal span up to date. Seeing as every app I write that uses spans heavily ends up with this same pattern it seems a good candidate for a framework item. |
We'd discussed a number of similar concerns yesterday as well:
I'd be happy with just putting an API like this in System.Memory.dll or wherever: namespace System.Buffers
{
public ref struct SpanReader
{
public SpanReader(ReadOnlySpan<byte> span);
public int Length { get; }
public int Position { get; set; }
public ReadOnlySpan<byte> Remaining { get; }
public bool TryReadInt32LittleEndian(out int result);
public bool TryReadInt32BigEndian(out int result);
public bool TryReadInt64LittleEndian(out long result);
public bool TryReadInt64BigEndian(out long result);
... // repeated for each primitive type we care about
}
public ref struct SpanWriter
{
public SpanWriter(Span<byte> span);
public int Length { get; }
public int Position { get; set; }
public Span<byte> Remaining { get; }
public bool TryWriteInt32LittleEndian(int result);
public bool TryWriteInt32BigEndian(int result);
public bool TryWriteInt64LittleEndian(long result);
public bool TryWriteInt64BigEndian(long result);
... // repeated for each primitive type we care about
}
} We wouldn't have just the unadorned TryReadInt32 as it's duplicative: if you don't care about endianness because you're just reading/writing on the same system, then you can use either the little or big methods, as long as you're consistent, and if you do care about endianness, then you can use the one you need. I didn't include the non-Try methods above, because if you're writing this kind of code, it seems you do care about the potential expense of exceptions, in which case you either want to use the Try methods to avoid exceptions, or you know for certain that the operation will succeed, in which case you can just use the Try variant; that said, if folks feel strongly about having the non-Try, I don't think it's a big deal, it just seems overkill to me to have both. This avoids concerns about arbitrary Ts and endianness (plus needing primitive checks), it avoids concerns about duplication with BitConverter as it's clearly providing more advanced functionality than just the primitive read/write BitConverter does, it avoids concerns about not advancing because it advances, it fits well with the general .NET notion of XxReader/Writer, etc. |
That works except when I need to read an int24 ;) |
I like the idea or span reader/writer, but I think it should be added in addition, not instead of the proposed APIs. the reader/writer is a necessarily more heavy API. |
Why? It sounds like the reader/writer support is the more common need (to avoid needing to manually slice), and if you really do want the individual operation, you can do: new SpanReader(span).TryReadInt32LittleEndian(out int result); instead of: SpanReader.TryReadInt32LittleEndian(span, out int result); which is barely more characters and saves doubling the number of methods we need to expose, including those that duplicate BitConverter functionality. |
I've recently become a fan of... "working" with the interesting things the type system + Jit does so I'd go for something more like: public interface IEndianness {}
public struct LittleEndian : IEndianness {}
public struct BigEndian : IEndianness {}
public ref struct SpanWriter
{
public SpanWriter(Span<byte> span);
public int Length { get; }
public int Position { get; set; }
public Span<byte> Remaining { get; }
public bool TryWriteInt32(long value);
public bool TryWriteInt32<TEndianness>(long value) where TEndianness : IEndianness;
public bool TryWriteInt64(long value);
public bool TryWriteInt64<TEndianness>(long value) where TEndianness : IEndianness;
// ... repeated for each primitive type we care about
} |
public bool TryWriteInt32<TEndianness>(long value) where TEndianness : IEndianness
{
if (typeof(TEndianness) == typeof(LittleEndian))
{
// do LittleEndian
}
else if (typeof(TEndianness) == typeof(BigEndian))
{
// do BigEndian
}
else
{
throw WTF();
}
} Types being structs should compile it to an optimized method for that Endianness Usage // read LittleEndian
new SpanReader(span).TryReadInt32<LittleEndian>(out int result);
// read BigEndian
new SpanReader(span).TryReadInt32<BigEndian>(out int result);
// read native
new SpanReader(span).TryReadInt32(out int result); |
Assuming bool TryWriteInt32(long value)
=> BitConverter.IsLittleEndian ?
TryWriteInt32<LittleEndian>(value) :
TryWriteInt32<BigEndian>(value); |
Now we are cooking with gas ;) |
For what situation is a native version needed? And if it is needed, why would it be done differently from the little/big endian support? i.e. why have LittleEndian and BigEndian structs but not PlatformEndian or whatever? Also, with the structs, what does it mean to call these methods with some other implementation of the interface? |
It means it it throws right ? I think that is why there was a "wtf" exception. |
Sure... it's just an odd design from that perspective: you allow parameterizing based on an interface anyone can implement, but then fail if provided with something other than one of a few built-in blessed implementations. I understand why it's being suggested, it's just a bit strange. |
When you don't care about endianness because you are always reading and writing to the same architecture |
In which case why not just use little or big consistently? I guess the concern is that you might force unnecessary reversals if, for example, you picked big but were on a little platform? Does that come up in the scenarios where this is used? I thought these APIs were primarily about interacting with data over the network. |
Is wordy; and I still have to look up what endianness intel is on the rare occasions endianness is important; so anecdotally I'd guess a regular user would also need to if they wanted to be native to the platform (rather than always reversing the input) by making the wrong initial choice and then just copy pasting |
To be fair @MarcGravel and I circled around this issue several times and cane to the conclusion that even if you "don't care" it should be explicitly little or big. Reason you might not care today. But say you save the data and load it on say a bigendian Xbox you get in trouble. So there should be no "native" option. |
Suppose it doesn't matter, could always be an extension ;-) |
Ok, that's fair. |
This isn't feature creep. This is designing the thing the code actually needs. The examples provided would mostly (all?) be better off with a reader/writer, or would not suffer from the few extra instructions required to pass the span in via a field on a ref struct rather than as a parameter. |
I tend to agree I have been racking my brain for a time you do a single op on a span and am failing to come up with one. The examples above the first slices the buffer only so it's close. The second example was odd code and I changed it to be correct and now it slices. It might be worth fleshing out the design in the other issue and see if it can be done in a reasonable way. If it has blocking points or seems too hard for now then this issue could be returned to? |
I still don't see how the code samples would be better with a reader/writer. The first code sample reads one int (length) and then does nothing else with binary. The other is overly complicated. Should probably just write the int to the Stream and then the payload. Also, to implement reader/writer we need the low level APIs. |
The low-level APIs are Unsafe.Read/WriteUnaligned. We have those. |
Right. I won't be surprised if we find that the |
Great, then we will use the Unsafe APIs to implement reader/writer. |
I updated the APIs (at the top of the issue) after incorporating feedback from. We are still discussing if we rename Read to Get and Write to Put/Set. |
|
@tmds Take a look at dotnet/corefx#24180 |
Updated the assembly where we will add the APIs (at the top of this issue). The current POR is to use System.Memory.dll. We decided we cannot use System.Buffers.dll as it was in-boxed in .NET Core 2.0 |
FYI: The API review discussion was recorded - see https://youtu.be/m4BUM3nJZRw?t=38 (70 min duration) |
Floating point numbers are not included here. Can they be included?
Perhaps the method names can be shortened by using "LE"/"BE" instead of "LittleEndian/BigEndian"? From what I want to do with this, I'd like a higher-level SpanReader/SpanWriter api as requested by @svick and @Drawaes (https://github.com/dotnet/corefx/issues/24180). I wonder: can't we get rid of the overloads via generic |
As discussed in the API review there is no current way of limiting T to a primative hence no generic. As for floating point the text above says why I don't think there should be a "standard" floating point op because there really is no standard to speak of. I would say if there are floating point ops they should be off on their own or if they truly are just byte flipped you could just read a unit or using and blit it in a simple method ;) |
I wondered if the reason was perf. The methods could throw for non-primitive types.
Indeed. Lack of a standard may be a reason to not to include these. Or you can go by one may in practice safely assume that the endianness is the same for floating-point numbers as for integers. |
So I read the whole thread... I think the explicitness of the type being written/read is a good reason to go for the non-generic version (even for the 'machine' endianness). Also, my use-case is for SpanReader/SpanWriter. It seems that that is the common use-cases. And as shown by @stephentoub a single value can easily read/written using such api's too. This raises the question whether there should be a public API for dealing with a single value. |
Do you have a rough ETA for this being available for CoreFX, there are some places I would like to use it inside SslStream in the next refactor around the handshake and framing. Cheers |
@Drawaes, which functionality do you need? I'd have thought BitConverter would be sufficient. Or you need everything read/written big endian? Don't count on being able to use these APIs. |
I don't need it per se. But the frames do a bigendian read on the size field. Just would have been nice as I prototype this all moving to span / memory to use the official method. It's certainly not going to hold me up. Just wondering. |
@Drawaes we have active discussions on which APIs should be out-of-box vs. in the platform. I believe we are leaning towards having this one out of box, which means nothing in platform can depend on it. At least not now. Please don't take dependency on it, unless you really have to. |
Cool good to know. I don't really need it was just going to use it if it was the new one true way ;) |
This is now done. dotnet/corefx#24400. |
@KrzysztofCwalina I think you meant "now done", right? |
Corrected :-) |
The API allows for reading and writing binary representation of primitve types (bit blitting) from/to spans of bytes. The API is used by SignalR.
A prototype of the API is available in corfxlab: https://github.com/dotnet/corefxlab/tree/master/src/System.Binary
Part of dotnet/corefx#24174
Usage is quite simple:
The LittleEndian and BigEndian versions assume/specify specific endianness. The Write/Read versions assume current machine endianness. Try versions return false if the buffer is too small to read/write the specified data type.
API Design:
Original proposals (click to expand)
The text was updated successfully, but these errors were encountered: