Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Comments on Span<T> Spec #1653

Closed
clrjunkie opened this issue Jul 4, 2017 · 5 comments
Closed

Comments on Span<T> Spec #1653

clrjunkie opened this issue Jul 4, 2017 · 5 comments

Comments

@clrjunkie
Copy link

Regarding:

https://github.com/dotnet/corefxlab/blob/master/docs/specs/span.md

Span is a new type we are adding to the platform to represent contiguous regions of arbitrary memory, with performance characteristics on par with T[]

This wording suggest to me that Span somehow can represent several distinct regions as one.

Consider rephrasing:

"which represents a contiguous region of arbitrary memory"

(T already implies different memory block sizes)

Furthermore, since the context is memory, as reader, my initial concern was about the generality of T
Is T constrained to primitive types, Structs with only primitive types? Consider adding a following explanatory note

// native memory
var nativeMemory = Marshal.AllocHGlobal(100);
Span<byte> nativeSpan;
unsafe {
    nativeSpan = new Span<byte>(nativeMemory.ToPointer(), 100);
}
SafeSum(nativeSpan);
Marshal.FreeHGlobal(nativeMemory);

The snippet reads a bit odd, why is Span required to be created in an Unsafe Context?
In other words: why doesn't the Span take an IntPtr? (Is there a way to allocate unmanaged heap memory without Marshal.AllocHGlobal?)

Is there any disadvantage in passing an IntPtr and then calling ToPointer() ?

According to
https://github.com/dotnet/coreclr/blob/cdb827b6cf72bdb8b4d0dbdaec160c32de7c185f/src/mscorlib/src/System/IntPtr.cs

The size of IntPtr equals pointer

Furthermore, I would avoid, in examples, variable names which suggest acronyms (e.g "nativeSpan") and use a more instructive naming, something like:

"spanOverHeapMemory"

BTW, The comment and Marshal.AllocHGlobal already suggest unmanaged memory

Same goes for the stack example:

stackSpan->spanOverStackMemory

Discontinuous Buffers

The scenario is well known (No need ASP.NET reassurance). The snippet is bad, it's too involved and assumes ASP.NET specific abstractions.

It's not clear how 128 bytes stack memory plays a role here and how all desired bytes are ultimately combined to one uint in case 1 byte arrives then another byte arrives, and so on...

Furthermore, it’s not explained how stack allocated memory survives when the method completes.

In simple words: What’s going on here?

Consider replacing with a simple parser example (like UIntParser) without ASP.NET kung-fu

Native code interop
Today, unmanaged buffers passed over unmanaged to managed boundary are frequently copied to byte[] to allow safe access from managed code. Span can eliminate the need to copy in many such scenarios.

I don't think it's matter of safety, but rather of a necessity.. I mean unless all Api's are Span aware how can you avoid the copy? Please detail one or two representing scenarios out of the many .

A prototype of such fast Span can be found here. Through the magic of the "ref field", it can support slicing without requiring a strong pointer to the root of the sliced object. The GC is able to trace the interior pointer, keep the root object alive, and update the interior pointer if the object is relocated during a collection

I don’t understand. Please elaborate what problem/optimization this new magic “ref field” solves in the context of GC collection and promotion. I assume the average .net developer (myself included) understands ref as a way to create an alias for an object reference and it's implemented via pointer-to-pointer. What the new semantics? Please describe the benefit as a sequence of GC events along a timeline.

Struct Tearing
For most structs, tearing is at most a correctness bug and can be dealt with by making the fields (typed as the tearable struct type)

Is there a new type called "tearable struct type" or are you referring to the "atomicity challenge" concerning multi-field structs as members in general?

Span will be a stack-only type; more precisely, it will be a by-ref type

What exactly is this "by-ref" type? How is it different from a "reference type" or passing a reference type or primitive type be ref

Safe lifetime: Safe code cannot create dangling pointers by storing it on the heap when Span points to unmanaged memory or stack memory. The unsafe stack frame responsible for creating unsafe Span is responsible for ensuring that it won’t escape the scope.

Please elaborate this seems like new lifetime semantics.

Memory<T>

This reads like "Span 2.0". If so, why do we need Span<T> ?

If Memory<T> superseded Span<T> please note that at the beginning of the document so the reader can skip it all together.

@mikedn
Copy link

mikedn commented Jul 5, 2017

Is T constrained to primitive types, Structs with only primitive types?

T can be anything if the span is created from an array. Otherwise T must not contain references.

why doesn't the Span take an IntPtr?

Why should it? Using IntPtr in this case makes the code look somehow safe when it is not. It's IMO rather unfortunate that Marshal uses IntPtr.

consider replacing with a simple parser example (like UIntParser) without ASP.NET kung-fu

That example does look weird. And it's IMO pointless, the first example is enough. And no, the stack allocation doesn't survive.

I don't think it's matter of safety, but rather of a necessity.. I mean unless all Api's are Span aware how can you avoid the copy?

It's both a matter of safety and a necessity (API's like Stream could have offered pointer based overload if they weren't unsafe). And yes, for Span to be really useful new APIs will need to be added - https://github.com/dotnet/corefx/issues/21281

Please elaborate what problem/optimization this new magic ?ref field? solves in the context of GC collection and promotion.

The magic is that there is a ref field (as long as the object it contains is restricted to the stack). It behaves like any other ref and the magic has little to to with the GC. It's actually a matter for the VM and JIT since they have to correctly report the field to the GC.

I assume the average .net developer (myself included) understands ref as a way to create an alias for an object reference and it's implemented via pointer-to-pointer.

An alias for an object reference? That's a rather limited description of ref as you can have a ref to a value type. And more importantly it doesn't highlight one important aspect of ref - that it can point to a field/element inside an object/array. That was what the documentation is trying to explain with its "strong pointer", "interior pointer", "sliced object" and so on.

Is there a new type called "tearable struct type" or are you referring to the "atomicity challenge" concerning multi-field structs as members in general?

All multi field structs are tearable and they have always been, there's no way around that.

What exactly is this "by-ref" type? How is it different from a "reference type" or passing a reference type or primitive type be ref

The C# spec should help with this and the following question: https://github.com/dotnet/csharplang/blob/master/proposals/span-safety.md

If Memory superseded Span please note that at the beginning of the document so the reader can skip it all together.

It doesn't. The document is pretty clear in this regard, there are cases where somehting that doesn't have the stack restriction of Span is needed.

@clrjunkie
Copy link
Author

clrjunkie commented Jul 5, 2017

T can be anything if the span is created from an array. Otherwise T must not contain references.

Makes sense. I suggest adding this note to the document.

Why should it? Using IntPtr in this case makes the code look somehow safe when it is not. It's IMO rather unfortunate that Marshal uses IntPtr.

Substantially less characters to type (unsafe{},ToPointer()) when using Marshal?

Both Marshal.AllocHGlobal and IntPtr have always been part of the framework so I doubt people (who use these) would mistakenly consider these “safe”.

That example does look weird. And it's IMO pointless, the first example is enough. And no, the stack allocation doesn't survive.

That example needs to be removed.

The magic is that there is a ref field (as long as the object it contains is restricted to the stack). It behaves like any other ref and the magic has little to to with the GC. It's actually a matter for the VM and JIT since they have to correctly report the field to the GC

So if I understand correctly: A “ref field” sets a requirement on the field type to be stack only?

Can you please explain why this was needed for the inner working of the Span? What processing flow benefits from it?

An alias for an object reference? That's a rather limited description of ref as you can have a ref to a value type. And more importantly it doesn't highlight one important aspect of ref - that it can point to a field/element inside an object/array. That was what the documentation is trying to explain with its "strong pointer", "interior pointer", "sliced object" and so on.

Your right. I largely oversimplified the description of the ref keyword. Nevertheless, I don’t get the idea of "interior pointer" I mean who benefits from the second level of indirection and how?

Specifically, this paragraph is unclear not me:

it can support slicing without requiring a strong pointer to the root of the sliced object. The GC is able to trace the interior pointer, keep the root object alive, and update the interior pointer if the object is relocated during a collection.

Regarding:

https://github.com/dotnet/csharplang/blob/master/proposals/span-safety.md

This is a good document and it helps understand what a ref-like is. I might have some comments on the content but I would need to read it a couple of times and think some more about the various scenarios outlined. Nevertheless, the spec should spell this out correctly: “ref-like” – a particular set of requirements on a type.

I highly recommend merging the Span spec document with this document or at least have the spec hyperlink to it as it contains important context the reader needs to be aware of to fully understand the Span construct.

It doesn't. The document is pretty clear in this regard, there are cases where something that doesn't have the stack restriction of Span is needed.

Personally I would prefer having one construct over two where there is no difference in the overall intent.

Perhaps it would better to have only a single Class named ‘Span’ providing advanced developers the option to give it stack semantics by marking it as such, something like:

stackalloc Span<byte> s;

or

nativeSpan = stackalloc Span<byte>(nativeMemory.ToPointer(), 100);

In any case ‘Memory’ sounds too general to me and the name doesn’t specifically hint a relationship with Span. Perhaps HeapSpan would be a better name.

See a prototype of Memory here. Note that the prototype is currently not tearing safe. We will make it safe in the upcoming weeks.

The link is broken.

@mikedn
Copy link

mikedn commented Jul 5, 2017

Substantially less characters to type (unsafe{},ToPointer()) when using Marshal?

Doesn't sound like a good enough reason to me. The proper solution at the framework level would be to provide some sort of native memory management APIs. AFAIK there already are some experiments in this area ( https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Experimental/System/Buffers/OwnedNativeBuffer.cs for example).

So if I understand correctly: A ?ref field? sets a requirement on the field type to be stack only?

"field type" is ambiguous. If you meant "field's declaring type" then yes, any type containing a ref field is stack only.

Can you please explain why this was needed for the inner working of the Span? What processing flow benefits from it?

Your right. I largely oversimplified the description of the ref keyword. Nevertheless, I don't get the idea of "interior pointer" I mean who benefits from the second level of indirection and how?

Pay attention to the fast/slow representations.

The slow representation has to keep an object reference (_relocatableObject) and an offset inside the object (_pointer). It also has to distinguish between object slices and native memory slices. For object slices both _relocatableObject and _pointer are valid and the start address of the slice has to be computed by adding them. For native slices only _pointer is valid and it is the start address of the slice.

The fast representation is trivial, it only contains a managed pointer (ref T _pointer) and that can point either inside the object or to native memory.

Nevertheless, the spec should spell this out correctly: ref-like�� a particular set of requirements on a type.

This is basically work in progress. The whole ref-like thing didn't even exist when the span effort started.

I highly recommend merging the Span spec document with this document or at least have the spec hyperlink to it as it contains important context the reader needs to be aware of to fully understand the Span construct.

It's complicated. That one is a language spec and this one is the spec of a framework (and framework to some extent) feature. And both are work in progress. And it's not always clear what goes where (language/framework/runtime spec).

Personally I would prefer having one construct over two where there is no difference in the overall intent.

They're different. Span is basically a "view" while Memory is the real thing.

The link is broken.

Work in progress. Memory is now Buffer so it's here: https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Primitives/System/Buffers/Buffer.cs. There are various issues opened about its design and even about updating documents.

@clrjunkie
Copy link
Author

Doesn't sound like a good enough reason to me. The proper solution at the framework level would be to provide some sort of native memory management APIs.

Perhaps we can reach middle ground with a Constructor overload.

This is basically work in progress. The whole ref-like thing didn't even exist when the span effort started.

It's complicated. That one is a language spec and this one is the spec of a framework (and framework to some extent) feature. And both are work in progress. And it's not always clear what goes where (language/framework/runtime spec).

They're different. Span is basically a "view" while Memory is the real thing.

Work in progress. Memory is now Buffer so it's here: https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Primitives/System/Buffers/Buffer.cs. There are various issues opened about its design and even about updating documents.

I'm going to take a step back here. I naïvely assumed that these documents are considered kinda “final” since all the past "work in progress" designs I looked at where in “issues” (not files). I recommend stating “Work in progress” at the very beginning of these spec documents. I would be happy to review their updated versions when they mature.

I would highly appreciate a ping to this issue when the documents are revised or even better if you guys can open a new read-only repo just for announcements. Currently, I don’t think there is an option to subscribe for change notifications on a file level.

Thanks @mikedn!

@joshfree
Copy link
Member

Closing this discussion issue as Span is now in the corefx repo for .NET Core 2.1. Future issues should be opened in the dotnet/corefx repo for System.Span.

https://blogs.msdn.microsoft.com/dotnet/2017/11/15/welcome-to-c-7-2-and-span/

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants