Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Add Span Fill and Clear methods #8498

Closed
wants to merge 10 commits into from
Closed

Add Span Fill and Clear methods #8498

wants to merge 10 commits into from

Conversation

AlexRadch
Copy link

@AlexRadch AlexRadch commented Dec 7, 2016

Provide a safe yet fast way of filling of any type of contiguous memory; managed or unmanaged.
See https://github.com/dotnet/corefx/issues/14189 and https://github.com/dotnet/corefx/issues/13915

@AlexRadch
Copy link
Author

@AlexGhiondea, @joperezr CR please

@joperezr
Copy link
Member

joperezr commented Dec 8, 2016

Hello @AlexRadch and thanks for your contribution. If you inspect the System.Private.CoreLib.dll that you are producing(located under bin\Product\Windows_NT.x64.Debug), you'll see that the methods you added are actually not there. That is because in order for them to show up, you need to add lines:

<Member Name="Clear" />
<Member Name="Fill(T)" />

to the model.xml file for the Span type here

Without that change, we won't be able to see these methods on corefx side, so we can't expose them.

@joperezr
Copy link
Member

joperezr commented Dec 8, 2016

cc: @jkotas

@joperezr joperezr changed the title Add Span Fill method Add Span Fill and Clear methods Dec 8, 2016
@joperezr
Copy link
Member

joperezr commented Dec 8, 2016

@dotnet-bot test Linux ARM Emulator Cross Debug Build please (Build timed out)

fixed (byte* pDestination = &Unsafe.As<T, byte>(ref Unsafe.Add(ref destination, copiedCount)))
{
#if BIT64
Buffer.Memmove(pDestination, pSource, (ulong)copyLen * (ulong)size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work ok for smaller sizes where everything fits into the cache. I would expect that this is going to be slower for larger sizes because of it will be faulting twice as many cache lines than simple loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is correct number for cash size?

else
{
for (int i = elementsCount - 1; i >= 0; i--)
Unsafe.Add(ref destination, i) = value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a similar fast path here as well?

Copy link
Author

@AlexRadch AlexRadch Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correct. If type T ContainsReferences then we can not just copy memory. We must do item by item filling. I am copied logic from SpanHelper.CopyTo method. Am I correct?

public void Clear()
{
if (_length > 0)
SpanHelper.Fill<T>(ref GetRawPointer(), default(T), _length);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that Clear should use Fill. Clear can be implemented more efficiently as it doesn't care about the size of T nor about references. It simply needs to fill the memory with 0.

Copy link
Author

@AlexRadch AlexRadch Dec 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correct. If type T ContainsReferences then we can not just clear all array items to 0. We must do item by item clear. I am copied logic from SpanHelper.CopyTo method. Am I correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References are special because updating a reference in memory requires a GC write barrier. But if the reference you are writing to memory is null then no write barrier is needed, you can simply write a 0 and call it a day.

That said, you may still need to account for references depending on the way zeroes are written. For example, you can't just call Buffer.ZeroMemory because that one writes one byte at a time and if the GC stops you in the middle then it may end up seeing a partially updated and thus corrupted reference.

But typical clear implementations use large aligned writes so in general this shouldn't be an issue.

@jkotas
Copy link
Member

jkotas commented Dec 8, 2016

cc @nietras

@jkotas
Copy link
Member

jkotas commented Dec 8, 2016

cc @ahsonkhan @KrzysztofCwalina


int copiedCount; // Fill up to copiedCount in first loop
if (size <= sizeof(int))
copiedCount = elementsCount <= 512 / 4 * 3 ? elementsCount : 512 / 4 * 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are quite a few magic numbers here. Please add a comment as to what they are and/or turn them into constants.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that Memmove use 512 as magic const when it make copy in loop or call system function.

@KrzysztofCwalina
Copy link
Member

@AlexRadch, are you going to add the same methods to corefx span? We need to keep them in sync.

@nietras
Copy link

nietras commented Dec 8, 2016

@jkotas as I told @karelz I was waiting for API approval of https://github.com/dotnet/corefx/issues/13427 and PR dotnet/corefx#13848 to be merged depending on this, before making a PR for the implementation I had in mind.

The implementation I have in mind is entirely different than this PR, so I think it will be easier to just do the PR than trying to change this to that. Whether or not you want to move ahead with this PR is up to you.

I could make a PR, which would not use the Unsafe functions I had in mind, but they were kind of essential given that I was thinking it should be possible to do a "fast" version without pinning any memory with these (for specific types of specific length). If that is initblock can be used for reference free value types on a ref without pinning. Clear should pretty much just be a call to initblock with value 0, shouldn't it?

Using memmove for this is just bad in my view, as you said @jkotas it will pollute the cache more than necessary. And where are the tests?

@AlexRadch
Copy link
Author

AlexRadch commented Dec 8, 2016

@nietras It will be great if Unsafe will have methods to copy and init memory. Also I sagest to add in Unsafe Fill method. When we should repeat one memory block many times in other block. Fast Fill methods should deal with optimal cash size. And I think this should be coded in one place.

else
{
for (int i = elementsCount - 1; i >= 0; i--)
Unsafe.Add(ref destination, i) = default(T);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation would be ok for the portable Span implementation in corefx. The fast implementation built into CoreCLR should use FCall to clear these references.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copy this code from SpanHelper.CopyTo. Is it safe to use FCall ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine to use FCall in the CoreCLR corelib implementation of Span.

{
fixed (byte* pDestination = &Unsafe.As<T, byte>(ref destination))
{
Buffer.ZeroMemory(pDestination, (long)elementsCount * Unsafe.SizeOf<T>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer.ZeroMemory is pretty slow naïve implementation. It would be better to do this e.g. using Unsafe.InitBlockUnaligned. Note that Unsafe type in corelib is internal implementation detail so you do not need to block on the API approval to add it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see Unsafe.InitBlockUnaligned. I think it is not merged yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is not merged yet in corefx. But getting it merged in corefx won't make it show up in coreclr corelib. It will be need to be added separately. And since Unsafe class in corelib is internal (not a public API), it can be added there without blocking on the API review.

public void Clear()
{
if (_length > 0)
SpanHelper.Clear<T>(ref DangerousGetPinnableReference(), _length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason for splitting this into two methods? I think it would be better for the implementation to be directly here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be to remove unsafe modifier from Clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing unsafe modifier is not a good reason. unsafe modifier does not matter - it does not do anything to the public shape of the API.

@AlexRadch
Copy link
Author

I want to close this PR without merge. I wrote code for Fill and Clear similar to CopyTo code. I think @nietras can write correct code for this API. I can't.

@karelz
Copy link
Member

karelz commented Dec 9, 2016

Closing per @AlexRadch's comment. It is a non-trivial change.

@karelz karelz closed this Dec 9, 2016
@jkotas
Copy link
Member

jkotas commented Dec 9, 2016

@AlexRadch It should be fine to split this into two parts:

  • First part to just add the APIs with a trivial non-optimized implementation, get it wired in corefx, add some tests.
  • Second part to optimize the implementation

If it is what was done for Array.Fill added recently - it has a trivial implementation, and issue is filled on getting it optimized. Feel free to pick up the first part if you would like to.

@nietras
Copy link

nietras commented Dec 9, 2016

It would be great to have tests and appropriate perf scenario tests 😄 Perhaps some guidance on perf scenarios/how to add tests for that?

@jkotas the fast span implementation in coreclr, when will that get Clear/Fill and will it then use methods similar to Buffer/ZeroMemory? And can initblk be used for zeroing memory that contains references? And can that then not be used without pinning?

@joperezr
Copy link
Member

joperezr commented Dec 9, 2016

It would be great to have tests and appropriate perf scenario tests 😄 Perhaps some guidance on perf scenarios/how to add tests for that?

I'm not sure if Span is a special case, but usually we write the code on coreclr side, and write the tests on corefx side once we expose the APIs through contracts.

@nietras
Copy link

nietras commented Dec 9, 2016

@joperezr I think Span is special because it will exist as a portable version, that can be used on other runtimes...

@karelz
Copy link
Member

karelz commented Dec 10, 2016

@brianrob @DrewScoggins do we have official (public) info about perf tests in CoreFX?

@jkotas
Copy link
Member

jkotas commented Dec 10, 2016

usually we write the code on coreclr side, and write the tests on corefx side once we expose the APIs through contracts

I believe it is pretty standard to work on implementation in coreclr together with the tests in corefx. (And getting a trivial implementation in first makes this somewhat easier because of you can then just copy over coreclr to corefx, without worrying about contracts, etc.)

the fast span implementation in coreclr, when will that get Clear/Fill and will it then use methods similar to Buffer/ZeroMemory?

For Clear, I think it can call initblk for T without GC references, and call FCall for values with GC references that explicitly fills in one pointer at a time.

For Fill, there are number of different options and tradeoffs. It maybe worth optimizing it for bytes using initblk. For the rest, I do not have a strong opinion - simple loop, maybe with a bit of manual unrolling, would be fine with me.

BTW: The current Span Copy method in CoreCLR also needs work. It was just copied from the portable Span. It needs to be optimized using FCall as well (in particular the GC references containing case).

@nietras
Copy link

nietras commented Dec 10, 2016

I'm sorry, I completely missed that this PR was actually for coreclr and not corefx which confused me with many comments. Now I understand some of the things much better. My bad.

I probably need some guidance on a few of these suggestions, in particular how and were the missing Unsafe initblk cpbll for refs, if I understand correctly, should be added?

And the specific FCalls.

Is it a target to avoid pinning in this case? Or does that not matter?

@nietras
Copy link

nietras commented Dec 10, 2016

If we just need a working implementation first, then this could be a starting point. Perhaps with some minor changes. @jkotas you decide, my idea was to start with portable corefx impl, since that is where I would like it first, and I have tried adding stuff before. I'll be happy to look at the coreclr version too...

@jkotas
Copy link
Member

jkotas commented Dec 10, 2016

my idea was to start with portable corefx impl

Sounds like a fine plan to me.

@AlexRadch AlexRadch deleted the Span-Fill branch December 10, 2016 03:03
@nietras
Copy link

nietras commented Dec 10, 2016

Ok, I started it and we will get back to the coreclr version later.

@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants