Move memset/memcpy helpers to managed impl #98623

EgorBo · 2024-02-17T23:49:28Z

src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs

...reclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/CompilerHelpers/MemoryHelpers.cs

jkotas · 2024-02-18T06:25:41Z

We have two places in CoreLib that call Unsafe.InitBlockUnaligned. These places are effectively going to call SpanHelpers.Fill now - we can fix them up to do that directly.

runtime/src/libraries/System.Private.CoreLib/src/System/Span.cs

Line 309 in 2f43856

    
           Unsafe.InitBlockUnaligned(ref Unsafe.As<T, byte>(ref _reference), *(byte*)&value, (uint)_length);

runtime/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs

Line 25 in 2f43856

Unsafe.InitBlockUnaligned(ref b, 0, (uint)byteLength);

EgorBo · 2024-02-18T10:20:37Z

We have two places in CoreLib that call Unsafe.InitBlockUnaligned. These places are effectively going to call SpanHelpers.Fill now - we can fix them up to do that directly.

The problem that SpanHelpers.Fill is not optimized by jit for constant length. It was never needed, because e.g. Span.Clear invoked Unsafe.InitBlockUnaligned

No longer happens if we change it to call SpanHelpers.Fill directly. I'll look in a separate PR to unroll Fill as well (should be a simple fix).

src/coreclr/jit/compiler.cpp

src/coreclr/tools/Common/TypeSystem/IL/HelperExtensions.cs

EgorBo · 2024-02-18T16:52:30Z

@jkotas we don't use memset/memcpy helpers in JIT for x86 and always use rep stosb/movb - wonder if we should switch those to helpers or not? I presume rep might also lead to gc starvation.

jkotas · 2024-02-18T17:01:23Z

I presume rep might also lead to gc starvation.

rep ... should be emitted as fully interruptible code to avoid gc starvation. It is same as a loop without a call.

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs

src/coreclr/tools/aot/ILCompiler.Compiler/IL/ILImporter.Scanner.cs

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs

…teMemOps.cs Co-authored-by: Jan Kotas <[email protected]>

EgorBo · 2024-02-25T00:20:31Z

Btw, I found & fixed an existing bug - Memmove used to not throw an NRE if both src and dest are null + length is not zero e.g.:

Buffer.MemoryCopy(null, null, 10, 10);

this doesn't fail (likely can be reproduced in other APIs which use Memmove under the hood)

jkotas · 2024-02-25T00:33:42Z

Memmove used to not throw an NRE if both src and dest are null + length is not zero e.g.

This is not a bug that needs fixing. Passing invalid buffer to Memmove is UB.

EgorBo · 2024-02-25T00:35:40Z

Memmove used to not throw an NRE if both src and dest are null + length is not zero e.g.

This is not a bug that needs fixing. Passing invalid buffer to Memmove is UB.

if I don't fix it, then:

void Test(MyStruct* a, MyStruct* b) => *a = *b; // or its ref version

won't fail on both being null (or will depend on opts level)

jkotas · 2024-02-25T00:37:11Z

Ok, I agree that it makes sense to fix this for internal JIT use of memory copy.

docs/design/coreclr/botr/guide-for-porting.md

EgorBo · 2024-02-25T12:03:15Z

Done. I wrote some benchmarks
where Core_Root_PR_tailcall is this PR + a conservative fix for taillcalls (don't see big improvements from this). The fix is conservative because it disables unrollings for constant sizes.

NOTE: PGO doesn't unroll these unknown-size memops because the opt I landed for this currently require R2R=0 and I didn't disable it.

Overall looks good to me, IMO. It's possible that in some cases helper calls are always indirect because they're promoted to Tier1 together with the benchmarks, for cast helpers we did this trick, but I am not sure we want to add the whole SpanHelpers here as well.

CopyBlock4000/InitBlock4000 likely struggle from going to native memset/memmove anyway from the managed helpers
It is also possible that these numbers vary a bit from run to run due to GC alignment (although, I tried to use [MemoryRandomization] I am not sure it helps).

jkotas · 2024-02-25T14:55:44Z

src/coreclr/jit/importer.cpp

@@ -10343,18 +10343,34 @@ void Compiler::impImportBlockCode(BasicBlock* block)
 // TODO: enable for X86 as well, it currently doesn't support memset/memcpy helpers


x86 supports memset/memcpy helpers now (this can be a follow up change)

Yes, it's part of a follow up PR because it involves removal of GT_STORE_DYN_BLK op, definitely didn't want to do it as part of this PR

jkotas · 2024-02-25T14:57:27Z

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.ByteMemOps.cs

+#else
+        private const nuint MemmoveNativeThreshold = 2048;
+#endif
+        // TODO: Determine optimal value


Suggested change

// TODO: Determine optimal value

Anything TODO here?

um.. do you mean it needs a link? I didn't spend enough time testing this treshold on different cpus/platforms yet. I presume it also can't be too big to avoid regressing NAOT compiled for generic cpu.

If you believe that somebody should spend time on this, it should have a link.

It is complicated to pick one universal threshold, so I am not sure whether it would be time well spent. You end up picking your winners and losers. Anything in the 1kB-10kB range sounds about right to me. The point of threshold is to amortize the cost of the PInvoke transition.

If there is something to look at here, I think it would be the infinite threshold for Arm64 memcpy. I would expect that the unmanaged memcpy is going to have more tricks that we have here. The threshold for Arm64 was set years ago when the Arm64 optimizations in the overall ecosystem were just starting to show up.

I wasn't able to find any regressions from that infinite threshold on modern hw in microbenchmarks, but it's very possible the picture is different in real world due to those non-temporal stores etc.
I agree it won't hurt to change that treshold to some big value like 32kb etc

I'll remove this comment as part of the GT_STORE_DYN_BLK clean up PR to avoid spinning CI for comment change if you don't mind

jkotas · 2024-02-25T15:15:11Z

a conservative fix for taillcalls

Where would you expect it to help with the current state of the PR? The wrappers for JIT helpers are not there anymore, so the JIT helpers are not affected by this.

for cast helpers we did this trick,

This sounds like a general problem with ordering of tiered compilation. For direct calls, it would be preferable to compile the bottom layer methods faster than upper layer methods. I do not think we have anything in place to help with that (correct me if I am wrong). If there are a few more bottom layer types that we care about in particular, I do not see a problem with adding them to this workaround.

jkotas · 2024-02-25T15:16:42Z

Is the SpanHelpers byte overload more efficient for sizeof(T) == 1 or there is no difference?

No difference for now, but I wanted to improve the byte version so there will be

Is this left for a follow up?

jkotas

Looks great. Thank you!

EgorBo · 2024-02-25T15:37:48Z

Where would you expect it to help with the current state of the PR?

For ordinary managed calls for Clear/Memmove inside void methods, example:

void Test(Span<byte> span) => span.Clear();

^ the codegen is expected to perform a tail call to SpanHelpers.ClearWithoutReferences (it doesn't happen with this PR yet)

If there are a few more bottom layer types that we care about in particular, I do not see a problem with adding them to this workaround.

Overall I agree, it's just that I'd rather want SpanHelpers to be promoted with a delay to avoid setting in stone weights too early (one day we'll implement the context-sensitive PGO to solve this problem differently). Also, all R2R'd methods can stay in R2R form longer. This timing issue can only manifest in simple microbenchmarks I presume (I still not sure the perf difference between indirect and direct calls are noticeable)

Is this left for a follow up?

Yep. We probably can ask our ARM64/Intel contributors for help here, although, I am not sure non-zero Memset is a popular operation (very few hits accross SPMI).

src/coreclr/vm/callcounting.cpp

ghost assigned EgorBo Feb 17, 2024

dotnet-issue-labeler bot added the area-VM-coreclr label Feb 17, 2024

stephentoub reviewed Feb 17, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.cs Outdated Show resolved Hide resolved

EgorBo force-pushed the remove-jit-memset branch from d7967ad to 66f6799 Compare February 18, 2024 02:02

Remove JIT_MemSet/JIT_MemCpy

508eb9c

EgorBo force-pushed the remove-jit-memset branch from 66f6799 to 508eb9c Compare February 18, 2024 03:18

build-analysis bot mentioned this pull request Feb 18, 2024

[browser][MT] Assert failed: Cannot find Promise for JSHandle -2 #98406

Closed

jkotas reviewed Feb 18, 2024

View reviewed changes

...reclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/CompilerHelpers/MemoryHelpers.cs Outdated Show resolved Hide resolved

jkotas reviewed Feb 18, 2024

View reviewed changes

...reclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/CompilerHelpers/MemoryHelpers.cs Outdated Show resolved Hide resolved

jkotas mentioned this pull request Feb 18, 2024

[NativeAOT] Add null checks into memcpy/memset helpers #98547

Merged

Add a test

32ec60e

Address feedback

0d4f4f4

EgorBo mentioned this pull request Feb 18, 2024

MemsetMemcpyNullref test fails on mono #98628

Open

This was referenced Feb 18, 2024

Intermittend failures in ThreadPoolTests on Mono Linux ARM64 #85960

Closed

Failing test - System.Threading.ThreadPools.Tests.ThreadPoolTests.ThreadPoolMinMaxThreadsEventTest on Linux/arm64 #95873

Closed

jkotas reviewed Feb 18, 2024

View reviewed changes

src/coreclr/jit/compiler.cpp Outdated Show resolved Hide resolved

jkotas reviewed Feb 18, 2024

View reviewed changes

src/coreclr/tools/Common/TypeSystem/IL/HelperExtensions.cs Outdated Show resolved Hide resolved

EgorBo added 2 commits February 18, 2024 17:33

Address feedback

321bb14

Address feedback

4904b4b

EgorBo force-pushed the remove-jit-memset branch from 2f9a35d to 4904b4b Compare February 18, 2024 16:37

jkotas reviewed Feb 18, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs Outdated Show resolved Hide resolved

jkotas reviewed Feb 18, 2024

View reviewed changes

src/coreclr/tools/aot/ILCompiler.Compiler/IL/ILImporter.Scanner.cs Outdated Show resolved Hide resolved

jkotas reviewed Feb 18, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs Outdated Show resolved Hide resolved

EgorBo added 2 commits February 18, 2024 18:32

Address feedback

1573925

Copy Fill's impl

cbcfaf4

jkotas reviewed Feb 18, 2024

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs Outdated Show resolved Hide resolved

EgorBo and others added 2 commits February 25, 2024 01:07

Update src/libraries/System.Private.CoreLib/src/System/SpanHelpers.By…

0a72fab

…teMemOps.cs Co-authored-by: Jan Kotas <[email protected]>

Address feedback

8aa3377

jkotas reviewed Feb 25, 2024

View reviewed changes

docs/design/coreclr/botr/guide-for-porting.md Outdated Show resolved Hide resolved

Update guide-for-porting.md

f5e0662

jkotas reviewed Feb 25, 2024

View reviewed changes

jkotas approved these changes Feb 25, 2024

View reviewed changes

EgorBo merged commit fab69ef into dotnet:main Feb 25, 2024
182 of 185 checks passed

EgorBo deleted the remove-jit-memset branch February 25, 2024 16:37

AustinWise reviewed Feb 25, 2024

View reviewed changes

src/coreclr/vm/callcounting.cpp Show resolved Hide resolved

EgorBo mentioned this pull request Feb 26, 2024

Remove GT_STORE_DYN_BLK #98905

Merged

MichalStrehovsky mentioned this pull request Feb 26, 2024

Replace memmove with Unsafe.CopyBlock in hydration code #98929

Merged

DrewScoggins mentioned this pull request Feb 27, 2024

[Perf] Linux/x64: 4 Regressions on 2/25/2024 4:37:10 PM #99002

Closed

build-analysis bot mentioned this pull request Feb 28, 2024

profiler\\multiple\\multiple\\multiple.cmd failing on windows arm64 #98817

Closed

TIHan mentioned this pull request Feb 28, 2024

Test failure: JIT/Regression/JitBlue/Runtime_63942/Runtime_63942/Runtime_63942.cmd #98971

Closed

matouskozak mentioned this pull request Feb 28, 2024

[Perf] Linux/x64: 19 Regressions on 2/25/2024 4:37:10 PM dotnet/perf-autofiling-issues#29872

Closed

This was referenced Feb 28, 2024

Special case Mono in SpanHelpers.Fill/CleanWithoutReference #99059

Closed

[Perf] Linux/arm64: 7 Regressions on 2/25/2024 4:37:10 PM #99122

Closed

fanyang-mono mentioned this pull request Mar 1, 2024

[mono] Intrinsify API's in SpanHelpers.ByteMemOps.cs #99161

Closed

3 tasks

This was referenced Mar 16, 2024

Evaluate JIT_MemCpy with wider copies using AVX2 on hardware that support it #6702

Closed

Evaluate JIT_MemCpy without overlap check #6701

Open

github-actions bot locked and limited conversation to collaborators Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move memset/memcpy helpers to managed impl #98623

Move memset/memcpy helpers to managed impl #98623

EgorBo commented Feb 17, 2024 •

edited by jkotas

Loading

jkotas commented Feb 18, 2024 •

edited

Loading

EgorBo commented Feb 18, 2024 •

edited

Loading

EgorBo commented Feb 18, 2024

jkotas commented Feb 18, 2024

EgorBo commented Feb 25, 2024 •

edited

Loading

jkotas commented Feb 25, 2024

EgorBo commented Feb 25, 2024 •

edited

Loading

jkotas commented Feb 25, 2024

EgorBo commented Feb 25, 2024 •

edited

Loading

jkotas Feb 25, 2024

EgorBo Feb 25, 2024

jkotas Feb 25, 2024

EgorBo Feb 25, 2024 •

edited

Loading

jkotas Feb 25, 2024

EgorBo Feb 25, 2024

EgorBo Feb 25, 2024

jkotas commented Feb 25, 2024

jkotas commented Feb 25, 2024

jkotas left a comment

EgorBo commented Feb 25, 2024

		@@ -10343,18 +10343,34 @@ void Compiler::impImportBlockCode(BasicBlock* block)
		// TODO: enable for X86 as well, it currently doesn't support memset/memcpy helpers

Move memset/memcpy helpers to managed impl #98623

Move memset/memcpy helpers to managed impl #98623

Conversation

EgorBo commented Feb 17, 2024 • edited by jkotas Loading

jkotas commented Feb 18, 2024 • edited Loading

EgorBo commented Feb 18, 2024 • edited Loading

EgorBo commented Feb 18, 2024

jkotas commented Feb 18, 2024

EgorBo commented Feb 25, 2024 • edited Loading

jkotas commented Feb 25, 2024

EgorBo commented Feb 25, 2024 • edited Loading

jkotas commented Feb 25, 2024

EgorBo commented Feb 25, 2024 • edited Loading

jkotas Feb 25, 2024

Choose a reason for hiding this comment

EgorBo Feb 25, 2024

Choose a reason for hiding this comment

jkotas Feb 25, 2024

Choose a reason for hiding this comment

EgorBo Feb 25, 2024 • edited Loading

Choose a reason for hiding this comment

jkotas Feb 25, 2024

Choose a reason for hiding this comment

EgorBo Feb 25, 2024

Choose a reason for hiding this comment

EgorBo Feb 25, 2024

Choose a reason for hiding this comment

jkotas commented Feb 25, 2024

jkotas commented Feb 25, 2024

jkotas left a comment

Choose a reason for hiding this comment

EgorBo commented Feb 25, 2024

EgorBo commented Feb 17, 2024 •

edited by jkotas

Loading

jkotas commented Feb 18, 2024 •

edited

Loading

EgorBo commented Feb 18, 2024 •

edited

Loading

EgorBo commented Feb 25, 2024 •

edited

Loading

EgorBo commented Feb 25, 2024 •

edited

Loading

EgorBo commented Feb 25, 2024 •

edited

Loading

EgorBo Feb 25, 2024 •

edited

Loading