-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable storing pinned locals in registers #63397
Comments
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis is a follow up from (RIP) #55273, specifically this comment from @jkotas. OverviewConsider this snippet: static unsafe void A(ref byte r)
{
fixed (void* p = &r)
{
B(p);
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
static unsafe void B(void* p)
{
} This currently results in: sub rsp, 0x28
mov [rsp+0x20], rcx
call 0x00007ffaecb70028
xor eax, eax
mov [rsp+0x20], rax
add rsp, 0x28
ret Currently, all pinned locals are always stored on the stack. This makes pinning not really ideal for hot paths. mov rbx, rcx
call 0x00007ffaecb70028
xor rbx, rbx
ret In this example I just used Goes without saying that this optimization could bring some nice codegen/perf wins in interop-heavy code 😄
|
I'm not super convinced of the wins here myself. I think that being able to elide the The "friendly signature" for something like Regular pinning is effectively a |
I can still think of many cases where the stack spill for the pinning would be the only one. Like, this would be the case for |
cc @kunalspathak. |
For reference, I did try to use public static unsafe int GetHashCode(object? o)
{
if (o is not null)
{
uint syncBlockValue;
fixed (byte* pData = &o.GetRawData())
{
syncBlockValue = ((ObjectHeader*)&((nint*)pData)[-2])->SyncBlockValue;
}
const uint BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX = 0x08000000;
const uint BIT_SBLK_IS_HASHCODE = 0x04000000;
const uint BITS_IS_VALID_HASHCODE = BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX | BIT_SBLK_IS_HASHCODE;
const int HASHCODE_BITS = 26;
const uint MASK_HASHCODE = (1u << HASHCODE_BITS) - 1u;
if ((syncBlockValue & BITS_IS_VALID_HASHCODE) == BITS_IS_VALID_HASHCODE)
{
return unchecked((int)(syncBlockValue & MASK_HASHCODE));
}
}
return InternalGetHashCode(o);
} The performance though was way worse than the current one:
The asm in particular was... Interesting: ; ObjectGetHashCodeBenchmark.GetHashCode()
sub rsp,28
xor eax,eax
mov [rsp+20],rax
mov rcx,[rcx+8]
test rcx,rcx
je short M00_L00
lea rax,[rcx+8]
mov [rsp+20],rax
mov rax,[rsp+20]
mov eax,[rax+0FFF4]
xor edx,edx
mov [rsp+20],rdx
mov edx,eax
and edx,0C000000
cmp edx,0C000000
jne short M00_L00
and eax,3FFFFFF
jmp short M00_L01
M00_L00:
call System.Runtime.CompilerServices.RuntimeHelpers.InternalGetHashCode(System.Object)
M00_L01:
nop
add rsp,28
ret
; Total bytes of code 78 Looks like there's lots of room for improvements to the codegen in this case? 🤔 I will also say: I still think an As in, the code above could then just be: public static unsafe int GetHashCode(object? o)
{
if (o is not null)
{
ref byte dataRef = ref o.GetRawData();
nint headerData = Unsafe.AtomicAddByteOffsetAndRead<nint>(ref dataRef, -8);
uint syncBlockValue = ((ObjectHeader*)&headerData)->SyncBlockValue;
// Rest of the code (with no GC holes)
}
return InternalGetHashCode(o);
} |
@Sergio0694 more like |
I mean yes, that was just an example (plus you'd get the same anyway with a negative offset). Of course, such an API would only be allowed to read primitives, or even just |
This would likely fix #35748 too. |
I'd like to +1 this. In NativeAOT we do have The part that pinning introduces stack locals hurts the scenario somewhat. |
Also see: #97997 , that reminded me of this issue. |
This is a follow up from (RIP) #55273, specifically this comment from @jkotas.
Opening this issue for tracking and avoid that getting lost.
Overview
Consider this snippet:
This currently results in:
Currently, all pinned locals are always stored on the stack. This makes pinning not really ideal for hot paths.
It would be nice if the JIT added support for using a register to store pinned locals, when possible.
As mentioned by @tannergooding, the register would need to be cleared when out of scope to stop tracking.
The method
A
from above could then become something like this:Here I just used
rbx
to store the pinned local (just picked the first callee-saved register). I do realize there's plenty of work to make this work and all the various GC data structures need to be updated accordingly to enable tracking, this is the general idea.Goes without saying that this optimization could bring some nice codegen/perf wins in interop-heavy code 😄
Additionally, given this could be used to restore the
RuntimeHelpers.GetHashCode
optimization by porting the happy path to C# (as I did in #55273, but possibly without the GC hole ahah), it would automatically speedup virtually every dictionary out there using random reference types as keys. Or, any other data structure that would callGetHashCode
at some point on an object that didn't override the defaultobject.GetHashCode
implementation.cc. @EgorBo @SingleAccretion
category:cq
theme:pinning
The text was updated successfully, but these errors were encountered: