-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
} | ||
else | ||
{ | ||
if (typeof(TKey) == typeof(byte)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that we want to use this pattern and modify every collection in CoreFX to use it. I think it is going in a wrong direction and only fixes the problem for primitive types.
Instead, the JIT should recognize EqualityComparer<TKey>.Default
as intrinsic - https://github.com/dotnet/coreclr/issues/6688 is about this. So this would be just: EqualityComparer.Default.Equals(key0, key1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #if EQUALITYCOMPARER_INTRINSIC
, that work which uses EqualityComparer<TKey>.Default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only fixes the problem for primitive types
Would be good if C# could express it for all types and all functions, not just relying on the Jit to pick up on EqualityComparer<TKey>.Default.Equals
only dotnet/csharplang#905
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas can you elaborate on what you have in mind for #6688? The source link in that issue seems not to work anymore.
Are you suggesting that IEqualityComparer<TKey>
's methods be marked as intrinsics, and then the jit can inject tests at call sites via this interface when TKey
is known to see if the value is the EqualityComparer<TKey>.Default
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This this is the file the orignal link pointed to EqualityComparerHelpers looks like some constrained static methods it would swap with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the jit can inject tests at call sites via this interface when TKey is known to see if the value is the EqualityComparer.Default
I meant that the JIT should recognize EqualityComparer<TKey>.Default.Equals(a,b)
pattern and replace it with the expanded code. For example, the following https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/ValueTuple.cs#L740 should be just comparing integers directly for tuple of integers. It does the virtual calls today.
If it is too hard to recognize EqualityComparer<TKey>.Default.Equals(a,b)
pattern in the JIT, I would be ok to add a static EqualityComparer<T>.Equals(a,b)
method and recognize that as intrinsic. It would need to be approved as public API to be widely usable and all relevant places changed to call it, so it is less preferred.
Dictionary`2:TryGetValue(char,byref):bool:this Before ; Assembly listing for method Dictionary`2:TryGetValue(char,byref):bool:this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 4, 3.50) ref -> rsi this class-hnd
; V01 arg1 [V01,T02] ( 3, 3 ) char -> rdx
; V02 arg2 [V02,T01] ( 4, 3 ) byref -> rdi
; V03 loc0 [V03,T03] ( 4, 3 ) int -> rax
; V04 tmp0 [V04,T04] ( 3, 3 ) ref -> rdx
; V05 OutArgs [V05 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
;
; Lcl frame size = 40
G_M28467_IG01:
57 push rdi
56 push rsi
4883EC28 sub rsp, 40
488BF1 mov rsi, rcx
498BF8 mov rdi, r8
G_M28467_IG02:
0FB7D2 movzx rdx, dx
488BCE mov rcx, rsi
E800000000 call Dictionary`2:FindEntry(char):int:this
85C0 test eax, eax
7C29 jl SHORT G_M28467_IG04
488B5610 mov rdx, gword ptr [rsi+16]
3B4208 cmp eax, dword ptr [rdx+8]
732C jae SHORT G_M28467_IG06
4863C8 movsxd rcx, eax
488D0C49 lea rcx, [rcx+2*rcx]
488B54CA10 mov rdx, gword ptr [rdx+8*rcx+16]
488BCF mov rcx, rdi
E800000000 call CORINFO_HELP_CHECKED_ASSIGN_REF
B801000000 mov eax, 1
G_M28467_IG03:
4883C428 add rsp, 40
5E pop rsi
5F pop rdi
C3 ret
G_M28467_IG04:
33C0 xor rax, rax
488907 mov qword ptr [rdi], rax
G_M28467_IG05:
4883C428 add rsp, 40
5E pop rsi
5F pop rdi
C3 ret
G_M28467_IG06:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 86, prolog size 6 for method Dictionary`2:TryGetValue(char,byref):bool:this After ; Assembly listing for method Dictionary`2:TryGetValue(char,byref):bool:this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 3, 3 ) ref -> rcx this class-hnd
; V01 arg1 [V01,T02] ( 3, 3 ) char -> rdx
; V02 arg2 [V02,T01] ( 3, 3 ) byref -> rsi
; V03 loc0 [V03,T06] ( 2, 1.50) byref -> rax
; V04 loc1 [V04 ] ( 3, 3 ) bool -> [rsp+0x28] do-not-enreg[X] addr-exposed ld-addr-op
;* V05 loc2 [V05 ] ( 0, 0 ) ref -> zero-ref ld-addr-op class-hnd
; V06 tmp0 [V06,T03] ( 3, 2 ) byref -> rsi
; V07 tmp1 [V07,T04] ( 3, 2 ) byref -> rcx
; V08 tmp2 [V08,T05] ( 3, 2 ) ref -> rdx
; V09 OutArgs [V09 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
;
; Lcl frame size = 48
G_M28468_IG01:
56 push rsi
4883EC30 sub rsp, 48
498BF0 mov rsi, r8
G_M28468_IG02:
0FB7D2 movzx rdx, dx
4C8D442428 lea r8, bword ptr [rsp+28H]
E800000000 call Dictionary`2:FindEntry(char,byref):byref:this
807C242800 cmp byte ptr [rsp+28H], 0
7507 jne SHORT G_M28468_IG03
488BCE mov rcx, rsi
33D2 xor rdx, rdx
EB06 jmp SHORT G_M28468_IG04
G_M28468_IG03:
488BCE mov rcx, rsi
488B10 mov rdx, gword ptr [rax]
G_M28468_IG04:
E800000000 call CORINFO_HELP_CHECKED_ASSIGN_REF
8B442428 mov eax, dword ptr [rsp+28H]
0FB6C0 movzx rax, al
G_M28468_IG05:
4883C430 add rsp, 48
5E pop rsi
C3 ret
; Total bytes of code 59, prolog size 5 for method Dictionary`2:TryGetValue(char,byref):bool:this |
Before Dictionary`2:FindEntry(char):int:this ; Assembly listing for method Dictionary`2:FindEntry(char):int:this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 this [V00,T03] ( 29, 53 ) ref -> rsi this class-hnd
; V01 arg1 [V01,T13] ( 7, 7.50) char -> rdx
; V02 loc0 [V02,T09] ( 14, 24.50) int -> rbx
; V03 loc1 [V03,T01] ( 25, 72 ) int -> rbp
;* V04 tmp0 [V04 ] ( 0, 0 ) ref -> zero-ref class-hnd exact
; V05 tmp1 [V05,T12] ( 13, 13 ) ref -> rcx
; V06 tmp2 [V06,T11] ( 18, 18 ) int -> rdx
; V07 tmp3 [V07,T00] ( 15,120 ) ref -> rcx
; V08 tmp4 [V08,T10] ( 4, 16 ) ref -> [rsp+0x28]
; V09 tmp5 [V09,T06] ( 10, 40 ) ref -> rdx
; V10 tmp6 [V10,T07] ( 10, 40 ) int -> r8
; V11 tmp7 [V11,T02] ( 9, 72 ) ref -> rax
; V12 OutArgs [V12 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
; V13 cse0 [V13,T04] ( 15, 50 ) byref -> rcx
; V14 cse1 [V14,T05] ( 11, 44 ) long -> r14
; V15 cse2 [V15,T15] ( 17, 8.50) int -> r8
; V16 cse3 [V16,T08] ( 9, 30 ) ref -> rcx
; V17 cse4 [V17,T16] ( 11, 5.50) ref -> rax
; V18 cse5 [V18,T14] ( 9, 9 ) int -> rdi
;
; Lcl frame size = 48
G_M38464_IG01:
4156 push r14
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC30 sub rsp, 48
488BF1 mov rsi, rcx
G_M38464_IG02:
48837E0800 cmp gword ptr [rsi+8], 0
0F848C000000 je G_M38464_IG05
488B4E18 mov rcx, gword ptr [rsi+24]
0FB7FA movzx rdi, dx
8BD7 mov edx, edi
4C8D1D00000000 lea r11, [(reloc)]
3909 cmp dword ptr [rcx], ecx
41FF13 call qword ptr [r11]IEqualityComparer`1:GetHashCode(char):int:this
8BD8 mov ebx, eax
81E3FFFFFF7F and ebx, 0x7FFFFFFF
488B4608 mov rax, gword ptr [rsi+8]
488BC8 mov rcx, rax
448B4008 mov r8d, dword ptr [rax+8]
8BC3 mov eax, ebx
99 cdq
41F7F8 idiv edx:eax, r8d
413BD0 cmp edx, r8d
7376 jae SHORT G_M38464_IG09
4863D2 movsxd rdx, edx
8B6C9110 mov ebp, dword ptr [rcx+4*rdx+16]
85ED test ebp, ebp
7C4E jl SHORT G_M38464_IG05
G_M38464_IG03:
488B4E10 mov rcx, gword ptr [rsi+16]
3B6908 cmp ebp, dword ptr [rcx+8]
7362 jae SHORT G_M38464_IG09
4863D5 movsxd rdx, ebp
4C8D3452 lea r14, [rdx+2*rdx]
4A8D4CF110 lea rcx, bword ptr [rcx+8*r14+16]
395908 cmp dword ptr [rcx+8], ebx
7522 jne SHORT G_M38464_IG04
488B5618 mov rdx, gword ptr [rsi+24]
440FB74110 movzx r8, word ptr [rcx+16]
488BCA mov rcx, rdx
418BD0 mov edx, r8d
448BC7 mov r8d, edi
4C8D1D00000000 lea r11, [(reloc)]
3909 cmp dword ptr [rcx], ecx
41FF13 call qword ptr [r11]IEqualityComparer`1:Equals(char,char):bool:this
85C0 test eax, eax
7522 jne SHORT G_M38464_IG07
G_M38464_IG04:
488B4610 mov rax, gword ptr [rsi+16]
3B6808 cmp ebp, dword ptr [rax+8]
7326 jae SHORT G_M38464_IG09
428B6CF01C mov ebp, dword ptr [rax+8*r14+28]
85ED test ebp, ebp
7DB2 jge SHORT G_M38464_IG03
G_M38464_IG05:
B8FFFFFFFF mov eax, -1
G_M38464_IG06:
4883C430 add rsp, 48
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415E pop r14
C3 ret
G_M38464_IG07:
8BC5 mov eax, ebp
G_M38464_IG08:
4883C430 add rsp, 48
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415E pop r14
C3 ret
G_M38464_IG09:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 199, prolog size 13 for method Dictionary`2:FindEntry(char):int:this After DefaultComparer Dictionary`2:FindEntry(char,byref):byref:this Marking Dictionary`2:FindEntry(char,byref,ref):byref:this as NOINLINE because of too many il bytes
Marking Dictionary`2:FindEntry(char,byref,ref):byref:this as NOINLINE because of too many il bytes
**************** Inline Tree
Inlines into 060039B9 Dictionary`2:FindEntry(char,byref):byref:this
[0 IL=0028 TR=000079 060039BA] [FAILED: too many il bytes] Dictionary`2:FindEntry(char,byref,ref):byref:this
[0 IL=0038 TR=000034 060039BA] [FAILED: too many il bytes] Dictionary`2:FindEntry(char,byref,ref):byref:this
Budget: initialTime=192, finalTime=192, initialBudget=1920, currentBudget=1920
Budget: initialSize=1134, finalSize=1134
; Assembly listing for method Dictionary`2:FindEntry(char,byref):byref:this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 9, 6 ) ref -> rsi this class-hnd
; V01 arg1 [V01,T02] ( 4, 3 ) char -> rbx
; V02 arg2 [V02,T01] ( 4, 3 ) byref -> rdi
; V03 loc0 [V03,T07] ( 3, 2.50) ref -> rbp class-hnd
;* V04 tmp0 [V04 ] ( 0, 0 ) ref -> zero-ref class-hnd exact
; V05 tmp1 [V05,T03] ( 4, 3.50) long -> rcx
; V06 tmp2 [V06,T04] ( 4, 3.50) long -> rdx
; V07 tmp3 [V07,T05] ( 4, 3.50) long -> rcx
; V08 tmp4 [V08,T06] ( 4, 3.50) long -> rdx
; V09 OutArgs [V09 ] ( 1, 1 ) lclBlk (40) [rsp+0x00]
;
; Lcl frame size = 56
G_M38464_IG01:
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC38 sub rsp, 56
48894C2430 mov qword ptr [rsp+30H], rcx
488BF1 mov rsi, rcx
8BDA mov ebx, edx
498BF8 mov rdi, r8
G_M38464_IG02:
488B6E18 mov rbp, gword ptr [rsi+24]
4885ED test rbp, rbp
7543 jne SHORT G_M38464_IG05
488B0E mov rcx, qword ptr [rsi]
488B5130 mov rdx, qword ptr [rcx+48]
488B12 mov rdx, qword ptr [rdx]
488B5228 mov rdx, qword ptr [rdx+40]
4885D2 test rdx, rdx
750F jne SHORT G_M38464_IG03
488D1500000000 lea rdx, [(reloc)]
E800000000 call CORINFO_HELP_RUNTIMEHANDLE_CLASS
488BD0 mov rdx, rax
G_M38464_IG03:
4533C0 xor r8, r8
4C89442420 mov gword ptr [rsp+20H], r8
440FB7C3 movzx r8, bx
488BCE mov rcx, rsi
4C8BCF mov r9, rdi
E800000000 call Dictionary`2:FindEntry(char,byref,ref):byref:this
90 nop
G_M38464_IG04:
4883C438 add rsp, 56
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
C3 ret
G_M38464_IG05:
488B0E mov rcx, qword ptr [rsi]
488B5130 mov rdx, qword ptr [rcx+48]
488B12 mov rdx, qword ptr [rdx]
488B5220 mov rdx, qword ptr [rdx+32]
4885D2 test rdx, rdx
750F jne SHORT G_M38464_IG06
488D1500000000 lea rdx, [(reloc)]
E800000000 call CORINFO_HELP_RUNTIMEHANDLE_CLASS
488BD0 mov rdx, rax
G_M38464_IG06:
48896C2420 mov gword ptr [rsp+20H], rbp
440FB7C3 movzx r8, bx
488BCE mov rcx, rsi
4C8BCF mov r9, rdi
E800000000 call Dictionary`2:FindEntry(char,byref,ref):byref:this
90 nop
G_M38464_IG07:
4883C438 add rsp, 56
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
C3 ret
; Total bytes of code 161, prolog size 13 for method Dictionary`2:FindEntry(char,byref):byref:this Dictionary`2:FindEntry(char,byref,ref):byref:this Marking Dictionary`2:FindEntry(char,byref,ref):byref:this as NOINLINE because of too many il bytes
Devirtualized virtual call to EqualityComparer`1:Equals; now direct call to GenericEqualityComparer`1:Equals [exact]
Successfully inlined Char:GetHashCode():int:this (9 IL bytes) (depth 1) [below ALWAYS_INLINE size]
Successfully inlined HashHelpers:FindBucket(int,int):int (4 IL bytes) (depth 1) [below ALWAYS_INLINE size]
Successfully inlined EqualityComparer`1:get_Default():ref (6 IL bytes) (depth 1) [below ALWAYS_INLINE size]
Successfully inlined GenericEqualityComparer`1:Equals(char,char):bool:this (45 IL bytes) (depth 1) [aggressive inline attribute]
Successfully inlined Char:Equals(char):bool:this (6 IL bytes) (depth 2) [below ALWAYS_INLINE size]
**************** Inline Tree
Inlines into 060039BA Dictionary`2:FindEntry(char,byref,ref):byref:this
[1 IL=0053 TR=000044 060010FC] [below ALWAYS_INLINE size] Char:GetHashCode():int:this
[2 IL=0120 TR=000063 06003858] [below ALWAYS_INLINE size] HashHelpers:FindBucket(int,int):int
[3 IL=0175 TR=000110 060039D7] [below ALWAYS_INLINE size] EqualityComparer`1:get_Default():ref
[4 IL=0188 TR=000116 060039E0] [aggressive inline attribute] GenericEqualityComparer`1:Equals(char,char):bool:this
[5 IL=0025 TR=000200 060010FE] [below ALWAYS_INLINE size] Char:Equals(char):bool:this
[0 IL=0260 TR=000023 060039BB] [FAILED: noinline per IL/cached result] Dictionary`2:get_NotFound():byref:this
Budget: initialTime=858, finalTime=928, initialBudget=8580, currentBudget=8656
Budget: increased by 76 because of force inlines
Budget: initialSize=6196, finalSize=6196
; Assembly listing for method Dictionary`2:FindEntry(char,byref,ref):byref:this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 this [V00,T09] ( 6, 4.50) ref -> rcx this class-hnd
;* V01 TypeCtx [V01 ] ( 0, 0 ) long -> zero-ref
; V02 arg1 [V02,T08] ( 5, 5 ) char -> [rsp+0x50] do-not-enreg[F] ld-addr-op
; V03 arg2 [V03,T12] ( 4, 3.50) byref -> r9
;* V04 arg3 [V04 ] ( 0, 0 ) ref -> zero-ref class-hnd
; V05 loc0 [V05,T10] ( 10, 6 ) ref -> r8 class-hnd
; V06 loc1 [V06,T06] ( 7, 7 ) int -> rsi
; V07 loc2 [V07,T02] ( 6, 10 ) ref -> rdi class-hnd
; V08 loc3 [V08,T00] ( 11, 26.50) int -> rbx
; V09 loc4 [V09,T01] ( 6, 18.50) byref -> rdx
; V10 loc5 [V10,T07] ( 7, 7 ) int -> r11
; V11 tmp1 [V11,T13] ( 2, 4 ) bool -> r8
; V12 tmp2 [V12,T04] ( 2, 8 ) char -> r8 ld-addr-op
;* V13 tmp3 [V13 ] ( 0, 0 ) ref -> zero-ref class-hnd exact
; V14 tmp4 [V14,T05] ( 2, 8 ) char -> r10 ld-addr-op
;* V15 tmp5 [V15 ] ( 0, 0 ) ref -> zero-ref class-hnd exact
;* V16 tmp6 [V16 ] ( 0, 0 ) char -> zero-ref ld-addr-op
; V17 OutArgs [V17 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
; V18 cse0 [V18,T11] ( 11, 5.50) int -> rdx
; V19 cse1 [V19,T03] ( 5, 9.50) int -> rax
; V20 cse2 [V20,T14] ( 6, 3 ) int -> r10
;
; Lcl frame size = 32
G_M37161_IG01:
57 push rdi
56 push rsi
53 push rbx
4883EC20 sub rsp, 32
4489442450 mov dword ptr [rsp+50H], r8d
G_M37161_IG02:
41C60101 mov byte ptr [r9], 1
4C8B4108 mov r8, gword ptr [rcx+8]
4D85C0 test r8, r8
7477 je SHORT G_M37161_IG05
0FB7442450 movzx rax, word ptr [rsp+50H]
0FB7542450 movzx rdx, word ptr [rsp+50H]
C1E210 shl edx, 16
0BC2 or eax, edx
8BF0 mov esi, eax
81E6FFFFFF7F and esi, 0x7FFFFFFF
488B7910 mov rdi, gword ptr [rcx+16]
458B5008 mov r10d, dword ptr [r8+8]
458BDA mov r11d, r10d
8BC6 mov eax, esi
33D2 xor rdx, rdx
41F7F3 div edx:eax, r11d
413BD2 cmp edx, r10d
7366 jae SHORT G_M37161_IG09
4863C2 movsxd rax, edx
418B5C8010 mov ebx, dword ptr [r8+4*rax+16]
85DB test ebx, ebx
7C3D jl SHORT G_M37161_IG05
8B4708 mov eax, dword ptr [rdi+8]
G_M37161_IG03:
3BD8 cmp ebx, eax
7353 jae SHORT G_M37161_IG09
4863D3 movsxd rdx, ebx
488D1452 lea rdx, [rdx+2*rdx]
488D54D710 lea rdx, bword ptr [rdi+8*rdx+16]
397208 cmp dword ptr [rdx+8], esi
751E jne SHORT G_M37161_IG04
440FB74210 movzx r8, word ptr [rdx+16]
448B542450 mov r10d, dword ptr [rsp+50H]
450FB7D2 movzx r10, r10w
453BC2 cmp r8d, r10d
410F94C0 sete r8b
450FB6C0 movzx r8, r8b
4585C0 test r8d, r8d
7519 jne SHORT G_M37161_IG07
G_M37161_IG04:
8B5A0C mov ebx, dword ptr [rdx+12]
85DB test ebx, ebx
7DC6 jge SHORT G_M37161_IG03
G_M37161_IG05:
41C60100 mov byte ptr [r9], 0
E800000000 call Dictionary`2:get_NotFound():byref:this
90 nop
G_M37161_IG06:
4883C420 add rsp, 32
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret
G_M37161_IG07:
488BC2 mov rax, rdx
G_M37161_IG08:
4883C420 add rsp, 32
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret
G_M37161_IG09:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 179, prolog size 12 for method Dictionary`2:FindEntry(char,byref,ref):byref:this |
The two functions inline for structs; but for reference keys fail with
Equals is a bit weird; but Adjusting code |
9118ff4
to
ed66712
Compare
In the cases where key or value are reference types, we'll generate a shared methods. In shared methods, inlining generic methods from some other class is not supported. Would have liked to see the jit fare better in this stretch (char compare)... actually not clear what is going on here, seems like the jit is comparing words and bytes. 450FB75110 movzx r10, word ptr [r9+16]
448B5C2468 mov r11d, dword ptr [rsp+68H]
450FB7DB movzx r11, r11w
453BD3 cmp r10d, r11d
410F94C2 sete r10b
450FB6D2 movzx r10, r10b
4585D2 test r10d, r10d |
Let me re-run the tests locally; from the failures seem to have broken something from a last minute late night change... 😒 |
I see that it compares 2 words widened to dword and then compares to result of the previous comparation with 0. Looks like the old #914 |
IEqualityComparer<TKey> customComparer = _customComparer; | ||
if (customComparer == null) | ||
{ | ||
if (default(TKey) == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary. result = key?.GetHashCode() ?? 0;
should work well for all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also key is checked everywhere before being added; and throws ArgumentNullExceptions so can just be key.GetHashCode()
?
Testing error is definitely caused by the code change; though is something I haven't seen before
|
Guessing the Dictionary is used in generating the Exception; so throws another exception... etc |
Missing |
Is this a recognized bounds check eliminating pattern? var buckets = _buckets;
var val = buckets[(uint)hashCode % buckets.Length] |
Not sure, but would guess it's not. Will check in later today if you don't. It should be, as well as the related |
Nope. On my things to look at list :). Watch out with the cast to |
ed66712
to
31ba5dd
Compare
Updated asm with current; doesn't look like |
2 items not currently covered. Nullables - I think they will go class path with Remove functions... I've split all the traversal functions into 2 to hoist branching on comparer. However Remove started as 2 traversal functions (for the |
Dictionary``2:TryInsertDefaultComparer(ref,ref,ubyte):bool:thiss has three exit unwinds which makes me 😢 G_M24340_IG01:
4157 push r15
4156 push r14
4155 push r13
4154 push r12
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC38 sub rsp, 56
488BF1 mov rsi, rcx
488BFA mov rdi, rdx
498BD8 mov rbx, r8
418BE9 mov ebp, r9d
G_M24340_IG02:
4C8B7608 mov r14, gword ptr [rsi+8]
4D85F6 test r14, r14
750D jne SHORT G_M24340_IG03
488BCE mov rcx, rsi
33D2 xor edx, edx
E800000000 call Dictionary`2:Initialize(int):ref:this
4C8BF0 mov r14, rax
G_M24340_IG03:
488BCF mov rcx, rdi
488B07 mov rax, qword ptr [rdi]
488B4040 mov rax, qword ptr [rax+64]
FF5010 call qword ptr [rax+16]Object:GetHashCode():int:this
448BF8 mov r15d, eax
4181E7FFFFFF7F and r15d, 0x7FFFFFFF
4C8B6610 mov r12, gword ptr [rsi+16]
4533ED xor r13d, r13d
418B4E08 mov ecx, dword ptr [r14+8]
418BC7 mov eax, r15d
33D2 xor rdx, rdx
F7F1 div edx:eax, ecx
3BD1 cmp edx, ecx
0F8309000000 jae G_M24340_IG15
4863CA movsxd rcx, edx
418B448E10 mov eax, dword ptr [r14+4*rcx+16]
85C0 test eax, eax
7C4B jl SHORT G_M24340_IG06
G_M24340_IG04:
413B442408 cmp eax, dword ptr [r12+8]
0F8309000000 jae G_M24340_IG15
4863C8 movsxd rcx, eax
488D0C49 lea rcx, [rcx+2*rcx]
498D44CC10 lea rax, bword ptr [r12+8*rcx+16]
4889442428 mov bword ptr [rsp+28H], rax
4C8BC0 mov r8, rax
45397810 cmp dword ptr [r8+16], r15d
751A jne SHORT G_M24340_IG05
4C89442430 mov bword ptr [rsp+30H], r8
498B08 mov rcx, gword ptr [r8]
488BD7 mov rdx, rdi
4C8B09 mov r9, qword ptr [rcx]
4D8B4940 mov r9, qword ptr [r9+64]
41FF5108 call qword ptr [r9+8]Object:Equals(ref):bool:this
85C0 test eax, eax
7538 jne SHORT G_M24340_IG07
G_M24340_IG05:
488B442428 mov rax, bword ptr [rsp+28H]
8B4014 mov eax, dword ptr [rax+20]
85C0 test eax, eax
7DB5 jge SHORT G_M24340_IG04
G_M24340_IG06:
8B5644 mov edx, dword ptr [rsi+68]
85D2 test edx, edx
7E67 jle SHORT G_M24340_IG11
448B6E40 mov r13d, dword ptr [rsi+64]
453B6C2408 cmp r13d, dword ptr [r12+8]
0F8309000000 jae G_M24340_IG15
4963CD movsxd rcx, r13d
488D0C49 lea rcx, [rcx+2*rcx]
418B4CCC24 mov ecx, dword ptr [r12+8*rcx+36]
894E40 mov dword ptr [rsi+64], ecx
FFCA dec edx
895644 mov dword ptr [rsi+68], edx
EB6E jmp SHORT G_M24340_IG13
G_M24340_IG07:
4C8B442430 mov r8, bword ptr [rsp+30H]
440FB6E5 movzx r12, bpl
4183FC01 cmp r12d, 1
7515 jne SHORT G_M24340_IG08
498D4808 lea rcx, bword ptr [r8+8]
488BD3 mov rdx, rbx
E800000000 call CORINFO_HELP_CHECKED_ASSIGN_REF
FF463C inc dword ptr [rsi+60]
41BD01000000 mov r13d, 1
G_M24340_IG08:
4183FC02 cmp r12d, 2
0F8400000000 je G_M24340_IG14
G_M24340_IG09:
418BC5 mov eax, r13d
G_M24340_IG10:
4883C438 add rsp, 56
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415C pop r12
415D pop r13
415E pop r14
415F pop r15
C3 ret
G_M24340_IG11:
8B6E38 mov ebp, dword ptr [rsi+56]
41396C2408 cmp dword ptr [r12+8], ebp
751C jne SHORT G_M24340_IG12
8BCD mov ecx, ebp
E800000000 call HashHelpers:ExpandPrime(int):int
8BD0 mov edx, eax
488BCE mov rcx, rsi
4533C0 xor r8d, r8d
E800000000 call Dictionary`2:ResizeDefaultComparer(int,bool):this
4C8B7608 mov r14, gword ptr [rsi+8]
4C8B6610 mov r12, gword ptr [rsi+16]
G_M24340_IG12:
448BED mov r13d, ebp
FF4638 inc dword ptr [rsi+56]
G_M24340_IG13:
418B4E08 mov ecx, dword ptr [r14+8]
418BC7 mov eax, r15d
33D2 xor rdx, rdx
F7F1 div edx:eax, ecx
3BD1 cmp edx, ecx
0F8309000000 jae G_M24340_IG15
4863D2 movsxd rdx, edx
498D6C9610 lea rbp, bword ptr [r14+4*rdx+16]
453B6C2408 cmp r13d, dword ptr [r12+8]
0F8309000000 jae G_M24340_IG15
4963D5 movsxd rdx, r13d
488D1452 lea rdx, [rdx+2*rdx]
4D8D74D410 lea r14, bword ptr [r12+8*rdx+16]
45897E10 mov dword ptr [r14+16], r15d
8B5500 mov edx, dword ptr [rbp]
41895614 mov dword ptr [r14+20], edx
498BCE mov rcx, r14
488BD7 mov rdx, rdi
E800000000 call CORINFO_HELP_CHECKED_ASSIGN_REF
498D4E08 lea rcx, bword ptr [r14+8]
488BD3 mov rdx, rbx
E800000000 call CORINFO_HELP_CHECKED_ASSIGN_REF
44896D00 mov dword ptr [rbp], r13d
FF463C inc dword ptr [rsi+60]
41BD01000000 mov r13d, 1
E95AFFFFFF jmp G_M24340_IG09
************** Beginning of cold code **************
G_M24340_IG14:
488BCF mov rcx, rdi
E800000000 call ThrowHelper:ThrowAddingDuplicateWithKeyArgumentException(ref)
CC int3
G_M24340_IG15:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 457, prolog size 28 for method Dictionary`2:TryInsertDefaultComparer(ref,ref,ubyte):bool:this |
Non-ref version G_M11705_IG01:
4157 push r15
4156 push r14
4155 push r13
4154 push r12
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC48 sub rsp, 72
488BF1 mov rsi, rcx
488BFA mov rdi, rdx
418BD8 mov ebx, r8d
418BE9 mov ebp, r9d
G_M11705_IG02:
4C8B7608 mov r14, gword ptr [rsi+8]
4D85F6 test r14, r14
750D jne SHORT G_M11705_IG03
488BCE mov rcx, rsi
33D2 xor edx, edx
E800000000 call Dictionary`2:Initialize(int):ref:this
4C8BF0 mov r14, rax
G_M11705_IG03:
488BC7 mov rax, rdi
48C1F820 sar rax, 32
33C7 xor eax, edi
448BF8 mov r15d, eax
4181E7FFFFFF7F and r15d, 0x7FFFFFFF
4C8B6610 mov r12, gword ptr [rsi+16]
4533ED xor r13d, r13d
418B4E08 mov ecx, dword ptr [r14+8]
418BC7 mov eax, r15d
33D2 xor rdx, rdx
F7F1 div edx:eax, ecx
3BD1 cmp edx, ecx
0F8319000000 jae G_M11705_IG15
4863CA movsxd rcx, edx
418B448E10 mov eax, dword ptr [r14+4*rcx+16]
85C0 test eax, eax
7C3F jl SHORT G_M11705_IG06
418B4C2408 mov ecx, dword ptr [r12+8]
G_M11705_IG04:
3BC1 cmp eax, ecx
0F8319000000 jae G_M11705_IG15
4863C0 movsxd rax, eax
488D0440 lea rax, [rax+2*rax]
498D44C410 lea rax, bword ptr [r12+8*rax+16]
488BD0 mov rdx, rax
44393A cmp dword ptr [rdx], r15d
7517 jne SHORT G_M11705_IG05
4C8B4208 mov r8, qword ptr [rdx+8]
4C8BCF mov r9, rdi
4D3BC1 cmp r8, r9
410F94C0 sete r8b
450FB6C0 movzx r8, r8b
4585C0 test r8d, r8d
7536 jne SHORT G_M11705_IG07
G_M11705_IG05:
8B4004 mov eax, dword ptr [rax+4]
85C0 test eax, eax
7DC6 jge SHORT G_M11705_IG04
G_M11705_IG06:
8B4E44 mov ecx, dword ptr [rsi+68]
85C9 test ecx, ecx
7E5A jle SHORT G_M11705_IG11
448B6E40 mov r13d, dword ptr [rsi+64]
418B6C2408 mov ebp, dword ptr [r12+8]
443BED cmp r13d, ebp
0F8319000000 jae G_M11705_IG15
4963C5 movsxd rax, r13d
488D0440 lea rax, [rax+2*rax]
418B44C414 mov eax, dword ptr [r12+8*rax+20]
894640 mov dword ptr [rsi+64], eax
FFC9 dec ecx
894E44 mov dword ptr [rsi+68], ecx
EB60 jmp SHORT G_M11705_IG13
G_M11705_IG07:
400FB6CD movzx rcx, bpl
83F901 cmp ecx, 1
750C jne SHORT G_M11705_IG08
885A10 mov byte ptr [rdx+16], bl
FF463C inc dword ptr [rsi+60]
41BD01000000 mov r13d, 1
G_M11705_IG08:
83F902 cmp ecx, 2
0F8400000000 je G_M11705_IG14
G_M11705_IG09:
418BC5 mov eax, r13d
G_M11705_IG10:
4883C448 add rsp, 72
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415C pop r12
415D pop r13
415E pop r14
415F pop r15
C3 ret
G_M11705_IG11:
448B6E38 mov r13d, dword ptr [rsi+56]
418B6C2408 mov ebp, dword ptr [r12+8]
413BED cmp ebp, r13d
751D jne SHORT G_M11705_IG12
418BCD mov ecx, r13d
E800000000 call HashHelpers:ExpandPrime(int):int
8BD0 mov edx, eax
488BCE mov rcx, rsi
4533C0 xor r8d, r8d
E800000000 call Dictionary`2:ResizeDefaultComparer(int,bool):this
4C8B7608 mov r14, gword ptr [rsi+8]
4C8B6610 mov r12, gword ptr [rsi+16]
G_M11705_IG12:
FF4638 inc dword ptr [rsi+56]
G_M11705_IG13:
418B4E08 mov ecx, dword ptr [r14+8]
418BC7 mov eax, r15d
33D2 xor rdx, rdx
F7F1 div edx:eax, ecx
3BD1 cmp edx, ecx
0F8319000000 jae G_M11705_IG15
4863C2 movsxd rax, edx
498D448610 lea rax, bword ptr [r14+4*rax+16]
453B6C2408 cmp r13d, dword ptr [r12+8]
0F8319000000 jae G_M11705_IG15
4963CD movsxd rcx, r13d
488D0C49 lea rcx, [rcx+2*rcx]
498D4CCC10 lea rcx, bword ptr [r12+8*rcx+16]
448939 mov dword ptr [rcx], r15d
8B10 mov edx, dword ptr [rax]
895104 mov dword ptr [rcx+4], edx
48897908 mov qword ptr [rcx+8], rdi
885910 mov byte ptr [rcx+16], bl
448928 mov dword ptr [rax], r13d
FF463C inc dword ptr [rsi+60]
41BD01000000 mov r13d, 1
E96CFFFFFF jmp G_M11705_IG09
************** Beginning of cold code **************
G_M11705_IG14:
488D0D00000000 lea rcx, [(reloc)]
E800000000 call CORINFO_HELP_NEWSFAST
488BC8 mov rcx, rax
48897908 mov qword ptr [rcx+8], rdi
E800000000 call ThrowHelper:ThrowAddingDuplicateWithKeyArgumentException(ref)
CC int3
G_M11705_IG15:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 426, prolog size 28 for method Dictionary`2:TryInsertDefaultComparer(long,bool,ubyte):bool:this |
In Dictionary``2:FindEntryDefaultComparer(char,byref):byref:this This is a bit weird
|
Well, it does... in my branch - mikedn@a587bd3. Saves ~1K in corelib, all from existing hashtable code. Could save a bit more (in Linq's Lookup for example) if parse VN PHIs. They're a bit convoluted but it seems possible. |
Added a goto 5e5affb 😢 which removes the multiple returns for |
cc: @JosephTremoulet |
Yeah, we still have the heuristic that leaves the epilogs separate if you don't have more than four of them in the method. I'd expect us to move them out of the loop, though, if the jit recognizes it as a loop; @benaadams is that not what you were seeing? (I don't have a copy of the pre-edit asm) |
Its uses a weird loop increment for a for; is more of a while Before goto asm is here https://gist.github.com/benaadams/f32d9468281e37c0b8f32dcdceb718d2 |
Ok, so that did place the epilogs (which are under |
What you are saying is I need to put another return in? 😉 |
Thanks! For reference: This is what can generate those numbers, but take it no account that it assumes 32 bit multiplication into separate 32 bit words, and shifting of the high-word, so when doing it with 64 bit multiplication you need to shift by an additional 32 bits to the right... I wonder if it would make sense to generate an huge switch case with the literal code or store a table like the cpp code I referenced does... |
Couple of posts above
|
Would need 2 numbers or 1 plus an array lookup with extra bounds check; also no way to elide the modulus based bounds check in future. |
@mikedn thanks, missed it... Wouldn't adding a single index entry to the Dictionary be worth it? especially if the index is basically just used in a big-ole switch inside HashHelpers a-la:
|
Hard to say. Dictionaries are very common, it is possible that increasing the object size by 8 bytes does matter in some scenarios. And at the same time the faster modulus operation may not matter in some scenarios. It's the usual tradeoff - memory vs. CPU...
In the best case that switch will turn into a binary search tree - a bunch of compares and branches. That will add a few cycles to the magic modulus operation. Magic modulus is faster than the normal div instruction but it's not that fast and if you add more cycles, well... you may end up were you started. And this doesn't include the cost of branch mispredictions which may be high in this case... |
I thought that's why we are talking about adding another byte/int to the dictionary, to avoid the binary search tree and turn that switch into a simple lookup table based on the prime index, otherwise, why would you want to add the additional byte/int to begin with..., I assumed it would serve as a "prime index" |
What is "a simple lookup table"? I'm pretty sure I know what you mean but I don't see how it could be useful in this case from a performance point of view. And for clarity - the binary search tree I'm talking about looks something like if (x < 123) {
if (x == 7) {
}
else if (x == 42) {
}
}
else if (x == 123) {
}
else {
if (x == 547) {
}
} The C# compiler generates this kind of code when a normal switch instruction cannot be used because the jump table would be too big/sparse. |
I was assuming that the extra int we were discussing would just be and index into a prime table, where the prime numbers currently reside in If it's a plain index running sequentially, then a switch would/should be rendered into a simple jump table... unless I'm totally off base. The other option would be to NOT store and extra int, and use the array's |
Why store an index instead of storing the magic number directly? Granted, the magic number is an int and then we need another int for the shift so we really need 2 ints. But on 64 bit targets it doesn't matter adding a single int increases the object size by 8 bytes due to alignment requirements. So storing an index is useful only on 32 bit targets and it also cost you an indirection to read the actual values from the table. |
Good question, I guess my hunch/assumption is that storing two members and using them to do |
Yeah, you might save couple of cycles in some cases if you do that. And you could take advantage of the fact that RyuJIT already does magic division and write something like: switch (buckets.Length) {
case 3:
return i % 3;
case 5:
return i % 5;
...
default:
return i % buckets.Length;
} But we still have the problem that such a switch results in a bunch of conditional branches and possible suffer from bad branch prediction. |
@damageboy getting the 2 values is likely to be far more efficient (I tried for something different where I needed to get rid of an
|
Rebased, updated some asm #14030 (comment) The EqualityComparer is devirtualizing
However the find is a two call step via The code can be improved further with So will put on hold for now; then reevaluate after those features are available |
Closing in preference of #15411 |
Had a go at using magic division for rehash and expand in #15453 since they are bulk operations |
Returning to the idea of magic division instead of mod... It that over thinking it? Could just use static int LzcntMod(int value, int prime)
{
var mask = (1 << 32 - (int)Lzcnt.LeadingZeroCount((uint)prime)) - 1;
var index = value & mask;
if (index >= prime)
{
index -= prime;
}
return index;
} Mod G_M55235_IG03:
418BC0 mov eax, r8d
99 cdq
F7F9 idiv edx:eax, ecx LzcntMod G_M38796_IG03:
F30FBDCA lzcnt ecx, edx
F7D9 neg ecx
83C120 add ecx, 32
41B901000000 mov r9d, 1
41D3E1 shl r9d, cl
418D49FF lea ecx, [r9-1]
23C8 and ecx, eax
3BCA cmp ecx, edx
7C02 jl SHORT G_M38796_IG04
2BCA sub ecx, edx Results
Would that work? (Could even be a Jit time optimization for variable divisor?) *crossgen issues: intrinsic test needs to be inline, or won't inline - method call will eliminate most of the gains |
nvm - I went a bit crazy :) |
@benaadams the benchmark you mention #14030 (comment) do you have the code for that somewhere? |
Wonder if https://arxiv.org/abs/1902.01961 might give us some new ideas for trying to get rid of that pesky divide. |
User from reddit points out they gave the 64bit version for what the paper is doing as an answer on SE.Math in 2015 |
Isn't constant, should would need to carry the shape or pointer to lookup in the dict and pull it in with the prime. IDIV is ~20-27+ cycle latency so definitely has legs |
Yes, it's not clear if there's a win to be had here -- the need to have a lookup table or similar means for getting at the magic values is a drawback. But now the mod might need just one multiply, not two. |
Still looks to use two multiplys; but drops everything else: // GCC 6.2 | // Clang 4.0 | // our fast version + GCC 6.2
mov eax , edi | mov eax , edi | movabs rax , 194176253407468965
mov edx , 1491936009 | imul rax , rax , 1491936009 | mov edi , edi
mul edx | shr rax , 32 | imul rdi , rax
mov eax , edi | mov ecx , edi | mov eax , 95
sub eax , edx | sub ecx , eax | mul rdi
shr eax | shr ecx | mov rax , rdx
add eax , edx | add ecx , eax | //
shr eax , 6 | shr ecx , 6 | //
imul eax , eax , 95 | imul eax , ecx , 95 | //
sub edi , eax | sub edi , eax | //
mov eax | mov eax , edi | // |
Incorporate @AndyAyersMS Potential CQ improvements in Dictionary<K,T>
For default comparer, use regular inline method calls rather than a virtual function call on interface type
Split default and custom comparer paths
For BCL ValueType primitives
byte
,sbyte
,ushort
,short
,uint
,int
,ulong
,long
,IntPtr
,UIntPtr
,Guid
cast via object to the type and use .Equals directly them directly; then rely on the Jit to elide the boxing - meaning the mostly completely inline (e.g. Guid doesn't; though isn't a virtual call)Find the entry and pass that back using ref returns; rather than index to array that then needs to be looked up again.
More extensive use of ref locals.
Will follow up with metrics