Optimize if range check to replace jumps to bit operation #87656
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
This PR detects if statements with specific range-check patterns and, for each pattern, replaces the chain of jumps with a bit operation. Pattern 1 optimized in this PR:
Before:
After:
Contributes to #8418.
|
Reference: https://egorbo.com/llvm-range-checks.html |
@JulieLeeMSFT you can get the code blocks colorized. I modified your top post; basically, just put C# or asm immediately after the three opening backticks. |
It's an interesting trick we can use in C#, and the range doesn't have to be contiguous (!): the distance between the highest and lowest values simply must not exceed the register width. @stephentoub does Regex take advantage of this already? Perhaps it's an element of how searching is already optimized? |
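To make the trick concrete, here is a minimal C# sketch of a hand-written version. It is not the JIT's implementation: the helper names BuildMask and IsInSet are hypothetical, and the 64-bit mask width is an assumption chosen to match a 64-bit register.

```csharp
using System;

// Hypothetical helper: one bit per allowed value, relative to the smallest value.
static ulong BuildMask(int[] values, out int low)
{
    low = int.MaxValue;
    int high = int.MinValue;
    foreach (int v in values)
    {
        low = Math.Min(low, v);
        high = Math.Max(high, v);
    }

    // The values may be unsorted and non-contiguous, but their span must fit in the mask.
    if ((long)high - low > 63)
        throw new ArgumentException("Value span exceeds the 64-bit mask width.");

    ulong mask = 0;
    foreach (int v in values)
        mask |= 1UL << (v - low);
    return mask;
}

// Hypothetical helper: one unsigned range check, then a single bit test.
static bool IsInSet(int value, ulong mask, int low)
{
    // Widen to long so the subtraction cannot overflow; a negative offset
    // wraps to a huge unsigned value and fails the "< 64" check.
    ulong offset = unchecked((ulong)((long)value - low));
    return offset < 64 && ((mask >> (int)offset) & 1UL) != 0;
}

int[] allowed = { 102, 97, 98, 100 };            // 'f', 'a', 'b', 'd', deliberately unsorted
ulong setMask = BuildMask(allowed, out int lowest);
Console.WriteLine(IsInSet('d', setMask, lowest)); // True
Console.WriteLine(IsInSet('c', setMask, lowest)); // False
```

The range check still costs one compare, which is the same shape as the "one jump plus a bit test" code discussed later in this thread.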
@MihaZupan but assuming this is already well known. |
In concept. It generates a branch-free version, e.g. for the set
static bool IsReservedPattern(int a)
{
    uint charMinusLowUInt32;
    return ((int)((0xD4000000U << (short)(charMinusLowUInt32 = (ushort)(a - 'a'))) & (charMinusLowUInt32 - 32)) < 0);
}
C.IsReservedPattern(Int32)
L0000: add ecx, 0xffffff9f
L0003: movzx eax, cx
L0006: movsx rdx, ax
L000a: mov ecx, 0xd4000000
L000f: shlx edx, ecx, edx
L0014: add eax, 0xffffffe0
L0017: and eax, edx
L0019: shr eax, 0x1f
L001c: ret
The code for this in the source generator is here: runtime/src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Lines 4857 to 4920 in 05e25ff
|
Yes, I tested the code with non-contiguous and unsorted values. It works now. I am checking some other details. |
gcc emits code closer to what Julie's change produces: https://godbolt.org/z/rj1cPYz1a. Basically, it uses a single jump to filter out obviously wrong values and then a bit test (bt). No idea which is faster in reality. That said, it's not directly related to this PR, because all the PR does is emit a SWITCH opcode instead of a series of comparisons. |
Other patterns tested:
Pattern 2: handles character checks. The first condition with GT_COMMA is skipped, and the rest of the conditions are optimized.
if (c == 'a' || c == 'b' || c == 'd' || c == 'f')
{
    return true;
}
return false;
Pattern 3: handles unsorted patterns.
if (a == 102 || a == 97 || a == 98 || a == 100)
    return true;
return false;
Pattern 4: handles min and max limits.
if (a == -9223372036854775806 || a == -9223372036854775807 || a == -9223372036854775804)
    return true;
return false;
if (a == 9223372036854775806 || a == 9223372036854775807 || a == 9223372036854775804)
    return true;
return false; |
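For Pattern 4, the interesting part is that the offset from the smallest value has to be computed without overflowing at the ends of the long range. The sketch below shows one way to do that by hand with wrapping unsigned subtraction. It is an illustration only, not the code the JIT generates, and the name IsInSetNearLimits and the precomputed masks are hypothetical.

```csharp
using System;

// Hypothetical helper: membership test that stays correct near long.MinValue and long.MaxValue.
static bool IsInSetNearLimits(long a, long min, ulong mask)
{
    // Wrapping subtraction in unsigned space: values below min wrap to a
    // huge offset and are rejected by the "< 64" check instead of overflowing.
    ulong offset = unchecked((ulong)a - (ulong)min);
    return offset < 64 && ((mask >> (int)offset) & 1UL) != 0;
}

// Upper limit: 9223372036854775804, ...806 and ...807 become bits 0, 2 and 3.
long hiMin = 9223372036854775804;
ulong hiMask = (1UL << 0) | (1UL << 2) | (1UL << 3);

// Lower limit: -9223372036854775807, ...806 and ...804 become bits 0, 1 and 3.
long loMin = -9223372036854775807;
ulong loMask = (1UL << 0) | (1UL << 1) | (1UL << 3);

Console.WriteLine(IsInSetNearLimits(long.MaxValue, hiMin, hiMask));     // True
Console.WriteLine(IsInSetNearLimits(long.MaxValue - 2, hiMin, hiMask)); // False
Console.WriteLine(IsInSetNearLimits(long.MinValue, loMin, loMask));     // False
```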
Hi @EgorBo, this is ready for review. CC @dotnet/jit-contrib. The asm diffs show size regressions, but in every case the code is converted to a bit test instead of multiple compares and jumps. |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run runtime-coreclr outerloop, runtime-coreclr pgo |
Slightly refactored the implementation to make it smaller and extensible in the future. So far it finds ~4000 contexts to improve. The regressions also look like perf improvements because there are fewer branches. We can increase the minimum number of tests from 3 to 4 to get rid of the size regressions, but I guess we'd better decide that based on microbenchmark reports, as the overall diffs aren't bad. |
(changing milestone to 9.0.0, I assume it won't be backported) |
Verified that
[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestRange(char c)
{
if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
return true;
return false;
}
IN0001: 000000 movzx rcx, cx
IN0002: 000003 cmp ecx, 32
IN0003: 000006 je SHORT G_M24766_IG07
IN0004: 000008 cmp ecx, 13
IN0005: 00000B ja SHORT G_M24766_IG05
IN0006: 00000D mov eax, 0x19FF
IN0007: 000012 bt eax, ecx
IN0008: 000015 jae SHORT G_M24766_IG07
G_M24766_IG05: ; offs=0x000017, size=0x0002, bbWeight=0.50, PerfScore 0.12, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB04 [0005], byref
IN0009: 000017 xor eax, eax
IN000b: 000019 ret
G_M24766_IG07: ; offs=0x00001A, size=0x0005, bbWeight=0.50, PerfScore 0.12, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB05 [0004], gcvars, byref
IN000a: 00001A mov eax, 1
IN000c: 00001F ret
[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestRange(short c)
{
if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
return true;
return false;
}
IN0001: 000000 movsx rcx, cx
IN0002: 000004 cmp ecx, 32
IN0003: 000007 je SHORT G_M54667_IG07
IN0004: 000009 cmp ecx, 13
IN0005: 00000C ja SHORT G_M54667_IG05
IN0006: 00000E mov eax, 0x19FF
IN0007: 000013 bt eax, ecx
IN0008: 000016 jae SHORT G_M54667_IG07
G_M54667_IG05: ; offs=0x000018, size=0x0002, bbWeight=0.50, PerfScore 0.12, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB04 [0005], byref
IN0009: 000018 xor eax, eax
IN000b: 00001A ret
G_M54667_IG07: ; offs=0x00001B, size=0x0005, bbWeight=0.50, PerfScore 0.12, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB05 [0004], gcvars, byref
IN000a: 00001B mov eax, 1
IN000c: 000020 ret
[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestRange(byte c)
{
if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
return true;
return false;
}
IN0001: 000000 movzx rcx, cl
IN0002: 000003 cmp ecx, 32
IN0003: 000006 je SHORT G_M29734_IG07
IN0004: 000008 cmp ecx, 13
IN0005: 00000B ja SHORT G_M29734_IG05
IN0006: 00000D mov eax, 0x19FF
IN0007: 000012 bt eax, ecx
IN0008: 000015 jae SHORT G_M29734_IG07
G_M29734_IG05: ; offs=0x000017, size=0x0002, bbWeight=0.50, PerfScore 0.12, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB04 [0005], byref
IN0009: 000017 xor eax, eax
IN000b: 000019 ret
G_M29734_IG07: ; offs=0x00001A, size=0x0005, bbWeight=0.50, PerfScore 0.12, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB05 [0004], gcvars, byref
IN000a: 00001A mov eax, 1
IN000c: 00001F ret |
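A note on reading the listings above: the constant 0x19FF is consistent with an inverted 14-bit mask in which only bits 9, 10 and 13 ('\t', '\n', '\r') are clear, with the value 32 (' ') handled by its own compare. The C# sketch below mirrors the instruction sequence by hand under that reading. The name TestRangeSketch is hypothetical, and this is an interpretation of the disassembly, not code taken from the JIT.

```csharp
using System;

// Hypothetical hand-written equivalent of the disassembly for TestRange.
static bool TestRangeSketch(int c)
{
    const int InvertedMask = 0x19FF;       // constant copied from the listing
    if (c == 32) return true;              // cmp ecx, 32 / je
    if ((uint)c > 13) return false;        // cmp ecx, 13 / ja (unsigned compare)
    return ((InvertedMask >> c) & 1) == 0; // bt eax, ecx / jae: a clear bit means "member"
}

// Rebuild the constant from the set members to check the interpretation.
int members = (1 << 9) | (1 << 10) | (1 << 13);  // '\t', '\n', '\r'
int inverted = ~members & ((1 << 14) - 1);       // keep only bits 0..13
Console.WriteLine($"0x{inverted:X}");            // 0x19FF
Console.WriteLine(TestRangeSketch('\r'));        // True
Console.WriteLine(TestRangeSketch(11));          // False
```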
If |
Thanks, @JulieLeeMSFT. FYI, it fails to do so if you don't use
[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestRange(char c) => c == ' ' || c == '\t' || c == '\r' || c == '\n'; |
I will check on it.
|
[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestRange(char c) => c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '!';
***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x003 )
N008 ( 8, 9) [000003] -A--------- * JTRUE void $VN.Void
N007 ( 6, 7) [000002] JA-----N--- \--* EQ int $101
N005 ( 4, 5) [000030] -A--------- +--* COMMA int $100
N003 ( 3, 4) [000028] DA--------- | +--* STORE_LCL_VAR int V02 cse0 d:1 $VN.Void
N002 ( 3, 4) [000022] ----------- | | \--* CAST int <- ushort <- int $100
N001 ( 2, 2) [000000] ----------- | | \--* LCL_VAR int V00 arg0 u:1 $80
N004 ( 1, 1) [000029] ----------- | \--* LCL_VAR int V02 cse0 u:1 $100
N006 ( 1, 1) [000001] ----------- \--* CNS_INT int 32 $43
------------ BB02 [0001] [005..00A) -> BB06(0.5),BB03(0.5) (cond), preds={BB01} succs={BB03,BB06}
***** BB02 [0001]
STMT00002 ( 0x005[E-] ... 0x008 )
N004 ( 5, 5) [000009] ----------- * JTRUE void $VN.Void
N003 ( 3, 3) [000008] J------N--- \--* EQ int $102
N001 ( 1, 1) [000031] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000007] ----------- \--* CNS_INT int 9 $44
------------ BB03 [0002] [00A..00F) -> BB06(0.5),BB04(0.5) (cond), preds={BB02} succs={BB04,BB06}
***** BB03 [0002]
STMT00003 ( 0x00A[E-] ... 0x00D )
N004 ( 5, 5) [000013] ----------- * JTRUE void $VN.Void
N003 ( 3, 3) [000012] J------N--- \--* EQ int $103
N001 ( 1, 1) [000032] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000011] ----------- \--* CNS_INT int 13 $45
------------ BB04 [0003] [00F..014) -> BB06(0.5),BB05(0.5) (cond), preds={BB03} succs={BB05,BB06}
***** BB04 [0003]
STMT00004 ( 0x00F[E-] ... 0x012 )
N004 ( 5, 5) [000017] ----------- * JTRUE void $VN.Void
N003 ( 3, 3) [000016] J------N--- \--* EQ int $104
N001 ( 1, 1) [000033] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000015] ----------- \--* CNS_INT int 10 $42
------------ BB05 [0004] [014..01A) (return), preds={BB04} succs={}
***** BB05 [0004]
STMT00005 ( 0x014[E-] ... 0x019 )
N004 ( 7, 4) [000021] ----------- * RETURN int $VN.Void
N003 ( 6, 3) [000020] ----------- \--* EQ int $105
N001 ( 1, 1) [000034] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000019] ----------- \--* CNS_INT int 33 $47
------------ BB06 [0005] [01A..01C) (return), preds={BB01,BB02,BB03,BB04} succs={}
***** BB06 [0005]
STMT00001 ( 0x01A[E-] ... 0x01B )
N002 ( 2, 2) [000005] ----------- * RETURN int $VN.Void
N001 ( 1, 1) [000026] ----------- \--* CNS_INT int 1 $46
IN0001: 000000 movzx rcx, cx
IN0002: 000003 cmp ecx, 32
IN0003: 000006 je SHORT G_M24766_IG05
IN0004: 000008 cmp ecx, 13
IN0005: 00000B ja SHORT G_M24766_IG07
IN0006: 00000D mov eax, 0x19FF
IN0007: 000012 bt eax, ecx
IN0008: 000015 jb SHORT G_M24766_IG07
G_M24766_IG05:
IN0009: 000017 mov eax, 1
IN000d: 00001C ret
G_M24766_IG07:
IN000a: 00001D cmp ecx, 33
IN000b: 000020 sete al
IN000c: 000023 movzx rax, al
IN000e: 000026 ret |
BBs for
[MethodImpl(MethodImplOptions.NoInlining)]
static bool TestRange(char c)
{
if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
return true;
return false;
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 [000..005)-> BB06(0.5),BB02(0.5) ( cond ) i
BB02 [0001] 1 BB01 0.50 [005..00A)-> BB06(0.5),BB03(0.5) ( cond ) i
BB03 [0002] 1 BB02 0.50 [00A..00F)-> BB06(0.5),BB04(0.5) ( cond ) i
BB04 [0003] 1 BB03 0.50 [00F..014)-> BB05(0.5),BB06(0.5) ( cond ) i
BB05 [0005] 1 BB04 0.50 [016..018) (return) i
BB06 [0004] 4 BB01,BB02,BB03,BB04 0.50 [014..016) (return) i
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------ BB01 [0000] [000..005) -> BB06(0.5),BB02(0.5) (cond), preds={} succs={BB02,BB06}
***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x003 )
N008 ( 8, 9) [000003] -A--------- * JTRUE void $VN.Void
N007 ( 6, 7) [000002] JA-----N--- \--* EQ int $101
N005 ( 4, 5) [000028] -A--------- +--* COMMA int $100
N003 ( 3, 4) [000026] DA--------- | +--* STORE_LCL_VAR int V02 cse0 d:1 $VN.Void
N002 ( 3, 4) [000020] ----------- | | \--* CAST int <- ushort <- int $100
N001 ( 2, 2) [000000] ----------- | | \--* LCL_VAR int V00 arg0 u:1 $80
N004 ( 1, 1) [000027] ----------- | \--* LCL_VAR int V02 cse0 u:1 $100
N006 ( 1, 1) [000001] ----------- \--* CNS_INT int 32 $43
------------ BB02 [0001] [005..00A) -> BB06(0.5),BB03(0.5) (cond), preds={BB01} succs={BB03,BB06}
***** BB02 [0001]
STMT00002 ( 0x005[E-] ... 0x008 )
N004 ( 5, 5) [000009] ----------- * JTRUE void $VN.Void
N003 ( 3, 3) [000008] J------N--- \--* EQ int $102
N001 ( 1, 1) [000029] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000007] ----------- \--* CNS_INT int 9 $44
------------ BB03 [0002] [00A..00F) -> BB06(0.5),BB04(0.5) (cond), preds={BB02} succs={BB04,BB06}
***** BB03 [0002]
STMT00003 ( 0x00A[E-] ... 0x00D )
N004 ( 5, 5) [000013] ----------- * JTRUE void $VN.Void
N003 ( 3, 3) [000012] J------N--- \--* EQ int $103
N001 ( 1, 1) [000030] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000011] ----------- \--* CNS_INT int 13 $45
------------ BB04 [0003] [00F..014) -> BB05(0.5),BB06(0.5) (cond), preds={BB03} succs={BB06,BB05}
***** BB04 [0003]
STMT00004 ( 0x00F[E-] ... 0x012 )
N004 ( 5, 5) [000017] ----------- * JTRUE void $VN.Void
N003 ( 3, 3) [000016] N------N-U- \--* NE int $104
N001 ( 1, 1) [000031] ----------- +--* LCL_VAR int V02 cse0 u:1 $100
N002 ( 1, 1) [000015] ----------- \--* CNS_INT int 10 $42
------------ BB05 [0005] [016..018) (return), preds={BB04} succs={}
***** BB05 [0005]
STMT00005 ( 0x016[E-] ... 0x017 )
N002 ( 2, 2) [000019] ----------- * RETURN int $VN.Void
N001 ( 1, 1) [000024] ----------- \--* CNS_INT int 0 $40
------------ BB06 [0004] [014..016) (return), preds={BB01,BB02,BB03,BB04} succs={}
***** BB06 [0004]
STMT00001 ( 0x014[E-] ... 0x015 )
N002 ( 2, 2) [000005] ----------- * RETURN int $VN.Void
N001 ( 1, 1) [000025] ----------- \--* CNS_INT int 1 $46 |
@JulieLeeMSFT regarding The |
Very good! Let's check that in.
Right, we didn't have time for that and left it for future work. |