Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf -24%] System.Memory.Span<Int32>.SequenceCompareTo #2286

Open
performanceautofiler bot opened this issue Oct 13, 2020 · 1 comment
Open

[Perf -24%] System.Memory.Span<Int32>.SequenceCompareTo #2286

performanceautofiler bot opened this issue Oct 13, 2020 · 1 comment

Comments

@performanceautofiler
Copy link

Run Information

Architecture x64
OS ubuntu 18.04
Changes diff

Regressions in System.Memory.Span

Benchmark Baseline Test Test/Base Modality Baseline Outlier Baseline ETL Comapre ETL
SequenceCompareTo 455.51 ns 566.65 ns 1.24 True

graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Memory.Span<Int32>*'

Histogram

System.Memory.Span.SequenceCompareTo(Size: 512)

[437.646 ; 472.689) | @@@@@@@@@@@@@@@@@@@@@@@@@@
[472.689 ; 506.283) | 
[506.283 ; 549.953) | 
[549.953 ; 584.743) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@kunalspathak
Copy link
Collaborator

I spent some time investigating this one and I came up with same conclusion as in #2319. On my local machine, the benchmark appears faster with 32B alignment changes.

With 32B alignment:

|            Method | Size |     Mean |   Error |  StdDev |   Median |      Min |      Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------ |----- |---------:|--------:|--------:|---------:|---------:|---------:|------:|------:|------:|----------:|
| SequenceCompareTo |  512 | 406.6 ns | 3.87 ns | 3.43 ns | 407.1 ns | 399.6 ns | 412.9 ns |     - |     - |     - |         - |
| SequenceCompareTo |  512 | 405.4 ns | 3.46 ns | 3.24 ns | 405.0 ns | 401.4 ns | 410.9 ns |     - |     - |     - |         - |
_M65109_IG01:              ;; offset=0000H
 00007ff8`a2107a40        4156                 push     r14
 00007ff8`a2107a42        57                   push     rdi
 00007ff8`a2107a43        56                   push     rsi
 00007ff8`a2107a44        55                   push     rbp
 00007ff8`a2107a45        53                   push     rbx
 00007ff8`a2107a46        4883EC30             sub      rsp, 48
 00007ff8`a2107a4a        488BF1               mov      rsi, rcx
 00007ff8`a2107a4d        8BEA                 mov      ebp, edx
 00007ff8`a2107a4f        498BF8               mov      rdi, r8
 00007ff8`a2107a52        418BD9               mov      ebx, r9d
                                                ;; bbWeight=1    PerfScore 6.25
G_M65109_IG02:              ;; offset=0015H
 00007ff8`a2107a55        8BCD                 mov      ecx, ebp
 00007ff8`a2107a57        F7D1                 not      ecx
 00007ff8`a2107a59        C1E91F               shr      ecx, 31
 00007ff8`a2107a5c        48BA6030B2E477020000 mov      rdx, 0x277E4B23060
; =========================== 32B boundary ===========================
; ....
; ....
 00007ff8`a2107a9e        85C0                 test     eax, eax
; =========================== 32B boundary ===========================
 00007ff8`a2107aa0        7E33                 jle      SHORT G_M65109_IG16
                                                ;; bbWeight=1    PerfScore 1.50
G_M65109_IG09:              ;; offset=0062H
 00007ff8`a2107aa2        4863CA               movsxd   rcx, edx
 00007ff8`a2107aa5        448B048F             mov      r8d, dword ptr [rdi+4*rcx]
 00007ff8`a2107aa9        488D0C8E             lea      rcx, bword ptr [rsi+4*rcx]
 00007ff8`a2107aad        443901               cmp      dword ptr [rcx], r8d
 00007ff8`a2107ab0        7D08                 jge      SHORT G_M65109_IG11
                                                ;; bbWeight=4    PerfScore 23.00
G_M65109_IG10:              ;; offset=0072H
 00007ff8`a2107ab2        41B9FFFFFFFF         mov      r9d, -1
 00007ff8`a2107ab8        EB10                 jmp      SHORT G_M65109_IG14
                                                ;; bbWeight=2    PerfScore 4.50
G_M65109_IG11:              ;; offset=007AH
 00007ff8`a2107aba        443901               cmp      dword ptr [rcx], r8d     ; <--- hot instruction
 00007ff8`a2107abd        7E08                 jle      SHORT G_M65109_IG13
                                                ;; bbWeight=1    PerfScore 3.00
G_M65109_IG12:              ;; offset=007FH
 00007ff8`a2107abf        41B901000000         mov      r9d, 1
; =========================== 32B boundary ===========================
 00007ff8`a2107ac5        EB03                 jmp      SHORT G_M65109_IG14
                                                ;; bbWeight=2    PerfScore 4.50
G_M65109_IG13:              ;; offset=0087H
 00007ff8`a2107ac7        4533C9               xor      r9d, r9d     ; <--- hot instruction
                                                ;; bbWeight=2    PerfScore 0.50
G_M65109_IG14:              ;; offset=008AH
 00007ff8`a2107aca        4585C9               test     r9d, r9d
 00007ff8`a2107acd        7511                 jne      SHORT G_M65109_IG17
                                                ;; bbWeight=4    PerfScore 5.00
G_M65109_IG15:              ;; offset=008FH
 00007ff8`a2107acf        FFC2                 inc      edx
 00007ff8`a2107ad1        3BD0                 cmp      edx, eax
 00007ff8`a2107ad3        7CCD                 jl       SHORT G_M65109_IG09
                                                ;; bbWeight=4    PerfScore 6.00
G_M65109_IG16:              ;; offset=0095H
 00007ff8`a2107ad5        3BEB                 cmp      ebp, ebx
 00007ff8`a2107ad7        7D15                 jge      SHORT G_M65109_IG19
 00007ff8`a2107ad9        B8FFFFFFFF           mov      eax, -1
 00007ff8`a2107ade        EB1B                 jmp      SHORT G_M65109_IG22
; =========================== 32B boundary ===========================

Without 32B alignment:

|            Method | Size |     Mean |   Error |  StdDev |   Median |      Min |      Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------ |----- |---------:|--------:|--------:|---------:|---------:|---------:|------:|------:|------:|----------:|
| SequenceCompareTo |  512 | 576.7 ns | 3.18 ns | 2.65 ns | 576.5 ns | 572.8 ns | 581.4 ns |     - |     - |     - |         - |
| SequenceCompareTo |  512 | 577.3 ns | 4.70 ns | 4.40 ns | 576.8 ns | 570.8 ns | 586.1 ns |     - |     - |     - |         - |
G_M65109_IG01:              ;; offset=0000H
 00007ff8`a20d7990        4156                 push     r14
 00007ff8`a20d7992        57                   push     rdi
 00007ff8`a20d7993        56                   push     rsi
 00007ff8`a20d7994        55                   push     rbp
 00007ff8`a20d7995        53                   push     rbx
 00007ff8`a20d7996        4883EC30             sub      rsp, 48
 00007ff8`a20d799a        488BF1               mov      rsi, rcx
 00007ff8`a20d799d        8BEA                 mov      ebp, edx
 00007ff8`a20d799f        498BF8               mov      rdi, r8
; =========================== 32B boundary ===========================
 00007ff8`a20d79a2        418BD9               mov      ebx, r9d
; ....
; ....
                                                ;; bbWeight=1    PerfScore 1.50
G_M65109_IG09:              ;; offset=0062H
 00007ff8`a20d79f2        4863CA               movsxd   rcx, edx
 00007ff8`a20d79f5        448B048F             mov      r8d, dword ptr [rdi+4*rcx]
 00007ff8`a20d79f9        488D0C8E             lea      rcx, bword ptr [rsi+4*rcx]
 00007ff8`a20d79fd        443901               cmp      dword ptr [rcx], r8d
; =========================== 32B boundary ===========================
 00007ff8`a20d7a00        7D08                 jge      SHORT G_M65109_IG11
                                                ;; bbWeight=4    PerfScore 23.00
G_M65109_IG10:              ;; offset=0072H
 00007ff8`a20d7a02        41B9FFFFFFFF         mov      r9d, -1
 00007ff8`a20d7a08        EB10                 jmp      SHORT G_M65109_IG14
                                                ;; bbWeight=2    PerfScore 4.50
G_M65109_IG11:              ;; offset=007AH
 00007ff8`a20d7a0a        443901               cmp      dword ptr [rcx], r8d     ; <--- hot instruction
 00007ff8`a20d7a0d        7E08                 jle      SHORT G_M65109_IG13
                                                ;; bbWeight=1    PerfScore 3.00
G_M65109_IG12:              ;; offset=007FH
 00007ff8`a20d7a0f        41B901000000         mov      r9d, 1
 00007ff8`a20d7a15        EB03                 jmp      SHORT G_M65109_IG14
                                                ;; bbWeight=2    PerfScore 4.50
G_M65109_IG13:              ;; offset=0087H
 00007ff8`a20d7a17        4533C9               xor      r9d, r9d     ; <--- hot instruction
                                                ;; bbWeight=2    PerfScore 0.50
G_M65109_IG14:              ;; offset=008AH
 00007ff8`a20d7a1a        4585C9               test     r9d, r9d
 00007ff8`a20d7a1d        7511                 jne      SHORT G_M65109_IG17
                                                ;; bbWeight=4    PerfScore 5.00
G_M65109_IG15:              ;; offset=008FH
 00007ff8`a20d7a1f        FFC2                 inc      edx
; =========================== 32B boundary ===========================

Looking at the vtune profiler, it looks like the 2 hot instructions cmp and xor are falls in different chunk of 32B, if methods are aligned on 32B boundary, but in same chunk of 32B, if a method is not aligned on 32B boundary. I am not sure how that affects the numbers one way or the other. Regardless, I also tried my loop alignment changes and it doesn't add any padding because when method is with 32B alignment, the inner loop falls in 32B boundary and there is no need to add padding, so we do not add anything to it. Let us continue to monitor this benchmark to see if it at least stays stable.

Cc: @adamsitnik , @AndyAyersMS

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants