[ARM64] Performance regression: DictionarySequentialKeys.ContainsKey #41743

Closed
adamsitnik opened this issue Sep 2, 2020 · 10 comments
@adamsitnik
Member

After running benchmarks for 3.1 vs 5.0 using "Ubuntu arm64 Qualcomm Machines" owned by the JIT Team, I've found a few regressions related to dictionary.ContainsKey.

Dictionary got "optimized" in 5.0 in dotnet/coreclr#27299 and #406. It's possible that these optimizations were a bad idea for ARM64, or they are simply unrelated to the regression.

@DrewScoggins is there any way to see the full historical data for ARM64?

cc @benaadams

Repro

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --architecture arm64 --filter 'System.Collections.Tests.DictionarySequentialKeys.ContainsKey*'

BenchmarkDotNet=v0.12.1.1405-nightly, OS=ubuntu 16.04
Unknown processor
.NET Core SDK=6.0.100-alpha.1.20451.3
[Host] : .NET Core 3.1.8 (CoreCLR 4.700.20.41105, CoreFX 4.700.20.41903), Arm64 RyuJIT
Job-SUXCQE : .NET Core 3.1.8 (CoreCLR 4.700.20.41105, CoreFX 4.700.20.41903), Arm64 RyuJIT
Job-FEGRDD : .NET Core 5.0.0 (CoreCLR 5.0.20.41714, CoreFX 5.0.20.41714), Arm64 RyuJIT

| Method | 3.1 Mean | 5.0 Mean |
|---|---|---|
| ContainsKey_17_Int_32ByteValue | 20.41 ns | 25.94 ns |
| ContainsKey_17_Int_32ByteRefsValue | 20.29 ns | 25.49 ns |
| ContainsKey_3k_Int_32ByteValue | 20.11 ns | 26.42 ns |
| ContainsKey_3k_Int_32ByteRefsValue | 20.23 ns | 25.49 ns |
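
To put the means above in relative terms, here is a small sketch that converts each pair into a percentage slowdown. The helper name `slowdown_pct` is made up for illustration; the numbers are copied from the table.

```python
# Mean times in nanoseconds, copied from the table above: (3.1 mean, 5.0 mean).
means = {
    "ContainsKey_17_Int_32ByteValue": (20.41, 25.94),
    "ContainsKey_17_Int_32ByteRefsValue": (20.29, 25.49),
    "ContainsKey_3k_Int_32ByteValue": (20.11, 26.42),
    "ContainsKey_3k_Int_32ByteRefsValue": (20.23, 25.49),
}


def slowdown_pct(base_ns, diff_ns):
    """Percentage by which diff is slower than base (negative means faster)."""
    return (diff_ns - base_ns) / base_ns * 100.0


for name, (base, diff) in means.items():
    print(f"{name}: {slowdown_pct(base, diff):+.1f}%")
```

On these numbers the slowdown falls between roughly 25% and 32% for all four benchmarks.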

/cc @JulieLeeMSFT

More data:

Legend

  • Statistical Test threshold: 5%, the noise filter: 1 ns
  • Result is the conclusion: Slower|Faster|Same
  • Base is median base execution time in nanoseconds
  • Diff is median diff execution time in nanoseconds
  • Ratio = Base/Diff (the higher the better)
  • Alloc Delta = Allocated bytes diff - Allocated bytes base (the lower the better)
  • Base V = Base Runtime Version
  • Diff V = Diff Runtime Version
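
As a rough illustration of how the Ratio column and the noise filter relate to the Base and Diff medians, here is a minimal sketch. It assumes the noise filter is applied to the absolute difference of the two medians; the function names are hypothetical, and the statistical test that also feeds the Result column is not reproduced here.

```python
def ratio(base_ns, diff_ns):
    """Ratio = Base / Diff; values above 1.0 mean the diff run was faster."""
    return base_ns / diff_ns


def within_noise(base_ns, diff_ns, noise_ns=1.0):
    """True when the medians differ by less than the noise filter (assumed 1 ns)."""
    return abs(diff_ns - base_ns) < noise_ns


# Two (Base, Diff) pairs taken from rows of the tables in this issue.
for base, diff in [(20.27, 25.46), (3.51, 4.18)]:
    print(f"base={base} diff={diff} ratio={ratio(base, diff):.2f} "
          f"within_noise={within_noise(base, diff)}")
```

A row can still be reported as Same when the medians differ by more than 1 ns, because the statistical test looks at the full set of samples rather than the medians alone.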

System.Collections.Tests.DictionarySequentialKeys.ContainsKey_3k_Int_32ByteRefsValue

| Result | Base | Diff | Ratio | Alloc Delta | Modality | Operating System | Bit | Processor Name | Base V | Diff V |
|---|---|---|---|---|---|---|---|---|---|---|
| Same | 3.51 | 4.18 | 0.84 | +0 | | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | 3.1.6 | 5.0.20.41714 |
| Faster | 6.89 | 5.55 | 1.24 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Same | 6.62 | 5.68 | 1.17 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40416 |
| Same | 7.41 | 6.47 | 1.15 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Same | 5.69 | 5.67 | 1.00 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | 3.1.6 | 5.0.20.40416 |
| Same | 5.07 | 4.49 | 1.13 | +0 | | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Same | 6.03 | 5.40 | 1.12 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | 3.1.6 | 5.0.20.41714 |
| Same | 5.88 | 5.43 | 1.08 | +0 | | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Faster | 7.43 | 5.84 | 1.27 | +0 | | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 6.41 | 6.04 | 1.06 | +0 | | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | 3.1.6 | 5.0.20.41714 |
| Same | 4.94 | 4.48 | 1.10 | +0 | | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.41714 |
| Same | 4.92 | 4.37 | 1.13 | +0 | | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Slower | 20.27 | 25.46 | 0.80 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Slower | 20.15 | 25.47 | 0.79 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.7 | 5.0.20.41714 |
| Slower | 19.93 | 25.46 | 0.78 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 19.01 | 19.86 | 0.96 | +0 | | ubuntu 18.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 6.93 | 7.19 | 0.96 | +0 | | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.41714 |
| Same | 8.06 | 8.39 | 0.96 | +0 | | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Slower | 9.58 | 10.81 | 0.89 | +0 | | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | 3.1.6 | 5.0.20.40416 |
| Faster | 8.63 | 7.20 | 1.20 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 7.08 | 6.14 | 1.15 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 7.09 | 6.29 | 1.13 | +0 | | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40203 |

System.Collections.Tests.DictionarySequentialKeys.ContainsKey_3k_Int_32ByteValue

| Result | Base | Diff | Ratio | Alloc Delta | Modality | Operating System | Bit | Processor Name | Base V | Diff V |
|---|---|---|---|---|---|---|---|---|---|---|
| Same | 3.52 | 3.98 | 0.88 | +0 | | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | 3.1.6 | 5.0.20.41714 |
| Same | 6.63 | 5.53 | 1.20 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Same | 6.62 | 5.59 | 1.18 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40416 |
| Same | 7.40 | 6.42 | 1.15 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Same | 5.68 | 5.92 | 0.96 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | 3.1.6 | 5.0.20.40416 |
| Same | 5.07 | 4.35 | 1.17 | +0 | | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Same | 6.06 | 6.80 | 0.89 | +0 | several? | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | 3.1.6 | 5.0.20.41714 |
| Faster | 6.48 | 5.32 | 1.22 | +0 | | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Faster | 7.02 | 5.65 | 1.24 | +0 | | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 6.56 | 6.63 | 0.99 | +0 | | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | 3.1.6 | 5.0.20.41714 |
| Same | 4.93 | 4.31 | 1.14 | +0 | | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.41714 |
| Same | 4.99 | 4.30 | 1.16 | +0 | | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Slower | 20.23 | 26.11 | 0.77 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Slower | 19.95 | 26.10 | 0.76 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.7 | 5.0.20.41714 |
| Slower | 20.12 | 25.08 | 0.80 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 15.32 | 15.99 | 0.96 | +0 | | ubuntu 18.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 6.84 | 6.98 | 0.98 | +0 | | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.41714 |
| Same | 8.04 | 7.75 | 1.04 | +0 | | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Slower | 9.55 | 10.92 | 0.87 | +0 | | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | 3.1.6 | 5.0.20.40416 |
| Faster | 8.28 | 7.09 | 1.17 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 6.81 | 6.09 | 1.12 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 7.09 | 6.19 | 1.15 | +0 | | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40203 |

System.Collections.Tests.DictionarySequentialKeys.ContainsKey_17_Int_32ByteValue

| Result | Base | Diff | Ratio | Alloc Delta | Modality | Operating System | Bit | Processor Name | Base V | Diff V |
|---|---|---|---|---|---|---|---|---|---|---|
| Same | 3.64 | 3.88 | 0.94 | +0 | | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | 3.1.6 | 5.0.20.41714 |
| Same | 6.41 | 5.77 | 1.11 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Same | 6.67 | 5.78 | 1.15 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40416 |
| Faster | 7.46 | 6.40 | 1.16 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Same | 5.98 | 6.26 | 0.96 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | 3.1.6 | 5.0.20.40416 |
| Faster | 6.40 | 5.31 | 1.21 | +0 | | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Same | 6.19 | 5.26 | 1.18 | +0 | bimodal | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | 3.1.6 | 5.0.20.41714 |
| Same | 5.90 | 5.68 | 1.04 | +0 | | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Faster | 7.05 | 5.61 | 1.26 | +0 | | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 6.40 | 6.56 | 0.98 | +0 | | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | 3.1.6 | 5.0.20.41714 |
| Same | 5.11 | 4.51 | 1.13 | +0 | | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.41714 |
| Same | 5.12 | 4.66 | 1.10 | +0 | | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Slower | 20.32 | 26.20 | 0.78 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Slower | 20.31 | 25.90 | 0.78 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.7 | 5.0.20.41714 |
| Slower | 20.03 | 25.11 | 0.80 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 22.09 | 16.27 | 1.36 | +0 | bimodal | ubuntu 18.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 7.15 | 7.10 | 1.01 | +0 | | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.41714 |
| Same | 8.35 | 8.03 | 1.04 | +0 | | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Slower | 10.24 | 11.45 | 0.89 | +0 | | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | 3.1.6 | 5.0.20.40416 |
| Faster | 8.24 | 7.06 | 1.17 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 6.83 | 5.94 | 1.15 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 7.15 | 6.15 | 1.16 | +0 | | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40203 |

System.Collections.Tests.DictionarySequentialKeys.ContainsKey_17_Int_32ByteRefsValue

| Result | Base | Diff | Ratio | Alloc Delta | Modality | Operating System | Bit | Processor Name | Base V | Diff V |
|---|---|---|---|---|---|---|---|---|---|---|
| Same | 3.64 | 4.30 | 0.85 | +0 | | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | 3.1.6 | 5.0.20.41714 |
| Faster | 6.66 | 5.57 | 1.20 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Faster | 6.68 | 5.60 | 1.19 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40416 |
| Same | 7.47 | 6.49 | 1.15 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Same | 5.99 | 6.00 | 1.00 | +0 | | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | 3.1.6 | 5.0.20.40416 |
| Same | 5.15 | 4.55 | 1.13 | +0 | | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Same | 6.12 | 5.42 | 1.13 | +0 | several? | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | 3.1.6 | 5.0.20.41714 |
| Same | 5.95 | 5.34 | 1.11 | +0 | | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
| Same | 7.33 | 6.21 | 1.18 | +0 | several? | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 6.38 | 5.96 | 1.07 | +0 | | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | 3.1.6 | 5.0.20.41714 |
| Same | 5.12 | 4.73 | 1.08 | +0 | | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.41714 |
| Same | 5.11 | 4.66 | 1.10 | +0 | | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
| Slower | 20.25 | 25.44 | 0.80 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Slower | 19.99 | 25.56 | 0.78 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.7 | 5.0.20.41714 |
| Slower | 20.23 | 25.48 | 0.79 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 17.00 | 20.10 | 0.85 | +0 | bimodal | ubuntu 18.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
| Same | 7.14 | 7.47 | 0.96 | +0 | several? | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.41714 |
| Same | 8.31 | 8.34 | 1.00 | +0 | | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
| Slower | 10.27 | 11.52 | 0.89 | +0 | | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | 3.1.6 | 5.0.20.40416 |
| Faster | 8.55 | 7.29 | 1.17 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Faster | 7.11 | 6.07 | 1.17 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
| Same | 7.17 | 6.24 | 1.15 | +0 | | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40203 |
@adamsitnik adamsitnik added arch-arm64 tenet-performance Performance related issue labels Sep 2, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Sep 2, 2020
@ghost

ghost commented Sep 2, 2020

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

@benaadams
Member

Isn't it strange that Arm regressed by a similar amount, when only 64-bit should be affected by those changes?

0.89	Windows 10.0.18363.1016	Arm	Microsoft SQ1 3.0 GHz

As they are inside a #if BIT64 block

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Sep 4, 2020
@layomia layomia added this to the 5.0.0 milestone Sep 4, 2020
@eiriktsarpalis eiriktsarpalis self-assigned this Sep 11, 2020
@danmoseley
Member

@DrewScoggins could you please help answer @adamsitnik question above?

@adamsitnik could you help characterize which dictionary scenarios regressed and which did not? 25% is a lot for a mainstream case - but you didn't mention any of the other dictionary lookup benchmarks we have. Is the common factor that the values are larger than ref-size (i.e., structs)? That would make the Entries array fatter, so maybe it's a memory locality thing.

Incidentally I found 32ByteRefsValue to be a confusing name, because I expected the value to be a ref type, in fact it's a struct containing ref types. I am not sure why it would have different performance to 32ByteValue but I guess the idea was that the GC would behave differently (??)
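
For a sense of scale on the "fatter Entries array" idea, here is a back-of-the-envelope sketch. It assumes each entry holds a 4-byte hash code, a 4-byte next index, a 4-byte int key and a 32-byte value, padded to 8-byte alignment, and that "3k" means 3000 entries. None of these layout details are stated in this thread, and the function names are hypothetical.

```python
def padded_entry_size(key_bytes, value_bytes, alignment=8):
    """Assumed entry layout: 4-byte hash code + 4-byte next index + key + value."""
    raw = 4 + 4 + key_bytes + value_bytes
    # Round up to the next multiple of the assumed alignment.
    return -(-raw // alignment) * alignment


def entries_footprint(count, key_bytes=4, value_bytes=32):
    """Total bytes occupied by `count` entries under the assumed layout."""
    return count * padded_entry_size(key_bytes, value_bytes)


for count in (17, 3000):
    print(f"{count} entries: {entries_footprint(count)} bytes")
```

Under these assumptions the 17-entry dictionary needs well under 1 KB of entries, while the 3000-entry one needs roughly 140 KB.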

@danmoseley
Member

@DrewScoggins where do I find 3.1->5.0 comparisons for ARM64?
image

@DrewScoggins
Member

I talked with Adam about this offline and got him access to the full test history for ARM64 runs that we have done in the lab. Here is the link for posterity, https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/master_arm64_ubuntu%2018.04/AllTestindex.html.

As for the comparison results, we were running into an issue where we were running the index report before all of the report generation jobs had completed. So we were showing the result as not existing. I have regenerated the index report and have a fix in mind that should remove this error going forward. You can just look at the new index report or follow this link, https://pvscmdupload.blob.core.windows.net/reports/09_15_2020/report_Daily_ca=ARM64_cb=master_co=Ubuntu1804ARM_cr=dotnetcoresdk_cc=CompliationMode=tiered-RunKind=micro_Baseline_bb=release-3.1.2xx_2020-09-15.html.

@danmoseley
Member

Thanks.

These scenarios simply seem very bimodal:
image
actually most or all of the GetValue/GetKey scenarios are
image

Similar for CopyTo Array, etc.

Is this just yet more alignment? I do not see a regression in this data.
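
One crude way to check whether a set of timings looks bimodal is to look for a single large gap that splits the sorted samples into two sizeable groups. The sketch below is only that heuristic, not BenchmarkDotNet's actual modality detection; the thresholds `gap_factor` and `min_share` are arbitrary assumptions, and the sample values are synthetic, not measurements from this issue.

```python
from statistics import median


def looks_bimodal(samples, gap_factor=5.0, min_share=0.2):
    """Heuristic: one gap dwarfs the typical gap and splits off two real groups."""
    if len(samples) < 4:
        return False
    ordered = sorted(samples)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    split = max(range(len(gaps)), key=gaps.__getitem__)
    largest = gaps[split]
    # Typical spacing is the median of all gaps except the largest one.
    others = gaps[:split] + gaps[split + 1:]
    typical = max(median(others), 1e-9)
    low_count = split + 1
    high_count = len(ordered) - low_count
    balanced = min(low_count, high_count) >= min_share * len(ordered)
    return balanced and largest > gap_factor * typical


# Synthetic timings in ns (illustrative only).
two_modes = [20.1, 20.2, 20.3, 20.2, 25.4, 25.5, 25.3, 20.0, 25.6, 25.4]
one_mode = [20.0, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7]
print(looks_bimodal(two_modes), looks_bimodal(one_mode))
```

When a benchmark flips between two such modes from run to run, comparing a single median from each runtime can show a large difference even if nothing in the code changed.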

@danmoseley
Member

I added this info to dotnet/BenchmarkDotNet#1513

@DrewScoggins
Member

> Thanks.
>
> These scenarios simply seem very bimodal:
> image
> actually most or all of the GetValue/GetKey scenarios are
> image
>
> Similar for CopyTo Array, etc.
>
> Is this just yet more alignment? I do not see a regression in this data.

I agree that this does not look like a regression to me.

@danmoseley
Member

@adamsitnik thoughts? shall we close?

@jeffhandley
Member

@adamsitnik I'm going to go ahead and close this based on the discussion above. Feel free to reopen if you think we should investigate further though.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020