-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf: InvariantCultureIgnoreCase 2x to 6x times slower on Linux #29565
Comments
This will be good to be investigated by the Dictionary owner first to look at if we have a locking in the collection side that may cause showing much slower in the ASP.NET scenario. @sebastienros would be nice if you can attache the ASP.NET sample repro you used when seeing the 6x degradation. |
Here is the controller to add to a new project from the default API template:
|
@sebastienros do we know whether this was a regression from past release? Also, is this critical for meeting 3.0 perf goals? |
I don't think this is a regression. |
This is not a regression, I already filed a related issue a year ago. |
We should fix this right? 2x-6x for such a critical operation seems bad. |
Yes we should look at it but I don't think this is urgent for 3.0 as it is not a regression. I am going to move it to the future milestone. |
Tagging @adamsitnik as Dan assigned him to take a look |
@mjsabby is right, it's significant enough we should investigate. |
From a quick look at the profiles and the code, I can tell that we are using different APIs to get the sortKey when computing the hash when comparing strings (Windows offers us something nice here). I will try to find a way to make the Linux implementation faster. |
@adamsitnik from @sebastienros investigation, I believe he mentioned not seeing that much difference when executing in a single threaded scenario while the regression is seen in the ASP.NET scenario with mutli-threaded. in other word, @sebastienros data show if measuring the difference between Linux and Windows only in console app when calling InvariantCultureIgnoreCase, not seeing that much regression. @sebastienros feel free to correct me if I spelled anything wrong here.
This is expected as we depend on different components on different OS's. |
I just want to make sure we use the best available method in the right way according to spec. |
Yes, that is fine. I would suggest to measure this scenario in isolation first before jumping in trying to optimize it. As I mentioned there could be another factor in the ASP.NET scenario (something like involving locking with the collections for instance) causing this difference. |
When I wrote "From a quick look at the profiles" I meant that I have already profiled that. Anyway, thanks for the hint. The more people repeat "measure, measure, measure" the better! |
FYI this issue has probably the same root cause as this one: https://github.com/dotnet/corefx/issues/29462 |
I doubt that because the string comparisons are not using the sortkeys. but maybe similar issue. |
I have used the recommendations provided in the ICU User Guide and made it two times faster dotnet/coreclr#24889 |
Great that we were able to optimize this! @sebastienros what was the overall impact of this on the techempower scenario? |
Yes @billwert point is important as I am expecting there is some other places contributing to the perf differences. |
@billwert For the DataUpdate scenario this was terrible when used with Npgsql and Dapper, which I assume is very common. We'll see how much of an issue this was for the other scenarios when the fix is in. And also how much is this change fixing the DataUpdates scenario. As an example, ADO.NET is 10 times faster than Dapper on this scenario. I would expect a difference of 10% instead. |
@sebastienros do you have a trace that shows the overall cost of the slowdown on the scenario? I'd like for us to prioritize this bug based on the potential benefit we'd get from a fix. |
@brianrob I pasted a screenshot of the best details I could get from the trace. I can show you the command line I used, maybe you would know how to get a better one, or get better data out of it. |
Just to tell, @jkotas mentioned important comment dotnet/coreclr#24889 (comment) which most likely contributing to the delay in the multi-threading scenario. |
@adamsitnik maybe you still need this bug to track further investigation |
Sure, I did not know that GH is going to close the issue from other repo OOTB when I use the "fixes" keyword. I am still looking at the traces with lock. It looks like a different lock than the Jan pointed out is a problem (I have removed the pointed one and changed the implementation to ThreadStatic and it did not help). |
我们遇到了同样的问题,dotnet core在Linux平台下的性能,比windows差好几倍,核心态CPU非常高。今天下载了3.0 Preview验证,发现此问题仍然存在。真实使用场景是向Oracle批量插入数据时,ODP内部频繁调用的String.Compare(s1, s2, true)触发了calling System.Globalization.Native.so!pthread_mutex_lock。lldb的堆栈日志显示如下 Thread 233 |
@wangzhaoguan thanks for sending the stack. I posted on the PR a couple of questions regarding your targeting version. does you app can target 3.0? |
The lock has been removed in dotnet/coreclr#24973 and the issue is now solved for good. Tomorrow I am going to fill the servicing request to merge my backport to 2.1 dotnet/coreclr#25042 |
sorry closed this by mistake. @adamsitnik can close it though. |
np, I wanted to close it once I port the fix to 2.1, but since we are not going to do that I can close it right now. Btw I've also benchmarks to track the performance of |
I don't think the issue has been fixed, running on ASP.NET I still get |
FYI the command to run the benchmarks (if you know how to use our service).
|
And some event counters:
|
@sebastienros just double checking, are you sure you were running with @adamsitnik fix? |
Yes, I verified the build I am using has it (currently the latest |
This is weird, I get sth around 90k RPS for Windows and 60k for Linux dotnet run -- --server http://asp-perf-win:5001 --client http://asp-perf-load:5002 --repository https://github.com/sebastienros/invariantcultureperf --project-file InvariantCulture.csproj --path /api/values/InvariantCultureIgnoreCase/100 --warmup 1 --duration 5 --runtime 3.0.0-*
dotnet run -- --server http://asp-perf-lin:5001 --client http://asp-perf-load:5002 --repository https://github.com/sebastienros/invariantcultureperf --project-file InvariantCulture.csproj --path /api/values/InvariantCultureIgnoreCase/100 --warmup 1 --duration 5 --runtime 3.0.0-*
|
I have run the oldest preview7 that was available at dotnet-core feed (BTW why there are not preview6 packages there?) to make sure it did not contain dotnet/coreclr#24973 : --runtimeVersion 3.0.0-preview7-27807-12 --collect-trace
The locks were all over the place: With the fix: --runtime 3.0.0-* --collect-trace
The locks are gone |
@sebastienros could you please take a look at the commands I run? The results we get for Windows are very different: 271k vs 90k RPS. In the meantime, I am going to take a look at the two hottest method from the stack trace:
|
Ok, it looks like |
Ok, I was able to get 110k RPS using my local build of
I have created a PR in CoreCLR: dotnet/coreclr#25117 I hope it's not a very bad idea. |
That's an interesting difference... |
Presumably coming from asynchronously completing socket operations that need to queue from the epoll thread back to the thread pool to run user code. |
This is exactly how a fix should be validated, testing your local changes on the scenario before sending the PR, that's awesome!
Yeah they are perfect, I was just using the TechEmpower machines which are more beefy, and were probably more prone to locking as it handles more load. The fact you got faster on Linux than Windows means it's really fixed. You can also try |
FYI for everyone, I added the support for Event Counters on the benchmarking service (--collect-counters) and it's tracked with every ASP.NET scenario now. We'll get continuous charts for these numbers too now. Sneak peek:
|
Ok, I was able to use the same machine as @sebastienros and confirm that the problem is gone. Changesdotnet/coreclr#24889 - I have reduced the number of calls to an expensive native API by half, which made dotnet/coreclr#24973 - I have removed the lock that was part of native implementation of dotnet/coreclr#25117 - I have removed the expensive ref counting of 25117The most recent change had the biggest impact on the beefy Citrine Physical Machines (14 Core(s), 28 Logical Processor(s)) before
after
Linux vs Windows after all changesLinux vs Windows after all fixes for the ASP.NET perf lab machines Local Physical servers (E5-1650 v3 @ 3.50 GHz w/ 15MB Cache 6 Cores / 12 Threads 32GB of RAM.)110k RPS for Linux and 90k for Windows Citrine Physical Machines (Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz, 2195 Mhz, 14 Core(s), 28 Logical Processor(s) 32GB of RAM.)240k RPS for Linux and 265k RPS for Windows Azure VMs (D3_v2 4 virtual cores. 4GB of RAM)28k RPS for both Linux and Windows SummaryFor the Azure VMs scenarios both OSes are even, for Local Physical servers Linux is now 20% faster, for Citrine Physical Machines Windows is still 10% faster. I currently run out of ideas for how to close the 10% gap for the Citrine Physical Machines machines, the trace does not leave much space for optimizations. @sebastienros please close the issue after you verify that the problem is gone |
@adamsitnik can we close this issue now? |
Yes, we can. @sebastienros I verified that the fix works with the latest coreclr packages that were pushed to core-setup in the last 24h. |
Before and after the changes from @adamsitnik |
While investigating why the Data Updates TechEmpower benchmark was much slower on Linux than on Windows, we identified the usage of
InvariantCultureIgnoreCase
in the Npgsql driver was responsible.Seems worst on ASP.NET (6 times slower) than benchmarkdotnet (2 times slower), where there are no concurrent calls.
Here is the code that was used in BDN and ASP.NET to repro the numbers:
On ASP.NET, with 100 inserts, we have 80K RPS on Windows and 15K RPS on Linux.
The trace doesn't go further than this
Linux
Windows
The text was updated successfully, but these errors were encountered: