-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance issue in ReusableStringBuilder.cs with large string and many appends #1593
Comments
@kfritz-msft was it really going through this loop only once? I did a trivial test below and it takes 0.3sec. Which isn't to say it can't be improved. namespace ConsoleApp1
{
class Program
{
static int Main(string[] args)
{
var str1 = new string('x', 45000000);
var str2 = new string('x', 45000000);
var sb1 = new StringBuilder(str1);
var sb2 = new StringBuilder(str2);
var sw = new Stopwatch();
sw.Start();
for (int i = sb1.Length - 1; i >= 0; --i)
{
if (sb1[i] != sb2[i])
{
Console.WriteLine("not equal");
return 1;
}
}
sw.Stop();
Console.WriteLine("equal");
Console.WriteLine($"Elapsed {sw.Elapsed}");
return 0;
}
}
} |
Thanks for taking a look. Yes, that version is fast as the StringBuilder is initialized with the right size and there are no appends. I should have included a full repro app, sorry. The problem only happens when indexing into a StringBuilder that was filled up with a lot of appends. If you pre-init the size, it is fine. I think the StringBuilder was created with default size, and the file I had was really big, and each line got appended into the StringBuilder. I didn't debug actual MSBuild code, but this should repro the behavior I saw when running MSBuild (it will appear to hang when the last loop is executing): using System; namespace ConsoleApplication1
} |
I realized I never opened an issue for this. I now opened one. Any fix would not be in Desktop for some time, so it would probably be reasonable for MSBuild to work around this (in both places here and here) with a simple check of the length eg., over 500 or 1000 then do Perhaps you'd like to offer such a PR @kfritz-msft ? I am not on the MSBuild team but I expect they'd take it if you included some simple measurements. |
@Alois-xx I would imagine the change would be something like if (length > 40_000)
{
return String.Equals(_borrowedBuilder.ToString(), other, StringComparison.Ordinal);
} in the two places linked, plus explanatory comment. Note: one of those puts the length in a local, both should probably do that, as it improves the JITted code. Plus a test that puts >40K into a string to hit both places. And MSBuild folks may want to ask for some perf measurement because of the extra branch in this "hot" path. @davkean maybe you could assign to @Alois-xx if you agree with this change |
@danmosemsft ,@davkean: I have create a PR for that 1593_ReusableStringBuilderPerf. Local profilling with a sample msbuild target that appends all dlls from Windows\System32*.dll up to 5 times and then compares it against a string shows an improvement of a factor two. Other issues talking about an exponential perf hit are not due to the StringBuilder issue but could be due to msbuild batching where a target is call n^x times which could indeed lead to dramatic slowdowns. The Stringbuilder issue a "simple" O(n) problem which can be improved by a factor two, but if the target is called many times one should rework the msbuild script to create smaller lists. |
Nice, don't see the PR? |
…#2834) Fix for issue #1593 where MSBuild spends a long time checking if a string is equal char by char of a large StringBuilder instance. * Profiling showed that StringBuilder[xx] character access performance is much slower compared to create the string and compare the two string directly. * A second optimization is to cache the created string until the StringBuilder is modified again which gives another 15%. This issue pops up when large directories with many files are stored in MSBuild lists which are then compared for equality which will internally cause MSBuild to stringify and compare the large lists.
Thanks @Alois-xx ! When MSBuild can depend on .NET Core 2.1, your code can be replaced with the new Equals overload and become more efficient. |
And thanks @kfritz-msft for flagging this so we could fix it. |
I’ve hit a bottleneck related to the stringbuilder usage in msbuild.
The problem is with this code (starting at line 86):
In my scenario, the borrowedBuilder was created by reading a file with about 300K lines (so at least that many Append() calls), and total file size is about 45MB (file is admittedly larger than it needs to be, and will be addressed independently).
I had built a quick demo app that simulates this, and the problem is with indexing into the string builder to get the character. It takes forever to run this loop (maybe an hour?). In my demo app, if I were to create a temporary string from the string builder, the comparison loop is really fast (maybe a second?). I didn’t do actual measurements since it is very obviously different.
I also just noticed that if a StringBuilder is initialized with the file size as capacity, the problem also goes away. I didn't trace throughout all of the code, so maybe that isn't trivial to get? In my hacky demo app, I just did this to mitigate it:
But even if that isn't possible, calling ToString() on the string builder and using that for comparison is a huge improvement when the size is large.
The text was updated successfully, but these errors were encountered: