Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore comments and whitespace when parsing read-only XML files #6232

Merged
merged 7 commits into from
Apr 27, 2021

Conversation

ladipro
Copy link
Member

@ladipro ladipro commented Mar 9, 2021

Fixes #2576

Context

The code already has support for parsing project files as read-only when we know that we're not going to be asked to write them back to disk. This flag is currently unused because the initial implementation in #3584 introduced a regression related to whitespace in attribute values (#4210).

Changes Made

Trivially reverted part of #4213 that addressed the regression and added a hack to make XmlReader behave the same as XmlTextReader.

Testing

Existing unit tests for correctness and ETW for performance. XmlDocumentWithLocation.Load() is ~26% faster with this change compared to baseline when building .NET Core projects. This translates to about 10 ms per one incremental CLI build of a Hello world application.

Notes

@ladipro
Copy link
Member Author

ladipro commented Mar 9, 2021

I should have read the issues more carefully. The problem is in the difference between new XmlTextReader() and XmlReader.Create() regardless of the XmlReaderSettings passed to the latter. Needs more work!

@ladipro ladipro marked this pull request as draft March 9, 2021 16:00
@ladipro ladipro changed the title Ignore comments when parsing read-only XML files Ignore comments and whitespace when parsing read-only XML files Mar 9, 2021
Copy link
Member Author

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have applied @KirillOsenkov's suggestions and am publishing the PR. Poking at internal members with reflection is admittedly ugly. However,

  • Full Framework is pretty much frozen so the risk of being broken by changes in System.Xml is very small.
  • Core is tracking this as an API suggestion and if the win is confirmed in the perf lab, we should be able to ultimately switch to calling a proper public API.

@ladipro ladipro marked this pull request as ready for review April 23, 2021 12:31
@@ -26,6 +28,9 @@ internal static XmlReaderExtension Create(string filePath, bool loadAsReadOnly)
private readonly Stream _stream;
private readonly StreamReader _streamReader;

private static volatile PropertyInfo _normalizationPropertyInfo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever one uses volatile one should add a comment explaining what is going on ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if explicitly calling Volatile.Read and Volatile.Write would make it more obvious. Comment added!

@KirillOsenkov
Copy link
Member

None of my comments are a deal breaker, so fix if you like or not, use your judgement.

Copy link
Member Author

@ladipro ladipro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback applied, thank you.

@@ -26,6 +28,9 @@ internal static XmlReaderExtension Create(string filePath, bool loadAsReadOnly)
private readonly Stream _stream;
private readonly StreamReader _streamReader;

private static volatile PropertyInfo _normalizationPropertyInfo;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if explicitly calling Volatile.Read and Volatile.Write would make it more obvious. Comment added!

if (propertyInfo == null)
{
BindingFlags bindingFlags = BindingFlags.NonPublic | BindingFlags.SetProperty | BindingFlags.Instance;
propertyInfo = xmlReaderType.GetProperty("Normalization", bindingFlags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why do you need to introduce a propertyInfo local variable as opposed to just modifying _normalizationPropertyInfo as relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a volatile field so I want to access it only when I really need to to avoid unnecessary memory barriers.

if (_normalizationPropertyInfo == null)
{
  ..
}
return _normalizationPropertyInfo;

would read the field twice on the fast path. The cost is small, limited to still rather exotic platforms, and negligible in the grand scheme of things, but a good practice anyway.

(In retrospect, explicit Volatile.Read and Volatile.Write may have been a better choice for readability.)

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Apr 26, 2021
@ladipro ladipro merged commit 1d1fec7 into dotnet:main Apr 27, 2021
@ladipro ladipro deleted the 2576-ignore-xml-comments branch April 27, 2021 18:45
@ladipro
Copy link
Member Author

ladipro commented May 5, 2021

[Post-mortem] The change hasn't proved impactful in the perf lab with any project types used in the runs. Consequently the runtime team is not moving forward with exposing Normalization. In the unlikely case that we hit issues in the future, this PR can be reverted without much loss.

Micro benchmarks are showing an improvement but the 10 ms end-to-end win I see on my machine is apparently not happening in the lab.

Parsing Microsoft.Common.CurrentVersion.targets (~6k lines of XML):
image

@ManickaP
Copy link
Member

When I run dotnet build with locally build runtime libs in Debug mode (with asserts) this causes the build to crash:

Process terminated. Assertion failed.
XmlTextReaderImpl.Normalization property cannot be changed on reader created via XmlReader.Create.
   at System.Xml.XmlTextReaderImpl.set_Normalization(Boolean value) in /home/manicka/repositories/runtime/src/libraries/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs:line 2093
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.InvokeOneParameter(Object obj, BindingFlags invokeAttr, Binder binder, Object parameter, CultureInfo culture) in /home/manicka/repositories/runtime/src/coreclr/System.Private.CoreLib/src/System/Reflection/RuntimeMethodInfo.cs:line 470
   at System.Reflection.RuntimePropertyInfo.SetValue(Object obj, Object value, BindingFlags invokeAttr, Binder binder, Object[] index, CultureInfo culture) in /home/manicka/repositories/runtime/src/coreclr/System.Private.CoreLib/src/System/Reflection/RuntimePropertyInfo.cs:line 369
   at System.Reflection.RuntimePropertyInfo.SetValue(Object obj, Object value, Object[] index) in /home/manicka/repositories/runtime/src/coreclr/System.Private.CoreLib/src/System/Reflection/RuntimePropertyInfo.cs:line 350
   at System.Reflection.PropertyInfo.SetValue(Object obj, Object value) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Reflection/PropertyInfo.cs:line 52
   at Microsoft.Build.Internal.XmlReaderExtension.GetXmlReader(String file, StreamReader input, Boolean loadAsReadOnly, Encoding& encoding) in Microsoft.Build.dll:token 0x6000c62+0x6c
   at Microsoft.Build.Internal.XmlReaderExtension..ctor(String file, Boolean loadAsReadOnly) in Microsoft.Build.dll:token 0x6000c5d+0x2c
   at Microsoft.Build.Construction.ProjectRootElement.LoadDocument(String fullPath, Boolean preserveFormatting, Boolean loadAsReadOnly) in Microsoft.Build.dll:token 0x60002da+0x39
   at Microsoft.Build.Construction.ProjectRootElement..ctor(String path, ProjectRootElementCacheBase projectRootElementCache, Boolean preserveFormatting) in Microsoft.Build.dll:token 0x600024a+0x51
   at Microsoft.Build.Construction.ProjectRootElement.CreateProjectFromPath(String projectFile, ProjectRootElementCacheBase projectRootElementCache, Boolean preserveFormatting) in Microsoft.Build.dll:token 0x60002d9+0x28
   at Microsoft.Build.Construction.ProjectRootElement.<>c.<OpenProjectOrSolution>b__208_0(String path, ProjectRootElementCacheBase cache) in Microsoft.Build.dll:token 0x6001e13+0x0
   at Microsoft.Build.Evaluation.ProjectRootElementCache.Get(String projectFile, OpenProjectRootElement openProjectRootElement, Boolean isExplicitlyLoaded, Nullable`1 preserveFormatting) in Microsoft.Build.dll:token 0x6000ae5+0x135
   at Microsoft.Build.Construction.ProjectRootElement.OpenProjectOrSolution(String fullPath, IDictionary`2 globalProperties, String toolsVersion, ProjectRootElementCacheBase projectRootElementCache, Boolean isExplicitlyLoaded) in Microsoft.Build.dll:token 0x60002cc+0x21
   at Microsoft.Build.Execution.ProjectInstance..ctor(String projectFile, IDictionary`2 globalProperties, String toolsVersion, BuildParameters buildParameters, ILoggingService loggingService, BuildEventContext buildEventContext, ISdkResolverService sdkResolverService, Int32 submissionId, Nullable`1 projectLoadSettings) in Microsoft.Build.dll:token 0x6000e3a+0x23
   at Microsoft.Build.BackEnd.BuildRequestConfiguration.<>c__DisplayClass60_0.<LoadProjectIntoConfiguration>b__0() in Microsoft.Build.dll:token 0x60024a0+0x14e
   at Microsoft.Build.BackEnd.BuildRequestConfiguration.InitializeProject(BuildParameters buildParameters, Func`1 loadProjectFromFile) in Microsoft.Build.dll:token 0x60014f7+0x10
   at Microsoft.Build.BackEnd.BuildRequestConfiguration.LoadProjectIntoConfiguration(IBuildComponentHost componentHost, BuildRequestDataFlags buildRequestDataFlags, Int32 submissionId, Int32 nodeId) in Microsoft.Build.dll:token 0x60014f6+0x60
   at Microsoft.Build.BackEnd.RequestBuilder.BuildProject() in Microsoft.Build.dll:token 0x600140f+0x0
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs:line 38
   at Microsoft.Build.BackEnd.RequestBuilder.BuildProject() in Microsoft.Build.dll:token 0x600140f+0x1c
   at Microsoft.Build.BackEnd.RequestBuilder.BuildAndReport() in Microsoft.Build.dll:token 0x6001406+0x26
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs:line 38
   at Microsoft.Build.BackEnd.RequestBuilder.BuildAndReport() in Microsoft.Build.dll:token 0x6001406+0x1c
   at Microsoft.Build.BackEnd.RequestBuilder.RequestThreadProc(Boolean setThreadParameters) in Microsoft.Build.dll:token 0x6001405+0x22
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs:line 38
   at Microsoft.Build.BackEnd.RequestBuilder.RequestThreadProc(Boolean setThreadParameters) in Microsoft.Build.dll:token 0x6001405+0x24
   at Microsoft.Build.BackEnd.RequestBuilder.<StartBuilderThread>b__52_2() in Microsoft.Build.dll:token 0x600141c+0x0
   at System.Threading.Tasks.Task`1.InnerInvoke() in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Future.cs:line 503
   at System.Threading.Tasks.Task.<>c.<.cctor>b__284_0(Object obj) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2369
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 183
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2327
   at System.Threading.Tasks.Task.ExecuteEntry() in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs:line 2240
   at System.Threading.Tasks.TaskScheduler.TryExecuteTask(Task task) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/TaskScheduler.cs:line 408
   at Microsoft.Build.BackEnd.RequestBuilder.DedicatedThreadsTaskScheduler.<InjectThread>b__6_0() in Microsoft.Build.dll:token 0x6002467+0x1a
   at System.Threading.Thread.StartHelper.Callback(Object state) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs:line 42
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) in /home/manicka/repositories/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs:line 183
   at System.Threading.Thread.StartCallback() in /home/manicka/repositories/runtime/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs:line 105
Aborted (core dumped)

I know that normally people do not run Debug builds, but I assume the assert was put there on purpose (it's been there for a long time):
https://github.com/dotnet/runtime/blob/ae64899ca74968c1c0dd2d81564c1b9fab12e2b4/src/libraries/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs#L2093

ladipro added a commit to ladipro/msbuild that referenced this pull request Jul 13, 2021
@ladipro
Copy link
Member Author

ladipro commented Jul 13, 2021

Opened #6669 to revert this PR.

benvillalobos pushed a commit that referenced this pull request Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't read comments when reading imports
5 participants