Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize XmlReader Using A Stream #6847

Closed
benvillalobos opened this issue Sep 14, 2021 · 5 comments · Fixed by #6863
Closed

Initialize XmlReader Using A Stream #6847

benvillalobos opened this issue Sep 14, 2021 · 5 comments · Fixed by #6863

Comments

@benvillalobos
Copy link
Member

benvillalobos commented Sep 14, 2021

Issue Description

See https://devdiv.visualstudio.com/DevDiv/_git/VS/pullrequest/349706.

When XmlReader.Create is called and a file path is passed in, it's converted to a URI under the hood. This can mangle multi-byte characters. The solution (in the link above) is to initialize the XmlReader using a stream to the file instead of the path alone.

Note this isn't an issue that we know of yet, the repro steps would fail if they were.

Steps to Reproduce

create dir with 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵 as part of the path

dotnet new console -o proj

create transform.xslt in proj dir, copy contents from below

add foo target (below) to proj

msbuild proj

should see success

Foo target:

 <Target Name="Foo" AfterTargets="Build">

      <XslTransformation XslInputPath="transform.xslt" XmlInputPaths="$(MSBuildThisFileFullPath)"

        OutputPaths="$(IntermediateOutputPath)output.xml"

        Parameters="&lt;Parameter Name='Parameter1' Value='$(Parameter1)'/&gt;" />

  </Target>

transform.xslt

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">

  <html>

  <body>

    <h2>My CD Collection</h2>

    <table border="1">

      <tr bgcolor="#9acd32">

        <th>Title</th>

        <th>Artist</th>

      </tr>

      <tr>

        <td><xsl:value-of select="catalog/cd/title"/></td>

        <td><xsl:value-of select="catalog/cd/artist"/></td>

      </tr>

    </table>

  </body>

  </html>

</xsl:template>

</xsl:stylesheet>

Build succeeds.

Analysis

This isn't an issue yet, but could be down the line.

Our code calls XmlReader.Create in two locations within the XslTransformation class which can be changed easily.

The complication comes from XmlReaderExtension, which initializes an XmlReader via new XmlTextReader(...) and a path is passed in. We should avoid this and call .Create() passing in a stream, but this comment suggests that would be a breaking change:

            // Ignore loadAsReadOnly for now; using XmlReader.Create results in whitespace changes
            // of attribute text, specifically newline removal.
            // https://github.com/Microsoft/msbuild/issues/4210

#4210 is something @ladipro had worked on so you might have a better idea of what to do here. Thoughts?

benvillalobos@6943a55 was my proposed fix before digging into #4210

@benvillalobos benvillalobos added bug needs-triage Have yet to determine what bucket this goes in. labels Sep 14, 2021
@benvillalobos
Copy link
Member Author

Bug Triage Notes: Maybe we can add a unit test for XmlReaderExtension that runs with multi-byte characters and modify how the URI is created. There might be some setting in the constructor to explicitly look for multi-byte characters and it'll "just work"

@benvillalobos benvillalobos removed the needs-triage Have yet to determine what bucket this goes in. label Sep 16, 2021
@benvillalobos
Copy link
Member Author

@tmeschter you've been investigating this quite a bit. If this currently works for us with no issues (loading a project with unicode characters should break the xmlreader but doesn't), do we need to worry about this eventually breaking for us?

@tmeschter
Copy link
Contributor

tmeschter commented Sep 16, 2021

It's not all Unicode characters; it may be specific to characters from the Unicode private use area. I'm not sure the string the "Steps to Reproduce" will be enough to trigger it.

Can you try

Ⅻㄨㄩ 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵U1[]U2[]U3[]

instead? That's from the original bug.

@benvillalobos
Copy link
Member Author

Oof, does this look familiar?

"C:\src\temp\9-16\11_25\new\???????????????'`?-?        ????U1[?????]U2[????]U3[?????]\proj1\proj1.csproj" (default target) (1
) ->
(Foo target) ->
  C:\src\temp\9-16\11_25\new\???????????????'`?-?       ????U1[?????]U2[????]U3[?????]\proj1\proj1.csproj(11,5): error MSB370
4: Unable to load the specified Xslt. Could not find a part of the path 'C:\src\temp\9-16\11_25\new\???????????????'`?-
?       ????U1[%EE%80%A5%EE%80%A6%EE%80%A7%EE%80%B8%EE%80%B9]U2[%EE%89%9A%EE%89%9B%EE%89%AC%EE%89%AD]U3[%EE%93%BE%EE%93%BF%EE
%94%80%EE%94%8B%EE%94%8C]\proj1\transform.xslt'.

    0 Warning(s)
    1 Error(s)

Time Elapsed 00:00:01.29

C:\src\temp\9-16\11_25\new\Ⅻㄨㄩ啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵U1[]U2[]U3[]>

Very strange considering XmlReaderExtension creates a URI based off of the filepath. In theory this should be failing on project load, not during the XslTransformation task.

I get this output trying to use XmlReader.Create using a StreamReader to the file.

"C:\src\temp\9-16\11_25\new\???????????????'`?-?        ????U1[?????]U2[????]U3[?????]\proj1\proj1.csproj" (default target) (1) ->
(Foo target) ->
  C:\src\temp\9-16\11_25\new\???????????????'`?-?       ????U1[?????]U2[????]U3[?????]\proj1\proj1.csproj(11,5): error MSB3704: Unable to load the specified Xslt. Could not find a part of the path 'C:\src\temp\9-16\11_25\new\?????????????
??'`?-? ????U1[%EE%80%A5%EE%80%A6%EE%80%A7%EE%80%B8%EE%80%B9]U2[%EE%89%9A%EE%89%9B%EE%89%AC%EE%89%AD]U3[%EE%93%BE%EE%93%BF%EE%94%80%EE%94%8B%EE%94%8C]\proj1\transform.xslt'.

    0 Warning(s)
    1 Error(s)

What happens if I use the XmlReaderExtension in XslTransform?

Done Building Project "C:\src\temp\9-16\11_25\new\???????????????'`?-?  ????U1[?????]U2[????]U3[?????]\proj1\proj1.csproj" (default targets).


Build succeeded.
    0 Warning(s)
    0 Error(s)

Looks like the way XmlReaderExtension loads the URI is the proper way to load unicode characters.

                // Note: Passing in UTF8 w/o BOM into StreamReader. If the BOM is detected StreamReader will set the
                // Encoding correctly (detectEncodingFromByteOrderMarks = true). The default is to use UTF8 (with BOM)
                // which will cause the BOM to be added when we re-save the file in cases where it was not present on
                // load.
                _stream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read);
                _streamReader = new StreamReader(_stream, s_utf8NoBom, detectEncodingFromByteOrderMarks: true);
                Encoding detectedEncoding;

                // The XmlDocumentWithWithLocation class relies on the reader's BaseURI property to be set,
                // thus we pass the document's file path to the appropriate xml reader constructor.
                Reader = GetXmlReader(file, _streamReader, loadAsReadOnly, out detectedEncoding);


...

            string uri = new UriBuilder(Uri.UriSchemeFile, string.Empty) { Path = file }.ToString();


            // Ignore loadAsReadOnly for now; using XmlReader.Create results in whitespace changes
            // of attribute text, specifically newline removal.
            // https://github.com/Microsoft/msbuild/issues/4210
            XmlReader reader = new XmlTextReader(uri, input) { DtdProcessing = DtdProcessing.Ignore };

Looks like we need to replace XmlReader.Create(filepath) with

XmlReader.Create(new StreamReader(_data, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), detectEncodingFromByteOrderMarks: true))

Will get a PR up with this fix. Thanks!

@tmeschter
Copy link
Contributor

Yes, that's exactly the sort of issue error message I was seeing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants