Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSBuild properties to analyzer config break in presence of newlines #43970

Open
chsienki opened this issue May 4, 2020 · 7 comments
Open

MSBuild properties to analyzer config break in presence of newlines #43970

chsienki opened this issue May 4, 2020 · 7 comments

Comments

@chsienki
Copy link
Contributor

chsienki commented May 4, 2020

We transform MSBuild properties to global analyzer configs via the GenerateMSBuildAnalyzerConfig task.

In MSBuild it's permissible for properites to contain newlines, which transfers into the resulting config, breaking downstream parsing.

For example:

<PropertyGroup>
  <PropWithNewLine>this
is 
a
valid = property
  </PropWithNewLine>
</PropertyGroup>

would result in

is_global = true
msbuild_property.PropWithNewLine = this
is 
a
valid = property

This is true for semicolons too. These are common in msbuild as its the default combine character for a list

For example:

<ItemGroup>
  <Item1 Include="a" />
  <Item1 Include="b" />
  <Item1 Include="c" />
</ItemGroup>
<PropertyGroup>
  <Item1Combined>@(Item1)</Item1Combined>
</PropertyGroup>

Will result in

build_property.Item1Combined = a;b;c

In editorconfig ; is the comment character, meaning only a will be passed to the consumer.

@chsienki
Copy link
Contributor Author

chsienki commented Aug 4, 2020

I think the simplest solution would be to transform the values via something well-known, and special case transforming them back inside the compiler.

My current thinking is to base64encode the values in the config file, and special case build_{property,metadata} parsing to decode them.

@AArnott
Copy link
Contributor

AArnott commented Sep 7, 2021

base64 will be a significant obfuscation, which will make some build failure investigations much harder (like the one that led me to file a dupe bug of this in the first place). If instead you applied URI-encoding to any .editorconfig-reserved character as msbuild itself does to xml-reserved chars, then most values will appear in plaintext, and only those characters that require special treatment will be obfuscated.

@Sergio0694
Copy link
Contributor

Just hit this in #64917. The only workaround is splitting with , and keeping everything on one line, which is extremely clunky to say the least unfortunately. Is this issue on the working set and/or does it have an approximate milestone? Would you accept contributions for this? 🙂

@viceroypenguin
Copy link
Contributor

I have found an alternative (though not pretty). Use \uxxxx type character encodings in the string and Regex.Unescape() to decode them. This is how I'm encoding semicolons and hashes in a property that I need to include in my source generator. Simpler than trying to use alternative characters and translating them in the source generator. Doesn't help with the multi-line stuff though, since it would still have to be in a single line with \u000a separators.

@Youssef1313
Copy link
Member

Youssef1313 commented Jan 1, 2023

The EditorConfig spec was updated so that ; and # are only comments when they are the first character of a line. I think the compiler should adhere to the updated spec, and then we are done partially done with the issue. No special encoding/escaping needed.

https://spec.editorconfig.org/#no-inline-comments

Specification was updated in editorconfig/specification#31

Also related to #44596

#51625 will handle the ; and # cases.

The case of multiline still needs work.

@aetos382
Copy link

aetos382 commented Nov 19, 2023

This is a problem caused by the implementation of AnalyzerConfig.Parse method.

This method assumes that analyzer configuration is given by an EditorConfig style file.
The EditorConfig format is line-oriented and does not assume that properties have multi-line values.
However, analyzer configurations can also be specified in project files, in which case properties can have multi-line values.

The MSBuild documentation states that property values can also be given in XML format, but if they are multi-line, they will not be passed to analyzer correctly.

The EditorConfig specification cannot be controlled by Microsoft or .NET community.
I would prefer that AnalyzerConfig.Parse use its own file format that is upward compatible with EditorConfig, rather than updating the EditorConfig specification.

Note: This does not mean that we can write analyzer configurations in the new file format.
The compiler internally converts analyzer configurations from project files into the EditorConfig format and then parses it. The first step would be to update the intermediate format used in this process.

As a workaround for the moment, analyzer configurations with multi-line values could be given by AdditionalFiles for that purpose, rather than through project files.

@0xced
Copy link

0xced commented Aug 6, 2024

I just stumbled on this while trying to improve EmbeddedResourceGenerator by using the embedded resource LogicalName metadata.

As already mentioned in this issue, the generated editorconfig is problematic and the last part (;.txt) will be discarded when passed to the source generator.

[~/EmbeddedResourceGenerator/EmbeddedResourceAccessGenerator.Tests/2InvalidChars-\;.txt]
build_metadata.EmbeddedResource.LogicalName = EmbeddedResourceAccessGenerator.Tests.2InvalidChars-;.txt

Is there any chance that this problem will ever be solved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants