Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work around CLR unicode bug with classification of U+2028 #9519

Closed

Conversation

gafter
Copy link
Member

@gafter gafter commented Mar 7, 2016

Fixes #9484

@dotnet/roslyn-compiler Please review

@gafter gafter added Bug Concept-API This issue involves adding, removing, clarification, or modification of an API. Area-Compilers 4 - In Review A fix for the issue is submitted for review. labels Mar 7, 2016
@gafter gafter self-assigned this Mar 7, 2016
@gafter gafter added this to the 2.0 (Preview) milestone Mar 7, 2016
@cston
Copy link
Member

cston commented Mar 7, 2016

LGTM

@jaredpar
Copy link
Member

jaredpar commented Mar 7, 2016

CC @ellismg

@jaredpar
Copy link
Member

jaredpar commented Mar 7, 2016

@gafter future-stabilization is basically an ask mode branch. This doesn't intuitively seem like an ask mode issue. Am I missing a case?

@gafter
Copy link
Member Author

gafter commented Mar 7, 2016

Why did you assign the issue to 2.0 (Preview)? Isn't that the milestone for this preview?

@jaredpar
Copy link
Member

jaredpar commented Mar 7, 2016

@gafter ah okay. That's my mistake.

// U+2028 "LINE SEPARATOR" should be classified by CharUnicodeInfo.GetUnicodeCategory as
// UnicodeCategory.LineSeparator but due to a bug is incorrectly categorized as
// UnicodeCategory.Format. See https://github.com/dotnet/coreclr/issues/3542
replaceWith = "\\u2028";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of calls to CharUnicodeInfo.GetUnicodeCategory in our codebase and some of those sites can still suffer from the bug in this API. For example (in src\Compilers\Core\Portable\UnicodeCharacterUtilities.cs):

            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(ch);
            return IsLetterChar(cat)
                || IsDecimalDigitChar(cat)
                || IsConnectingChar(cat)
                || IsCombiningChar(cat)
                || IsFormattingChar(cat);

Instead of patching this particular call site in this manner, we should consider creating a wrapper for CharUnicodeInfo.GetUnicodeCategory, which would make necessary adjustments to the value it returns and simply change all call sites to use that wrapper.

@gafter
Copy link
Member Author

gafter commented Mar 7, 2016

Moving to the future branch.

@gafter gafter closed this Mar 7, 2016
@gafter gafter removed the 4 - In Review A fix for the issue is submitted for review. label Mar 7, 2016
@gafter gafter deleted the future-stabilization-9484 branch May 24, 2018 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Bug cla-already-signed Concept-API This issue involves adding, removing, clarification, or modification of an API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants