`StringComparison.OrdinalIgnoreCase` compares `"¡a"` and `"¡B"` incorrectly #71018

bgrainger · 2022-06-20T19:16:37Z

Description

In net5.0 and net48 on Windows, string.Compare("¡a", "¡B", StringComparison.OrdinalIgnoreCase) returns a value < 0 (specifically -1).

But in net6.0 and net7.0, that expression returns a value > 0 (specifically, 31).

The net5.0 result fulfills the meaning of StringComparison.OrdinalIgnoreCase; the net6.0 result does not.

Setting $env:DOTNET_SYSTEM_GLOBALIZATION_USENLS='true' restores the net5.0 behaviour, which indicates that this may be related to #30960?

(Even if this is the actual result returned by ICU for the comparison--and I'm not sure if that's true or not--it doesn't match this programmer's expectations for what StringComparison.OrdinalIgnoreCase means.)

Reproduction Steps

Program.cs:

using System;

// prints -1 for prefix <= U+007F, 31 for prefix >= U+0080
string prefix = "\u00A1";
Console.WriteLine(string.Compare($"{prefix}a", $"{prefix}B", StringComparison.OrdinalIgnoreCase));

Compare.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFrameworks>net48;net5.0;net6.0;net7.0</TargetFrameworks>
    <LangVersion>10.0</LangVersion>
  </PropertyGroup>

</Project>

Expected behavior

Per https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#ordinal-string-operations:

Case-insensitive ordinal comparisons are the next most conservative approach. These comparisons ignore most casing; for example, "windows" matches "Windows". When dealing with ASCII characters, this policy is equivalent to StringComparison.Ordinal, except that it ignores the usual ASCII casing. Therefore, any character in [A, Z] (\u0041-\u005A) matches the corresponding character in [a,z] (\u0061-\u007A). Casing outside the ASCII range uses the invariant culture's tables.

Thus, it is expected that ¡ will be considered equal in both strings, then a will be compared to B, and the first string will sort first in a case-insensitive ordinal comparison. That is, "¡a" sorts before "¡B" using a case-insensitive ordinal comparison.
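
The expectation above can be written down as a small reference comparison. This is only a sketch of the documented semantics (upper-case each character with the invariant culture, then compare code units), not the runtime's actual implementation; the `OrdinalIgnoreCaseSketch` class and its `Compare` method are hypothetical names, and surrogate pairs are deliberately not handled.

```csharp
using System;

public static class OrdinalIgnoreCaseSketch
{
    // Hypothetical reference helper, NOT the runtime's implementation.
    // Upper-cases each UTF-16 code unit with the invariant culture and
    // compares the results numerically. Surrogate pairs are not handled.
    public static int Compare(string x, string y)
    {
        int length = Math.Min(x.Length, y.Length);
        for (int i = 0; i < length; i++)
        {
            char cx = char.ToUpperInvariant(x[i]);
            char cy = char.ToUpperInvariant(y[i]);
            if (cx != cy)
            {
                return cx < cy ? -1 : 1;
            }
        }

        // All shared positions matched; the shorter string sorts first.
        return x.Length.CompareTo(y.Length);
    }
}
```

Under this sketch, `OrdinalIgnoreCaseSketch.Compare("\u00A1a", "\u00A1B")` is negative because `¡` maps to itself in both strings and `A` (U+0041) precedes `B` (U+0042), which matches the net48 and net5.0 result.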

Actual behavior

"¡a" sorts after "¡B" using a case-insensitive ordinal comparison.

Regression?

Yes. This worked correctly in net48 and net5.0 on Windows and Linux; I have not tested net5.0 and earlier on macOS.

Known Workarounds

Use StringComparison.InvariantCultureIgnoreCase.
Set the DOTNET_SYSTEM_GLOBALIZATION_USENLS environment variable to true.

Configuration

SDKs: 6.0.301; 7.0.100-preview.5.22307.18
Windows 10 19044.1766 x64

Other information

No response


ghost · 2022-06-20T19:16:46Z

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	bgrainger
Assignees:	-
Labels:	`area-System.Globalization`
Milestone:	-

bgrainger · 2022-06-20T19:27:52Z

The prefix can be arbitrarily long (e.g., string prefix = "\u00A1aaaaaaaaaaaaaaaaaaaaaaaa";); the common factor seems to be that case differences between different ASCII characters are counted as significant after the first character with code point U+0080 or higher.
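
A small probe along these lines can be used to check the observation on a given runtime. It is a sketch only: `PrefixProbe`, `CompareWithPrefix`, and the chosen prefixes are illustrative assumptions, and the printed signs depend on the runtime and globalization mode in use, so no expected output is listed.

```csharp
using System;

public static class PrefixProbe
{
    // Hypothetical helper: sign of comparing prefix + "a" with prefix + "B".
    public static int CompareWithPrefix(string prefix) =>
        Math.Sign(string.Compare(prefix + "a", prefix + "B", StringComparison.OrdinalIgnoreCase));

    // Prints one line per prefix so ASCII-only and non-ASCII prefixes
    // can be compared side by side.
    public static void Run()
    {
        string[] prefixes =
        {
            "",                     // no prefix
            "x",                    // ASCII only
            "\u007F",               // last ASCII code point
            "\u0080",               // first non-ASCII code point
            "\u00A1",               // the character from the report
            "\u00A1aaaaaaaa",       // non-ASCII followed by ASCII
            "aaaaaaaa\u00A1",       // ASCII followed by non-ASCII
        };

        foreach (string prefix in prefixes)
        {
            Console.WriteLine($"prefix length {prefix.Length}: sign = {CompareWithPrefix(prefix)}");
        }
    }
}
```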

tarekgh · 2022-06-20T19:30:44Z

@bgrainger thanks for filing the issue. Yes, this is a bug.

tarekgh · 2022-06-21T00:36:23Z

@bgrainger just checking: is this issue a blocker for you in using .NET 6.0? I'm asking so that we know whether we need to consider porting the fix to .NET 6.0 or whether we can just fix it in .NET 7.0.

bgrainger · 2022-06-21T13:47:45Z

No, this isn't strictly a blocker for me. It did break my code's ability to read (using a binary search) some sorted data that was compiled into the assembly as an embedded resource; however, I was able to restore functionality across all .NET versions by recompiling the assembly (and the embedded data) with InvariantCultureIgnoreCase as the sort order for that data. (I haven't timed this workaround to see if it causes a slight reduction in performance.)

IMO it is a serious regression to leave unfixed in .NET 6.0. Any code that has sorted data according to StringComparer.OrdinalIgnoreCase and persisted it could (a) fail to read it correctly when running under .NET 6.0, or (b) create invalid data if building it under .NET 6.0. Thanks for your consideration.
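
To illustrate the failure mode, here is a sketch of a defensive check that persisted data is still sorted according to the comparer the current runtime provides before binary searching it. The `SortedDataCheck` class and its methods are hypothetical and are not taken from the affected code. The check is linear in the size of the data, so it may only be appropriate at load time.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDataCheck
{
    // Returns true when every adjacent pair is in non-descending order
    // according to the supplied comparer.
    public static bool IsSorted(IReadOnlyList<string> items, IComparer<string> comparer)
    {
        for (int i = 1; i < items.Count; i++)
        {
            if (comparer.Compare(items[i - 1], items[i]) > 0)
            {
                return false;
            }
        }
        return true;
    }

    // Binary search that refuses to run on data whose persisted order
    // disagrees with the comparer in use (an O(n) check, shown for clarity).
    public static int Find(string[] sortedItems, string key, IComparer<string> comparer)
    {
        if (!IsSorted(sortedItems, comparer))
        {
            throw new InvalidOperationException(
                "Persisted data is not sorted according to the supplied comparer.");
        }
        return Array.BinarySearch(sortedItems, key, comparer);
    }
}
```

For example, `SortedDataCheck.Find(new[] { "a", "B", "c" }, "b", StringComparer.OrdinalIgnoreCase)` locates the entry at index 1, while data containing pairs such as "¡a" and "¡B" could fail the `IsSorted` check on a runtime affected by this issue.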

tarekgh · 2022-06-21T16:06:26Z

@bgrainger thanks again for the information. We have decided to wait a little to see whether there is demand for a backport or whether more scenarios are broken by this. We'll keep an eye on it. If you come across more data regarding this issue, please share it with us; it can help us decide whether to port the fix.

dotnet-issue-labeler bot added the area-System.Globalization label Jun 20, 2022

ghost added the untriaged New issue has not been triaged by the area owner label Jun 20, 2022

tarekgh removed the untriaged New issue has not been triaged by the area owner label Jun 20, 2022

tarekgh added this to the 7.0.0 milestone Jun 20, 2022

tarekgh self-assigned this Jun 20, 2022

tarekgh added bug Regression labels Jun 20, 2022

tarekgh mentioned this issue Jun 20, 2022

Fix Ordinal Ignore Case string compare #71022

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 20, 2022

tarekgh closed this as completed in #71022 Jun 21, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 21, 2022

ghost locked as resolved and limited conversation to collaborators Jul 21, 2022

jeffhandley removed the Regression label Dec 28, 2022
