Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why we use CompareInfo.Invariant for Ordinal? #27738

Closed
iSazonov opened this issue Oct 25, 2018 · 8 comments
Closed

Why we use CompareInfo.Invariant for Ordinal? #27738

iSazonov opened this issue Oct 25, 2018 · 8 comments
Milestone

Comments

@iSazonov
Copy link
Contributor

I am not so familiar with unicode in-depth and I am amazed to see that
public int IndexOf(char value, StringComparison comparisonType)
and all IndexOf overloads for strings like
public int IndexOf(string value, int startIndex, int count, StringComparison comparisonType)
call CompareInfo.Invariant.IndexOf() for both InvariantCulture (InvariantCultureIgnoreCase) and Ordinal (OrdinalIgnoreCase).

In the same time we have IndexOf overloads for chars which use really ordinal comparison from SpanHelpers with SIMD accelerations.
public int IndexOf(char value) => SpanHelpers.IndexOf(ref _firstChar, value, Length);

We even have [public static int CompareOrdinal(string strA, int indexA, string strB, int indexB, int length) }(https://source.dot.net/#System.Private.CoreLib/shared/System/String.Comparison.cs,341906879aa2feb9).
(internally SpanHelpers with SIMD accelerations.)


I'd expect the same for strings too if StringComparison is Ordinal or OrdinalIgnoreCase.


In the case we could implement very fast SIMD accelerated methods like IndexOf(), LastIndexOf(), Replace() and perhaps some more.

For OrdinalIgnoreCase we could implement unicode simple case folding (#17233). With this enhancement we could to get great wins in RegEx (see Rust RegEx lib) and simplify Boyer-Moore implementation (which is also useful in IndexOf() and Replace() - #20674 and https://github.com/dotnet/coreclr/issues/6918).
Some languages (like PowerShell) could get great wins in identifier comparisons.

@jkotas
Copy link
Member

jkotas commented Oct 25, 2018

This is duplicate of https://github.com/dotnet/coreclr/issues/19672. The char case was fixed already.

@iSazonov
Copy link
Contributor Author

iSazonov commented Oct 25, 2018

@jkotas Is your link correct?
And my request is for strings not chars.

@jkotas
Copy link
Member

jkotas commented Oct 25, 2018

@jkotas Is your link correct?

It is correct now.

And my request is for strings not chars.

The linked issue is for string.IndexOf(.... If you would like to submit PRs with more fixes towards this, we will be happy to take them.

@jkotas jkotas closed this as completed Oct 25, 2018
@iSazonov
Copy link
Contributor Author

Why the fix is not replicated to CoreFX? https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/String.Searching.cs#L55

I am not ready to submit PRs but I will.

@jkotas
Copy link
Member

jkotas commented Oct 25, 2018

@Anipik The recent change in String.Searching.cs was not mirrored correctly to corefx. Could you please take a look?

@Anipik
Copy link
Contributor

Anipik commented Oct 25, 2018

@jkotas there is no missing commit for this change. we will have to manually open the PR to correct it.
i can open PR to correct it in corefx and make sure the commit is not mirrored back to coreclr or corert(i.e mirror didnot get stuck due to merge conflicts)

@jkotas
Copy link
Member

jkotas commented Oct 25, 2018

That would be great. Could you please do that?

@Anipik
Copy link
Contributor

Anipik commented Oct 25, 2018

yeah i am on it

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants