Inconsistent Regex matching behavior in InvariantCulture #58956

Closed
veanes opened this issue Sep 10, 2021 · 5 comments

Comments

@veanes
Contributor

veanes commented Sep 10, 2021

Description

RegexOptions.None and RegexOptions.Compiled produce different matches under InvariantCulture when matching case-insensitively against input that contains \u0130 (Turkish capital I with dot above).
In this case, Compiled returns an incorrect match that starts with \u0130.
The expected match in the repro below is "II".

Configuration

.NET 5.0

Regression?

Not sure if this is a regression. In earlier versions System.Globalization.CultureInfo.CurrentCulture
cannot be set, I believe.

Other information

This is a repro:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string pattern = "(?i:iI+)";
            string input = "abc\u0130IIxyz";
            var culture = System.Globalization.CultureInfo.CurrentCulture;
            System.Globalization.CultureInfo.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture;
            Regex re = new Regex(pattern);
            Regex reC = new Regex(pattern, RegexOptions.Compiled);
            System.Globalization.CultureInfo.CurrentCulture = culture;
            Console.WriteLine("correct:" + re.Match(input).Value);
            Console.WriteLine("incorrect:" + reC.Match(input).Value);
        }
    }
}
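
To narrow down where the two engines diverge, the repro above can be wrapped in a small helper that builds both regexes under a chosen culture and prints the two results side by side. This is only a sketch: the class name CultureRegexCheck and the helpers Compare and Escape are hypothetical names introduced for illustration, and the second pattern, "(?i:i)" against a lone \u0130, is an added probe that is not part of the original report. No expected output is claimed, since the result depends on the runtime version.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class CultureRegexCheck
{
    // Hypothetical helper: constructs both regexes while `culture` is current,
    // restores the previous culture, then matches and prints both results.
    static void Compare(CultureInfo culture, string pattern, string input)
    {
        CultureInfo saved = CultureInfo.CurrentCulture;
        Regex interpreted, compiled;
        try
        {
            CultureInfo.CurrentCulture = culture;
            interpreted = new Regex(pattern);
            compiled = new Regex(pattern, RegexOptions.Compiled);
        }
        finally
        {
            // Always restore the caller's culture, even if construction throws.
            CultureInfo.CurrentCulture = saved;
        }

        string a = interpreted.Match(input).Value;
        string b = compiled.Match(input).Value;
        Console.WriteLine($"{pattern} on '{Escape(input)}': None='{Escape(a)}' Compiled='{Escape(b)}' same={a == b}");
    }

    // Print \u0130 as an escape so the output is readable on any console encoding.
    static string Escape(string s) => s.Replace("\u0130", "\\u0130");

    static void Main()
    {
        // The pattern and input from the report.
        Compare(CultureInfo.InvariantCulture, "(?i:iI+)", "abc\u0130IIxyz");
        // Added probe: does each engine treat \u0130 as a case variant of 'i'?
        Compare(CultureInfo.InvariantCulture, "(?i:i)", "\u0130");
    }
}
```

A line that ends in same=False marks an input where the interpreted and compiled engines disagree under the culture that was current at construction time.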

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.RegularExpressions untriaged New issue has not been triaged by the area owner labels Sep 10, 2021
@ghost

ghost commented Sep 10, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@GrabYourPitchforks
Member

Possible dupe of #58958.

@jeffhandley jeffhandley added this to the 6.0.0 milestone Sep 10, 2021
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 10, 2021
@stephentoub stephentoub assigned stephentoub and unassigned pgovind Sep 21, 2021
@stephentoub stephentoub removed their assignment Sep 21, 2021
@stephentoub stephentoub modified the milestones: 6.0.0, Future Sep 21, 2021
@stephentoub
Member

This is not a regression. Same behavior repros on .NET Framework 4.8.

@jeffhandley
Member

@pgovind FYI on Stephen's findings.

This is not a regression. Same behavior repros on .NET Framework 4.8.

@pgovind

pgovind commented Sep 21, 2021

Yeah, we have issues around culture when dealing with \u0130. The fundamental bug here is #36147. IMO, we should close this as a dupe of #36147.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021