Improve single-range IgnoreCase codegen for RegexCompiler / source generator #62647

Closed
stephentoub opened this issue Dec 10, 2021 · 1 comment · Fixed by #67365
Comments

@stephentoub
Member

Given a set like [A-F] with RegexOptions.IgnoreCase, we'll produce a set that's actually [A-Fa-f] (we do a limited form of this today and will do a more complete job of it after #61048). When emitting matching code for this, we'll emit it today as:

((ch = span[i]) < 128 && ("\0\0\0\0~\0~\0"[ch >> 4] & (1 << (ch & 0xF))) != 0)

but we could instead emit it as:

(((uint)span[i] | 0x20) - 'a' <= (uint)('f' - 'a'))
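To sanity-check that the two forms accept the same characters, here is a minimal sketch that wraps each expression in a method and compares them over every UTF-16 code unit. The class and method names (`HexLetterCheck`, `ViaBitmap`, `ViaRange`, `CountMismatches`) are made up for illustration and are not part of the regex implementation; the expressions themselves are the ones quoted above, with `span[i]` replaced by a `ch` parameter.

```csharp
using System;

public static class HexLetterCheck
{
    // 128-bit ASCII bitmap packed into 8 chars of 16 bits each.
    // '~' is 0x7E, i.e. bits 1..6, which selects 0x41..0x46 (A-F) at index 4
    // and 0x61..0x66 (a-f) at index 6.
    private const string Bitmap = "\0\0\0\0~\0~\0";

    // Form emitted today: bounds check plus bitmap lookup.
    public static bool ViaBitmap(char ch) =>
        ch < 128 && (Bitmap[ch >> 4] & (1 << (ch & 0xF))) != 0;

    // Proposed form: OR-ing with 0x20 folds 'A'..'Z' onto 'a'..'z', then a single
    // unsigned subtract-and-compare tests the range. Values below 'a' wrap around
    // to a large uint and fail the comparison. 'unchecked' makes the wraparound
    // explicit in case the surrounding project compiles with overflow checking.
    public static bool ViaRange(char ch) =>
        unchecked(((uint)ch | 0x20) - 'a' <= (uint)('f' - 'a'));

    // Counts the code units on which the two forms disagree.
    public static int CountMismatches()
    {
        int mismatches = 0;
        for (int i = 0; i <= char.MaxValue; i++)
        {
            char ch = (char)i;
            if (ViaBitmap(ch) != ViaRange(ch))
            {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

The range form avoids the string indexing and the separate `< 128` check; non-ASCII input is rejected by the same unsigned comparison because OR-ing with 0x20 never lowers a value.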

This will entail updating

private static string MatchCharacterClass(bool hasTextInfo, RegexOptions options, string chExpr, string charClass, bool caseInsensitive, HashSet<string>? additionalDeclarations)
to handle the additional special case. It already has a special case for a range and another for casing; we just need to add one that combines them. There may be additional cases to consider as well.
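As a rough illustration of what the combined special case might check, the sketch below takes two already-decoded ranges and produces the expression string when the second range is exactly the lowercase twin of an uppercase ASCII letter range. Everything here is an assumption made for the example: the helper name `TryEmit`, its parameters, and the idea of passing ranges as plain `char` bounds. The real `MatchCharacterClass` works on the encoded `charClass` string and may need to handle this differently.

```csharp
using System;

public static class IgnoreCaseRangeSketch
{
    // Hypothetical helper, not part of the real generator.
    // Succeeds only when [upperLo-upperHi] lies within 'A'..'Z' and
    // [lowerLo-lowerHi] is the same range with the 0x20 bit set.
    public static bool TryEmit(
        string chExpr,
        char upperLo, char upperHi,
        char lowerLo, char lowerHi,
        out string expr)
    {
        bool upperIsAsciiLetters = upperLo >= 'A' && upperHi <= 'Z' && upperLo <= upperHi;
        bool lowerIsCaseTwin = lowerLo == (upperLo | 0x20) && lowerHi == (upperHi | 0x20);

        if (upperIsAsciiLetters && lowerIsCaseTwin)
        {
            // Same shape as the expression proposed in this issue.
            expr = $"(((uint){chExpr} | 0x20) - '{lowerLo}' <= (uint)('{lowerHi}' - '{lowerLo}'))";
            return true;
        }

        // Anything else falls back to the existing code paths.
        expr = string.Empty;
        return false;
    }
}
```

For example, `TryEmit("span[i]", 'A', 'F', 'a', 'f', out string expr)` returns `true` and sets `expr` to the expression shown earlier in this issue, while a pair such as `[0-9]` and `[a-f]` is rejected. The helper only applies when the set consists of exactly those two ranges, so sets that pick up extra characters under case-insensitive matching would need separate handling.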

@ghost
ghost commented Dec 10, 2021

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels: area-System.Text.RegularExpressions

Milestone: 7.0.0

dotnet-issue-labeler (bot) added the untriaged label on Dec 10, 2021
jeffschwMSFT removed the untriaged label on Jan 11, 2022
ghost added the in-pr label on Apr 5, 2022
ghost removed the in-pr label on Apr 6, 2022
ghost locked the conversation as resolved and limited it to collaborators on May 6, 2022

2 participants