When using RegexOptions.Multiline, $ in a pattern represents end-of-line. It should succeed at both LF and CRLF line endings, whichever of the two the input uses, but today we only look for LF. This causes problems for patterns that are meant to match at the end of each line, because pattern authors will not always account for the extra carriage return character that may or may not be present depending on the input's line endings. I also checked other regex engines such as PCRE, and they do match $ at both types of line endings.
Quick Repro:
If you have a pattern that is trying to get the last word of each line, you might do something like:
    var testString = "This is the first example\r\n This is the second example\r\n";
    var testString2 = "This is another example\n This does work\n";
    var regex = new Regex(@"[^\s]+$", RegexOptions.Multiline);
    var result = regex.Matches(testString);   // doesn't match
    var result2 = regex.Matches(testString2); // this does work
This does not work today with any of our engines for the CRLF input: the \r in front of each \n won't match [^\s], and $ won't skip over it either.
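To see where the end-of-line assertion actually succeeds today, a small sketch like the following may help. It uses a shortened CRLF input of my own rather than the repro strings, and the variable names are only illustrative. It collects every position at which a bare $ succeeds under RegexOptions.Multiline and looks at the character on each side of that position.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative input (not from the repro): two lines, both ending in CRLF.
string sample = "first example\r\nsecond example\r\n";

// A bare $ consumes nothing, so every match is an empty match
// located at a position where the assertion holds.
var endOfLine = new Regex(@"$", RegexOptions.Multiline);
Match[] hits = endOfLine.Matches(sample).Cast<Match>().ToArray();

foreach (Match hit in hits)
{
    // Character just before the position, and the one at it (if any).
    string before = hit.Index > 0
        ? Regex.Escape(sample[hit.Index - 1].ToString())
        : "(start)";
    string at = hit.Index < sample.Length
        ? Regex.Escape(sample[hit.Index].ToString())
        : "(end)";
    Console.WriteLine($"index {hit.Index}, length {hit.Length}: before={before}, at={at}");
}
```

For the two positions inside the string, the character at the position is \n and the one before it is \r, so a pattern that requires a non-whitespace character immediately before $ has nothing to anchor to in CRLF input.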
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.
I think we should close this in favor of #25598. We aren't going to change the default behavior of $ as it would be super breaking. The best we can do is add RegexOptions.AnyNewLine which will tell us to make $ match \r\n or \n.
By the way, we should probably not use the word "match" in this context. $ doesn't ever match anything; it is an assertion the match must satisfy. The docs don't help. For example, this passage says "match":
By default, $ matches only the end of the input string. If you specify the RegexOptions.Multiline option, it matches either the newline character (\n) or the end of the input string. It does not, however, match the carriage return/line feed character combination. To successfully match them, use the subexpression \r?$ instead of just $.
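As a rough sketch of the \r?$ workaround the docs describe, applied to the repro strings from this issue: the pattern below assumes the repro was meant to use [^\s], and the capture group is my own addition so that the optional \r stays out of the extracted word. Variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string crlfInput = "This is the first example\r\n This is the second example\r\n";
string lfInput = "This is another example\n This does work\n";

// Pattern from the repro: in CRLF input the character right before
// each \n is \r, which [^\s] rejects, so nothing is found.
var strict = new Regex(@"[^\s]+$", RegexOptions.Multiline);

// Workaround: allow an optional \r between the word and $.
// Group 1 holds the word without the carriage return.
var tolerant = new Regex(@"([^\s]+)\r?$", RegexOptions.Multiline);

int strictCount = strict.Matches(crlfInput).Count;
string[] crlfWords = tolerant.Matches(crlfInput).Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray();
string[] lfWords = tolerant.Matches(lfInput).Cast<Match>()
    .Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(strictCount);                  // 0
Console.WriteLine(string.Join(", ", crlfWords)); // example, example
Console.WriteLine(string.Join(", ", lfWords));   // example, work
```

This only helps when the pattern author remembers to write it, which is the gap the proposed RegexOptions.AnyNewLine is meant to close.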