fix repeat of preferred empty match #137

Merged
merged 1 commit into from
May 28, 2021

Conversation

sjamesr
Contributor

@sjamesr sjamesr commented May 27, 2021

Original Go fix:

golang/go@2a61b3c

Change description from that commit:

In Perl mode, (|a)* should match an empty string at the start of the
input. Instead it matches as many a's as possible.
Because (|a)+ is handled correctly, matching only an empty string,
this leads to the paradox that e* can match more text than e+
(for e = (|a)) and that e+ is sometimes different from ee*.

The current code treats e* and e+ as the same structure, with
different entry points. In the case of e* the preference list ends up
not quite in the right order, in part because the "before e" and
"after e" states are the same state. Splitting them apart fixes the
preference list, and that can be done by compiling e* as if it were
(e+)?.
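
The behavior described above can be checked against Go's standard `regexp` package, which is where the original fix landed. The following is a minimal sketch, assuming Go 1.17 or later (a release that includes the upstream fix); the helper name `firstMatch` and the input `"aaa"` are illustrative and do not come from the commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it returns the text of the leftmost
// match of pattern in input, using Go's default leftmost-first (Perl-like)
// matching semantics.
func firstMatch(pattern, input string) string {
	return regexp.MustCompile(pattern).FindString(input)
}

func main() {
	const input = "aaa"
	// With e = (|a), the empty alternative is preferred, so e*, e+, ee*
	// and (e+)? should all agree on an empty match at the start of input.
	for _, p := range []string{`(|a)*`, `(|a)+`, `(|a)(|a)*`, `((|a)+)?`} {
		fmt.Printf("%-12s matched %q\n", p, firstMatch(p, input))
	}
}
```

On a Go version that predates the fix, the first pattern would be expected to report `"aaa"` instead of `""`, which is the inconsistency this change removes.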

Fixes #136.

@google-cla google-cla bot added the cla: yes label May 27, 2021
@sjamesr sjamesr requested a review from adonovan May 27, 2021 16:44
@sjamesr sjamesr changed the title from "port fix of https://github.com/golang/go/issues/46123 to RE2/J" to "fix repeat of preferred empty match" May 27, 2021
@sjamesr sjamesr force-pushed the port_go_fix branch 2 times, most recently from c8ff00d to ccfa6ab Compare May 27, 2021 16:56
Ports fix of golang/go#46123 to RE2/J.

Collaborator

@adonovan adonovan left a comment

The change is almost textually identical, which makes for easy review.

Thanks!

@sjamesr sjamesr merged commit 982390a into google:master May 28, 2021
Successfully merging this pull request may close these issues.

regexp: (|a)* matches more text than (|a)+ does
2 participants