-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regexp: (|a)* matches more text than (|a)+ does #46123
Labels
Milestone
Comments
rsc
added
the
NeedsFix
The path to resolution is known, but the work has not been done.
label
May 12, 2021
Change https://golang.org/cl/318750 mentions this issue: |
sjamesr
added a commit
to sjamesr/re2j
that referenced
this issue
May 27, 2021
Original go fix: golang/go@2a61b3c Change description from that commit: In Perl mode, (|a)* should match an empty string at the start of the input. Instead it matches as many a's as possible. Because (|a)+ is handled correctly, matching only an empty string, this leads to the paradox that e* can match more text than e+ (for e = (|a)) and that e+ is sometimes different from ee*. The current code treats e* and e+ as the same structure, with different entry points. In the case of e* the preference list ends up not quite in the right order, in part because the "before e" and "after e" states are the same state. Splitting them apart fixes the preference list, and that can be done by compiling e* as if it were (e+)?. Fixes google#136.
sjamesr
added a commit
to sjamesr/re2j
that referenced
this issue
May 27, 2021
Ports fix of golang/go#46123 to RE2/J. Original go fix: golang/go@2a61b3c Change description from that commit: In Perl mode, (|a)* should match an empty string at the start of the input. Instead it matches as many a's as possible. Because (|a)+ is handled correctly, matching only an empty string, this leads to the paradox that e* can match more text than e+ (for e = (|a)) and that e+ is sometimes different from ee*. The current code treats e* and e+ as the same structure, with different entry points. In the case of e* the preference list ends up not quite in the right order, in part because the "before e" and "after e" states are the same state. Splitting them apart fixes the preference list, and that can be done by compiling e* as if it were (e+)?. Fixes google#136.
sjamesr
added a commit
to sjamesr/re2j
that referenced
this issue
May 27, 2021
Ports fix of golang/go#46123 to RE2/J. Original go fix: golang/go@2a61b3c Change description from that commit: In Perl mode, (|a)* should match an empty string at the start of the input. Instead it matches as many a's as possible. Because (|a)+ is handled correctly, matching only an empty string, this leads to the paradox that e* can match more text than e+ (for e = (|a)) and that e+ is sometimes different from ee*. The current code treats e* and e+ as the same structure, with different entry points. In the case of e* the preference list ends up not quite in the right order, in part because the "before e" and "after e" states are the same state. Splitting them apart fixes the preference list, and that can be done by compiling e* as if it were (e+)?. Fixes google#136.
sjamesr
added a commit
to sjamesr/re2j
that referenced
this issue
May 27, 2021
Ports fix of golang/go#46123 to RE2/J. Original go fix: golang/go@2a61b3c Change description from that commit: In Perl mode, (|a)* should match an empty string at the start of the input. Instead it matches as many a's as possible. Because (|a)+ is handled correctly, matching only an empty string, this leads to the paradox that e* can match more text than e+ (for e = (|a)) and that e+ is sometimes different from ee*. The current code treats e* and e+ as the same structure, with different entry points. In the case of e* the preference list ends up not quite in the right order, in part because the "before e" and "after e" states are the same state. Splitting them apart fixes the preference list, and that can be done by compiling e* as if it were (e+)?. Fixes google#136.
sjamesr
added a commit
to google/re2j
that referenced
this issue
May 28, 2021
Ports fix of golang/go#46123 to RE2/J. Original go fix: golang/go@2a61b3c Change description from that commit: In Perl mode, (|a)* should match an empty string at the start of the input. Instead it matches as many a's as possible. Because (|a)+ is handled correctly, matching only an empty string, this leads to the paradox that e* can match more text than e+ (for e = (|a)) and that e+ is sometimes different from ee*. The current code treats e* and e+ as the same structure, with different entry points. In the case of e* the preference list ends up not quite in the right order, in part because the "before e" and "after e" states are the same state. Splitting them apart fixes the preference list, and that can be done by compiling e* as if it were (e+)?. Fixes #136.
kvch
pushed a commit
to elastic/beats
that referenced
this issue
Oct 11, 2021
* format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies
kvch
pushed a commit
to elastic/beats
that referenced
this issue
Oct 11, 2021
* format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies (cherry picked from commit e6839ab)
newly12
pushed a commit
to newly12/beats
that referenced
this issue
Oct 13, 2021
* format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies
kvch
added a commit
to elastic/beats
that referenced
this issue
Oct 14, 2021
…28335) * Update go release version 1.17.1 (#27543) * format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies (cherry picked from commit e6839ab) * Fix build tags for Go 1.17 (#28338) * additional fixes * Disable generator tests temporarily (#28362) * Fix docker import generator for 1.17.1 (#28374) * Fix liblogparser * Update go.mod * update notice Co-authored-by: apmmachine <[email protected]> Co-authored-by: Aleksandr Maus <[email protected]> Co-authored-by: Noémi Ványi <[email protected]> Co-authored-by: Noémi Ványi <[email protected]> Co-authored-by: Alex K <[email protected]> Co-authored-by: Marc Guasch <[email protected]>
fearful-symmetry
pushed a commit
to elastic/beats
that referenced
this issue
Oct 20, 2021
* singleton sysinfo host to avoid frequently collecting host info * add Host object to Stats object * update changelog * set procStats.host to nil if any error calling sysinfo.Host() * Update aws-lambda-go library version to 1.13.3 (#28236) * [cloud][docker] use the private docker namespace (#28286) * [7.x] [DOCS] Update api_key example on elasticsearch output (#28288) * packetbeat/protos/dns: don't render missing A and AAAA addresses from truncated records (#28297) * seccomp: allow clone3 syscall for x86 (#28117) clone3 is a linux syscall that is now used by glibc starting version 2.34. It is used when pthread_create() gets called. Current seccomp filters do not allow this syscall leading to crashes like runtime/cgo: pthread_create failed: Operation not permitted See elastic/apm-server#6238 for more details * Osquerybeat: Improve handling of osquery.autoload file, allow customizations (#28289) Previously the osquery.autoload file was overwritten every time on osquerybeat start and stamped with our extension. After the change we check the content of the file and do not overwrite it on each osquerybeat start. This allows the user to deploy their own extensions if their want and start osquery with that. * Osquerybeat: Runner and Fetcher unit tests (#28290) * Runner and Fetcher unit tests * Fix header formatting * Tweak test * Update go release version 1.17.1 (#27543) * format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies * Move labels and annotations under kubernetes.namespace. (#27917) * Move labels and annotations under kubernetes.namespace. * Remove GCP support from Functionbeat (#28253) * Fix build tags for Go 1.17 (#28338) * [Elastic Agent] Add ability to communicate with Kibana through service token (#28096) * Add ability to communicate with Kibana through service token. Add ability to pass service token to container subcommand. * Add changelog entry. * Fix go fmt. * Add username to ASA Security negotiation log (#26975) * Add username to ASA Security negotiation log I added the username user.name field to ASA Security negotiation log line. * adding support for both formats * adding changelog entry * updating geo fields in expected output files * reverse formatting * reverting to older version of file * reverting formatting again * regenrate golden files again * remove formatting, ready for review * fixing missing message due to no newline * fix dissect pattern to fit correctly Co-authored-by: Marius Iversen <[email protected]> * x-pack/filebeat/module/cisco: loosen time parsing and add group and session type capture (#28325) * Redis: remove deprecated fields (#28246) * Redis: remove deprecated fields * Disable generator tests temporarily (#28362) * Windows/perfmon metricset - remove deprecated perfmon.counters configuration (#28282) * remove deprecated config * changelog * [Filebeat] - S3 Input - Add support for only iterating/accessing only… (#28252) * [Filebeat] - S3 Input - Add support for only iterating/accessing only specific folders or datapaths * Breaking change for 8.0, namespace_annotations replaced by namespace.annotations (#28230) * Breaking change for 8.0, namespace_annotations replaced by namespace.annotations * Take care of namespace being nil * [Heartbeat] Setuid to regular user / lower capabilities when possible (#27878) partial fix for #27648 , this PR: Detects if the user is running as root then: Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present Attempts to Setuid , Setgid and Setgroups to that user / groups Invokes setcap to drop all privileges except NET_RAW+ep This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that. With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root. * mage fmt Co-authored-by: kaiyan-sheng <[email protected]> Co-authored-by: Victor Martinez <[email protected]> Co-authored-by: Ugo Sangiorgi <[email protected]> Co-authored-by: Dan Kortschak <[email protected]> Co-authored-by: Arnaud Lefebvre <[email protected]> Co-authored-by: Aleksandr Maus <[email protected]> Co-authored-by: apmmachine <[email protected]> Co-authored-by: Michael Katsoulis <[email protected]> Co-authored-by: Noémi Ványi <[email protected]> Co-authored-by: Blake Rouse <[email protected]> Co-authored-by: LaZyDK <[email protected]> Co-authored-by: Marius Iversen <[email protected]> Co-authored-by: Andrea Spacca <[email protected]> Co-authored-by: Mariana Dima <[email protected]> Co-authored-by: Andrew Cholakian <[email protected]>
Icedroid
pushed a commit
to Icedroid/beats
that referenced
this issue
Nov 1, 2021
* format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies
Icedroid
pushed a commit
to Icedroid/beats
that referenced
this issue
Nov 1, 2021
* singleton sysinfo host to avoid frequently collecting host info * add Host object to Stats object * update changelog * set procStats.host to nil if any error calling sysinfo.Host() * Update aws-lambda-go library version to 1.13.3 (elastic#28236) * [cloud][docker] use the private docker namespace (elastic#28286) * [7.x] [DOCS] Update api_key example on elasticsearch output (elastic#28288) * packetbeat/protos/dns: don't render missing A and AAAA addresses from truncated records (elastic#28297) * seccomp: allow clone3 syscall for x86 (elastic#28117) clone3 is a linux syscall that is now used by glibc starting version 2.34. It is used when pthread_create() gets called. Current seccomp filters do not allow this syscall leading to crashes like runtime/cgo: pthread_create failed: Operation not permitted See elastic/apm-server#6238 for more details * Osquerybeat: Improve handling of osquery.autoload file, allow customizations (elastic#28289) Previously the osquery.autoload file was overwritten every time on osquerybeat start and stamped with our extension. After the change we check the content of the file and do not overwrite it on each osquerybeat start. This allows the user to deploy their own extensions if their want and start osquery with that. * Osquerybeat: Runner and Fetcher unit tests (elastic#28290) * Runner and Fetcher unit tests * Fix header formatting * Tweak test * Update go release version 1.17.1 (elastic#27543) * format of conditional build tags has changed * matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123 * iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests * go.mod file contains separate require directive for transitive dependencies * Move labels and annotations under kubernetes.namespace. (elastic#27917) * Move labels and annotations under kubernetes.namespace. * Remove GCP support from Functionbeat (elastic#28253) * Fix build tags for Go 1.17 (elastic#28338) * [Elastic Agent] Add ability to communicate with Kibana through service token (elastic#28096) * Add ability to communicate with Kibana through service token. Add ability to pass service token to container subcommand. * Add changelog entry. * Fix go fmt. * Add username to ASA Security negotiation log (elastic#26975) * Add username to ASA Security negotiation log I added the username user.name field to ASA Security negotiation log line. * adding support for both formats * adding changelog entry * updating geo fields in expected output files * reverse formatting * reverting to older version of file * reverting formatting again * regenrate golden files again * remove formatting, ready for review * fixing missing message due to no newline * fix dissect pattern to fit correctly Co-authored-by: Marius Iversen <[email protected]> * x-pack/filebeat/module/cisco: loosen time parsing and add group and session type capture (elastic#28325) * Redis: remove deprecated fields (elastic#28246) * Redis: remove deprecated fields * Disable generator tests temporarily (elastic#28362) * Windows/perfmon metricset - remove deprecated perfmon.counters configuration (elastic#28282) * remove deprecated config * changelog * [Filebeat] - S3 Input - Add support for only iterating/accessing only… (elastic#28252) * [Filebeat] - S3 Input - Add support for only iterating/accessing only specific folders or datapaths * Breaking change for 8.0, namespace_annotations replaced by namespace.annotations (elastic#28230) * Breaking change for 8.0, namespace_annotations replaced by namespace.annotations * Take care of namespace being nil * [Heartbeat] Setuid to regular user / lower capabilities when possible (elastic#27878) partial fix for elastic#27648 , this PR: Detects if the user is running as root then: Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present Attempts to Setuid , Setgid and Setgroups to that user / groups Invokes setcap to drop all privileges except NET_RAW+ep This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that. With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root. * mage fmt Co-authored-by: kaiyan-sheng <[email protected]> Co-authored-by: Victor Martinez <[email protected]> Co-authored-by: Ugo Sangiorgi <[email protected]> Co-authored-by: Dan Kortschak <[email protected]> Co-authored-by: Arnaud Lefebvre <[email protected]> Co-authored-by: Aleksandr Maus <[email protected]> Co-authored-by: apmmachine <[email protected]> Co-authored-by: Michael Katsoulis <[email protected]> Co-authored-by: Noémi Ványi <[email protected]> Co-authored-by: Blake Rouse <[email protected]> Co-authored-by: LaZyDK <[email protected]> Co-authored-by: Marius Iversen <[email protected]> Co-authored-by: Andrea Spacca <[email protected]> Co-authored-by: Mariana Dima <[email protected]> Co-authored-by: Andrew Cholakian <[email protected]>
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
@BurntSushi noticed that in Rust regex (rust-lang/regex#779):
(|a)*
matchingaaa
matches the entire text.(|a)+
matchingaaa
matches only the empty string at the start of the text.This is a bit of an unlikely corner case, but it's certainly a significant violation of the rules of regexps for
*
to match more than+
. The correct answer is for(|a)*
to match an empty string too, because each iteration prefers the empty string, so even an infinite number of those matches will never fall back to matching ana
. (This behavior is what Perl and PCRE do.)Go's regexp package and RE2 have the same bug, which ultimately derives from a mismatch between the
e*
picture in my first regexp article and the backtracking prioritization matching in the second article. The implementation ofe*
needs to be a little different to get the match prioritization correct in the case wheree
has a preferred empty match.I found this mismatch between RE2 and Perl in the
e*
case when running Glenn Fowler's regexp test cases against RE2 long ago, but at the time I thought the mismatch was unavoidable in the proper NFA approach. @BurntSushi's observation thate+
handles the empty match correctly proves otherwise. The correct fix is to leave the compilation ofe+
alone and then compilee*
as(e+)?
. (I will add a note to the article as well.)This is a tiny bug fix but may of course result in different matches for certain programs, causing problems for programs that rely on the current buggy behavior. That is always the case for any bug fix, of course. Fixing the bug only in Go would also make Go diverge from RE2 and Rust.
@junyer, @BurntSushi, and I discussed both the potential for breakage and the goal of keeping Go, RE2, and Rust aligned. We decided the potential breakage is minimal and outweighed by the benefits of fixing the bug, to better match Perl and PCRE, and to reestablish the properties that
e*
never matches more thane+
and thate+
andee*
are always the same. We agreed to make the change in all of our implementations./cc @charlesmunger and @adonovan for RE2/J as well.
The text was updated successfully, but these errors were encountered: