-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regexp: support lookaheads and lookbehinds #18868
Comments
There have been discussions on this before, see https://groups.google.com/forum/#!topic/golang-nuts/7qgSDWPIh_E. The consensus from @rsc and @ianlancetaylor there is that it hasn't been done in an efficient way yet. I don't know if there have been any other developments. |
One possible idea which I'm not sure has been suggested yet is to have multiple read-heads over the input. It is well known that a finite automata with two read-heads has more language-recognition power than a normal FA (for instance -- matching the language 0^n 1^n). In this case, if we wanted to stream the input, we assume the read-heads can only move left-to-right. Also, by increasing the number of read-heads linearly with the number of lookaround assertions, we can handle nested lookarounds. In order to process a lookbehind assertion, we start stepping an additional read-head from the beginning of the input, and if the assertion is accepted by the time the two heads meet, we continue stepping the original read-head. In order to process a lookahead assertion, we move the starting position of an additional read-head to match the position of the original. Then if the assertion is accepted, we switch back to the original head. |
I believe that lookaround assertions can be supported "efficiently" (i.e., with a one-time scan of the input) with an NFA. My colleague, Joel Uckelman, has done a substantial amount of work support lookaround assertions in lightgrep on an unfinished branch. I'm hoping we can pick it up again in the new year. The beauty of NFAs is that if you need to track more things, you just add more states. They aren't fast, though. Regexp engines vary quite a bit in their support for types of lookaround assertions. Providing less than fully regular support may still be reasonable and useful. |
Experience with RE2 has shown that it's OK - not ideal but OK - to live without these. I think it makes more sense for Go to continue to match RE2 than to try to strike out on its own here. And on top of that I still don't know how to implement them efficiently, despite thinking about this for over a decade. |
What version of Go are you using (
go version
)?Go version 1.7.4
What operating system and processor architecture are you using (
go env
)?Linux (Ubuntu and Debian)
What did you do?
We are using telegraf software and the logparser of this tools looks to be impacted by the non support of regex lookaheads or lookbehinds on this version of GO.
You can find more details here :
influxdata/telegraf#2178
influxdata/telegraf#2302
influxdata/telegraf#2343
Do you know if there is already a workaround this request ?
The text was updated successfully, but these errors were encountered: