RE2 support #116

Closed
Aloso opened this issue Nov 27, 2024 · 4 comments
Labels
C-flavors Issues about adding or modifying flavors enhancement New feature or request
Comments

@Aloso
Member

Aloso commented Nov 27, 2024

Any chance of RE2 support? (Go and DuckDB use it, for instance.)
I'm using Pomsky to generate regexes (without the features RE2 lacks), so testing with PCRE is equivalent, but it would be nice to have this checked during tests.

Originally posted by @fundef1 in #112 (comment)

@Aloso
Member Author

Aloso commented Nov 27, 2024

@fundef1 let's discuss it here, since the other issue is only about testing.

@Aloso
Member Author

Aloso commented Nov 27, 2024

RE2 is pretty similar to the Rust flavor, in that it doesn't support advanced features like lookaround assertions or backreferences, which can lead to exponential runtime.

From the documentation:

  • The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
  • \w, \d, \s and \b are not Unicode-aware (like in JavaScript)
  • Supported Unicode properties:
    • General categories (but not LC)
    • Scripts
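
To make the "not Unicode-aware" point concrete, here is a minimal sketch. It uses Python's `re` module only as a stand-in, since Python is not RE2. The constants `RE2_WORD` and `RE2_DIGIT` and the helper `matches_fully` are hypothetical names for illustration; the ASCII ranges follow the definitions of `\w` and `\d` in RE2's syntax documentation.

```python
import re

# ASCII-only equivalents of the Perl classes, as RE2's syntax docs define them.
# (Hypothetical constant names, used only for this illustration.)
RE2_WORD = r"[0-9A-Za-z_]"   # what \w means in RE2
RE2_DIGIT = r"[0-9]"         # what \d means in RE2


def matches_fully(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Python's \w and \d are Unicode-aware for str patterns, so they accept
# non-ASCII letters and digits; the ASCII-only classes do not.
print(matches_fully(r"\w+", "café"))            # Unicode-aware \w
print(matches_fully(RE2_WORD + "+", "café"))    # ASCII-only \w
print(matches_fully(r"\d", "٣"))                # Unicode-aware \d
print(matches_fully(RE2_DIGIT, "٣"))            # ASCII-only \d
```

A pattern that relies on `\w` matching accented letters would therefore behave differently under RE2 than under a Unicode-aware flavor.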

So there will be some restrictions:

  1. To use [word] or %, you need to disable unicode;
  2. < and > are unsupported
  3. No binary Unicode properties, such as Alphabetic or Emoji
  4. No Unicode script extensions
  5. No lookaround, recursion, backreferences, or atomic groups
  6. No repetition with a bound bigger than 1000
  7. No Grapheme

There might be more, which I will find out once I test RE2 more thoroughly.
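
As a rough illustration of restrictions 5 and 6, the following sketch scans the text of an already generated regex for constructs RE2 rejects. This is not how Pomsky implements its flavor checks; `re2_problems` and the other names are hypothetical, and the scan is a heuristic that ignores escaping and character classes, so it can report false positives.

```python
import re

# RE2 rejects counted repetitions whose minimum or maximum exceeds this value.
RE2_MAX_REPEAT = 1000

# Heuristic textual checks (assumption: PCRE-style syntax in the input regex).
_UNSUPPORTED = [
    ("lookaround", re.compile(r"\(\?<?[=!]")),   # (?=  (?!  (?<=  (?<!
    ("atomic group", re.compile(r"\(\?>")),      # (?>...)
    ("backreference", re.compile(r"\\[1-9]")),   # \1 .. \9
    ("recursion", re.compile(r"\(\?R\)")),       # (?R)
]

# Counted repetition: {n}, {n,} or {n,m}
_COUNTED = re.compile(r"\{(\d+)(?:,(\d*))?\}")


def re2_problems(regex: str) -> list[str]:
    """Return a list of human-readable reasons why RE2 would likely reject `regex`."""
    problems = [name for name, pat in _UNSUPPORTED if pat.search(regex)]
    for m in _COUNTED.finditer(regex):
        # Missing or empty upper bounds (as in {n} or {n,}) are skipped.
        bounds = [int(b) for b in m.groups() if b]
        if any(b > RE2_MAX_REPEAT for b in bounds):
            problems.append(
                f"repetition bound above {RE2_MAX_REPEAT}: {m.group(0)}"
            )
    return problems


print(re2_problems(r"a{1,1000}b*"))
print(re2_problems(r"(?<=x)y{2000}"))
```

An empty list means the heuristic found nothing objectionable; it does not prove that RE2 accepts the regex.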

@fundef1

fundef1 commented Nov 28, 2024

It would be great if you could include explicit RE2 support for generating regexes.
I'm already using Pomsky with RE2, sticking to the subset that avoids the limitations you identified, so no issue there.

@Aloso
Member Author

Aloso commented Nov 28, 2024

Implemented in a157149

@Aloso Aloso closed this as completed Nov 28, 2024
@Aloso Aloso added C-flavors Issues about adding or modifying flavors enhancement New feature or request labels Nov 30, 2024
@Aloso Aloso added this to the v0.12 milestone Nov 30, 2024

2 participants