Finally, if a regex is just `foo|bar|baz|...|quux`, we now use plain old Aho-Corasick. We weren't doing this before because Aho-Corasick didn't support proper leftmost-first match semantics. As of aho-corasick 0.7 it does, so we can use it as a drop-in replacement.

This fixes a pretty bad performance bug in a really common case, but it is otherwise quite hacky. First, it only applies when a regex is literally `foo|bar|...|baz`. Even something like `foo|b(a)r|...|baz` prevents the optimization, which is a little silly. Second, the optimization only kicks in after we've compiled the full pattern, which adds quite a bit of overhead. Fixing this isn't trivial, since we may need the compiled program to resolve capturing groups. The way to do it is probably to specialize compilation for certain types of expressions. Maybe.

Anyway, we hack this in for now and punt on further improvements until we can really rethink how this should all work.
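To make the two ideas in this message concrete, here is a minimal, std-only sketch. It is not the code from this commit and does not use the aho-corasick crate. Both function names (`plain_alternation`, `find_leftmost_first`) are hypothetical, the metacharacter check is a deliberate simplification, and the search is a naive scan instead of an Aho-Corasick automaton. It only illustrates when a pattern counts as a plain alternation of literals and what leftmost-first semantics mean for the result.

```rust
/// Hypothetical helper: returns the literals if `pattern` is a plain
/// alternation such as `foo|bar|baz`, and `None` otherwise.
/// Simplification: any common regex metacharacter disqualifies the pattern.
fn plain_alternation(pattern: &str) -> Option<Vec<&str>> {
    const META: &[char] = &[
        '\\', '.', '+', '*', '?', '(', ')', '[', ']', '{', '}', '^', '$',
    ];
    if pattern.chars().any(|c| META.contains(&c)) {
        return None;
    }
    let lits: Vec<&str> = pattern.split('|').collect();
    // Require at least two non-empty alternatives.
    if lits.len() < 2 || lits.iter().any(|l| l.is_empty()) {
        return None;
    }
    Some(lits)
}

/// Hypothetical helper: naive leftmost-first search over literals.
/// The earliest start position wins; among literals matching at that
/// position, the one listed first in the pattern wins (not the longest).
/// Returns the byte offsets `(start, end)` of the match.
fn find_leftmost_first(lits: &[&str], haystack: &str) -> Option<(usize, usize)> {
    for (start, _) in haystack.char_indices() {
        for lit in lits {
            if haystack[start..].starts_with(lit) {
                return Some((start, start + lit.len()));
            }
        }
    }
    None
}
```

Under these assumptions, `plain_alternation("foo|b(a)r|baz")` returns `None` because of the group, mirroring the limitation described above. Searching `Samwise` with the literals `Sam` and `Samwise` (in that order) matches only `Sam`, since the first-listed alternative wins even though a longer match exists. A regex engine needs exactly this behavior from its literal matcher, which is why leftmost-first support in Aho-Corasick was the prerequisite.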
1 parent
461673d
commit d7c01cc
Showing 2 changed files with 106 additions and 0 deletions.