-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a lazy DFA. #164
Add a lazy DFA. #164
Conversation
cc @alexcrichton I don't think this PR is possible to review, it's too big. Two things for you I think:
|
Benchmarks, without the DFA and with the DFA:
Notably, no regressions outside of noise:
As a bonus, a comparison with PCRE:
My take is that we are quite competitive now. There are a few regexes where some performance is left on the table, but I think it's an otherwise pretty strong showing! (I think many of the performance differences could be resolved if something like the |
Previous to this PR, the
|
Holy cow, nice work @BurntSushi! Some thoughts:
|
// | ||
// With the above settings, this comes out to ~3.2MB. Mostly these numbers | ||
// With the contants below, this comes out to ~1.6MB. Mostly these numbers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contants -> constants
@alexcrichton For optional
I guess I could put
Hmm. I wouldn't necessarily be opposed, because using it is almost always wrong now. I guess it could still technically be useful if you want to execute a regex without allocating (which limits one to There are also a few crates using it. |
cc @Geal @Manishearth @llogiq @kbknapp (We are talking about possibly removing the |
Could you modify |
@pczarn I don't think we could use |
insts: Vec<Inst>, | ||
bytes: bool, | ||
reverse: bool, | ||
byte_classes: Vec<usize>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byte_classes can be Vec<u8>
.
Yeah, but we also have a Syntax check for dynamic regexes with clippy, so the advantage is somewhat diminished. |
@BurntSushi about pcre ah oh well, so long as the CI passes on Windows seems fine to me! |
@BurntSushi nice work! And thanks for the heads up! |
A lazy DFA is much faster than executing an NFA because it doesn't repeat the work of following epsilon transitions over and and over. Instead, it computes states during search and caches them for reuse. We avoid exponential state blow up by bounding the cache in size. When the DFA isn't powerful enough to fulfill the caller's request (e.g., return sub-capture locations), it still runs to find the boundaries of the match and then falls back to NFA execution on the matched region. The lazy DFA can otherwise execute on every regular expression *except* for regular expressions that contain word boundary assertions (`\b` or `\B`). (They are tricky to implement in the lazy DFA because they are Unicode aware and therefore require multi-byte look-behind/ahead.) The implementation in this PR is based on the implementation in Google's RE2 library. Adding a lazy DFA was a substantial change and required several modifications: 1. The compiler can now produce both Unicode based programs (still used by the NFA engines) and byte based programs (required by the lazy DFA, but possible to use in the NFA engines too). In byte based programs, UTF-8 decoding is built into the automaton. 2. A new `Exec` type was introduced to implement the logic for compiling and choosing the right engine to use on each search. 3. Prefix literal detection was rewritten to work on bytes. 4. Benchmarks were overhauled and new ones were added to more carefully track the impact of various optimizations. 5. A new `HACKING.md` guide has been added that gives a high-level design overview of this crate. Other changes in this commit include: 1. Protection against stack overflows. All places that once required recursion have now either acquired a bound or have been converted to using a stack on the heap. 2. Update the Aho-Corasick dependency, which includes `memchr2` and `memchr3` optimizations. 3. Add PCRE benchmarks using the Rust `pcre` bindings. Closes #66, #146.
@pczarn Thanks for the review! I've made both of your suggested changes. Nice catch! I've also separated the PCRE benchmarks into their own sub-crate. We'll run them on Travis but not AppVeyor. |
It was not used in stable at all, because it only works in nightly. Now that regex! is almost always slower¹ there's no reason to keep it in. ¹: rust-lang/regex#164
It was not used in stable at all, because it only works in nightly. Now that regex! is almost always slower¹ there's no reason to keep it in. ¹: rust-lang/regex#164
There might be a gap in the logic of assuming regex_macros to be "basically always slower". Benchmarks only take into account the cost of executing already constructed automata. The automata construction isn't benchmarked. But keeping the compiled automata in some static variable or field isn't always convenient. Now, I haven't benchmarked it myself, but from a recent reddit thread numbers on regex performance I'm pretty sure On a different note I'd like to point that the benchmarks probably only compare the new lazy DFA engine with the default PCRE engine. But the default PCRE engine isn't the fastest engine around. If we're serious about performance, then the JIT PCRE engine should be accounted for. And |
AIUI |
Wow, this is great! Nice work, @BurntSushi. |
@ArtemGr I don't think anyone have claimed that. The implementation just happens to be slow in this case. |
The benchmarks speak for themselves. :-)
The automata construction is benchmarked. I've even taken a profiler to it to improve compile times. It is of course true that compilation is not benchmarked in the benchmarks for searching text, because that seems really strange. It is of course true that
Could you explain more? I'm not sure I understand. Is there any particular reason why
I may have made a mistake benchmarking PCRE, but neglecting the JIT is certainly not one of them. I am, in fact, serious about performance! You can check out how PCRE regexes are constructed for the benchmarks here: https://github.com/rust-lang-nursery/regex/blob/master/benches/bench_pcre.rs#L54-L71 --- If I'm doing anything wrong, I would like to correct it. And I have even made sure that the PCRE bindings are really enabling the JIT too. You can see the benchmarks before/after for just plain PCRE and PCRE w/ JIT:
We are already beating JIT PCRE now on many of the micro benchmarks. But yes, certainly something like Ragel should probably be in the To be clear: I welcome improvements to the benchmark suite. I wrote many of them (not all), so the suite is likely biased in favor of regexes that are better executed by this library. A possibly better methodology would be to grep source code for use of regexes and use those instead. That is however a lot of work, especially since performance can vary greatly based on the input, which is harder to capture from real world usage. |
That is actually my favorite feature of the macro. Forcing the regex to be checked at compile time removes an error check I need to handle in my code, even it it's just with |
P.S. One might use
Looks good! |
@ArtemGr Thanks for responding. There's no question that |
Cool, next time I peruse the regex docs I'll watch out for any place that could be improved. |
So, after this DFA update my regex replace operation no longer works. Example code:
This code is to work around an issue with rust-bindgen by replacing extra generated bitfields. After updating to the latest regex crate, this regex no longer works (no matches are found). Please advise. Thanks. |
@jnicholls Probably best to file a new issue. Could you also show some text that should be matched? Thanks. |
@BurntSushi Thanks, #169 created. |
Since rust-lang/regex#164 the "dynamic" regex generated is faster than what the `regex!` macro produces. Still, to avoid the overhead of recreating the regex everytime, we use lazy_static to initialize it once at first use
A lazy DFA is much faster than executing an NFA because it doesn't
repeat the work of following epsilon transitions over and and over.
Instead, it computes states during search and caches them for reuse. We
avoid exponential state blow up by bounding the cache in size. When the
DFA isn't powerful enough to fulfill the caller's request (e.g., return
sub-capture locations), it still runs to find the boundaries of the
match and then falls back to NFA execution on the matched region. The
lazy DFA can otherwise execute on every regular expression except for
regular expressions that contain word boundary assertions (
\b
or\B
). (They are tricky to implement in the lazy DFA because they areUnicode aware and therefore require multi-byte look-behind/ahead.)
The implementation in this PR is based on the implementation in Google's
RE2 library.
Adding a lazy DFA was a substantial change and required several
modifications:
NFA engines) and byte based programs (required by the lazy DFA, but possible
to use in the NFA engines too). In byte based programs, UTF-8 decoding is
built into the automaton.
Exec
type was introduced to implement the logic for compilingand choosing the right engine to use on each search.
track the impact of various optimizations.
HACKING.md
guide has been added that gives a high-leveldesign overview of this crate.
Other changes in this commit include:
recursion have now either acquired a bound or have been converted to
using a stack on the heap.
memchr2
andmemchr3
optimizations.pcre
bindings.Closes #66, #146.