Slow compilation for big bytes pattern #657
Comments
Thanks. It would be useful to just get the actual regex next time. I had to spend a bit of time making it look nicer:
|
This fixes a pretty bad performance bug in the NFA compiler. In particular, c_char was implemented by diverting to c_class, which is correct, but rather costly to do for every single character in a regex. This causes way more things than necessary to go through the class compilation infrastructure, which includes the suffix caching. We fix this by just special casing c_char. This speeds up regex compilation in #657 by around 30%. Fixes #657
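To illustrate the kind of change described above, here is a minimal toy sketch, not the regex crate's actual compiler. Only the names c_char and c_class come from the description; the `Inst` enum, the `Compiler` struct, the `class_calls` counter, and `compile_literal` are hypothetical and exist purely for illustration. The real class path also does suffix caching, which this sketch leaves out.

```rust
/// A toy NFA instruction set (hypothetical, not the crate's real one).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// Match exactly one character.
    Char(char),
    /// Match any character inside one of the inclusive ranges.
    Ranges(Vec<(char, char)>),
}

/// A toy compiler that records how often the class path is taken.
#[derive(Default)]
struct Compiler {
    insts: Vec<Inst>,
    class_calls: usize,
}

impl Compiler {
    /// General path: handles arbitrary classes and allocates a range list.
    /// (The real compiler does more work here, e.g. suffix caching.)
    fn c_class(&mut self, ranges: &[(char, char)]) {
        self.class_calls += 1;
        self.insts.push(Inst::Ranges(ranges.to_vec()));
    }

    /// Before the fix (as described): a literal is treated as a
    /// one-element class, so it goes through the general path.
    fn c_char_via_class(&mut self, c: char) {
        self.c_class(&[(c, c)]);
    }

    /// After the fix (as described): a literal is special-cased and
    /// emitted directly, bypassing the class machinery.
    fn c_char(&mut self, c: char) {
        self.insts.push(Inst::Char(c));
    }
}

/// Compiles a plain literal string one character at a time.
fn compile_literal(lit: &str, special_case: bool) -> Compiler {
    let mut compiler = Compiler::default();
    for ch in lit.chars() {
        if special_case {
            compiler.c_char(ch);
        } else {
            compiler.c_char_via_class(ch);
        }
    }
    compiler
}
```

Both paths emit one instruction per character, so they are equivalent in what they match; the difference is that the special-cased path never enters the class machinery. In a pattern made up mostly of literal characters, that per-character overhead is what adds up.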
Once I profiled it, the problem was pretty clear: the compiler was using the class compilation infrastructure to handle even plain literal characters. In big regexes like this, that adds up quite a bit. I opened #658 to fix that. In my test, it got about a 30% improvement, so I think that should narrow the gap considerably for this particular case. Once that was fixed, I didn't see any other obvious hotspots that were easy to fix. Interestingly, a good chunk of the time was actually spent in the parsing and translation steps. I haven't spent much (if any) time optimizing that, so that may be the next place to go. I ported your benchmark to |
The fix for this is on crates.io in |
I am seeing the same performance improvement on my side.
Ok, thank you for the quick fix. I will be keeping my eye on new releases of
|
As promised in #524, I'm opening this issue because I found a difference in performance between Re2 (with custom bindings) and regex when compiling the regex itself.

Here is the example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eb83255739973d40d16997574ba9dd35

These are the exact bytes that are constructed after parsing Netbeans' old .hgignore, and this is the exact way I am using them. regex takes about 5ms to build the regex while Re2 takes about 1ms on a very stable, 4 core machine.

I cannot offer you an easy reproduction for Re2; however, should you want to see the difference within Mercurial, I can help.

I am using rustc version 1.34.2 and regex version 1.3.5 on a Linux amd64 machine.

Note: I've also seen a small performance difference when using is_match in Mercurial's working directory traversal code, but that is probably a separate issue.
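For anyone who wants to measure build time along the lines described above, here is a minimal std-only sketch. The helper `min_build_time` and the placeholder `stand_in_build` are hypothetical names introduced here; the placeholder only copies bytes so that the snippet stays self-contained. To time real compilation, the closure would instead call something like `regex::bytes::Regex::new(pattern)`, which assumes the regex crate is a dependency.

```rust
use std::time::{Duration, Instant};

/// Runs `build` `iters` times and returns the fastest observed duration.
/// Taking the minimum reduces noise from scheduling and cold caches.
/// With `iters == 0` nothing is measured and `Duration::MAX` is returned.
fn min_build_time<T, F: FnMut() -> T>(iters: u32, mut build: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..iters {
        let start = Instant::now();
        let built = build();
        let elapsed = start.elapsed();
        // Drop outside the timed region so teardown is not counted.
        drop(built);
        best = best.min(elapsed);
    }
    best
}

/// Placeholder workload (an assumption, not a regex build): it just
/// copies the pattern bytes. Swap in the real constructor when measuring.
fn stand_in_build(pattern: &str) -> Vec<u8> {
    pattern.bytes().collect()
}
```

A call such as `min_build_time(100, || stand_in_build("a|b"))` returns a `Duration` that can be printed with `{:?}`. Results from a debug build will not be representative, so measurements like the ones quoted above are best taken in release mode.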