-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex contention is way too high #1066
Comments
Duplicate of #934.
It already has a couple of paragraphs dedicated to it in the performance section of the docs: https://docs.rs/regex/latest/regex/#sharing-a-regex-across-threads-can-result-in-contention It even provides a suggestion for how to work around it. It's unclear whether that work-around is suitable for your case.
There are ideas discussed in #934. Needless to say, the problem is not so simple that a single sentence can appropriately describe a solution for it.
What "other regexes" work without mutable state? PCRE2 doesn't. RE2 doesn't. Most general purpose regex engines require some kind of mutable space to write to during searches. It is possible to implement a subset of the You aren't yelling at clouds. It's a real issue. It's just tricky to solve and hasn't bubbled to the top of my priority list. Also because the crate docs explicitly mention a work-around. You can also drop down to using the meta regex engine directly, which then permits you to use APIs that require explicitly passing the mutable scratch space. That way, you can decide how synchronization happens. |
I mean without mutable state shared between invocations. I used regexes in Java previously, and AFAIU they don't do that. And they seem to work fine, at least for common use cases. I'm not 100% it does not share mutable state between invocations, but we used regexes extensively and I think we would have noticed if regexes permanently allocated too much memory. |
Java uses backtracking and is subject to ReDoS: https://github.com/BurntSushi/rebar#cloud-flare-redos The entire point of this crate is that it doesn't use backtracking and is generally not susceptible to ReDoS: https://docs.rs/regex/latest/regex/#untrusted-input And if Java isn't using any mutable space during a search, then since it's a backtracker, that likely implies it's using recursion. Which now means you're susceptible to stack overflow. Just because you haven't hit it doesn't mean it isn't an issue. Lots of folks haven't hit the contention issue you hit here for example. Have you tried any of the work-arounds I've suggested? |
Previously in Java, and now in Rust I work on internal products, so I didn't care at all. But I understand how this could be an issue for other uses.
It depends on what machine you are on. I had another similar issue (with our internal library) which worked fine on laptop, but on a large server the program spent 7% of time spinning the locks. The issue with contention is that it usually either insignificant or very significant, no middle state. Because when there's no contention, lock/unlock is 10ns. But if there is contention, lock/unlock is roughly 1us, 100 times slower. So when the number of thread increases, at some point overhead of contention increases dramatically. Like the program works fine with 25 threads, and on 26 threads it starts working slower by 30% (these are not real numbers). But you probably know that.
Not yet. In the place where the problem is severe, regex is just too simple, split the string by whitespaces, easier to rewrite C style. But there are other places. Anyway, thank you for the help and detailed explanations! |
Summary: I was doing typechecker, and since some recent changes (likely after D48295588), typechecker became much slower. Instead of 7% overhead, it is 50% overhead. I started looking at profiler. ``` 25.45% buck2-rt [kernel.kallsyms] [k] queued_spin_lock_slowpath ``` Or the same from strobelight: {F1072524306} ([1](https://fburl.com/strobelight/fihizjz9) [2](https://fburl.com/strobelight/8a55kesp)) Obtaining types using documentation is suboptimal. But I need to make it fast now to continue working on typechecker. (Also reported to [regex issue tracker](rust-lang/regex#1066), but they are probably aware of it). (We need to figure out how to work with regular expressions). Reviewed By: iguridi Differential Revision: D48496022 fbshipit-source-id: 5d8113ca014208ab5d1db1750f83792d02462cc5
Summary: I was doing typechecker, and since some recent changes (likely after D48295588), typechecker became much slower. Instead of 7% overhead, it is 50% overhead. I started looking at profiler. ``` 25.45% buck2-rt [kernel.kallsyms] [k] queued_spin_lock_slowpath ``` Or the same from strobelight: {F1072524306} ([1](https://fburl.com/strobelight/fihizjz9) [2](https://fburl.com/strobelight/8a55kesp)) Obtaining types using documentation is suboptimal. But I need to make it fast now to continue working on typechecker. (Also reported to [regex issue tracker](rust-lang/regex#1066), but they are probably aware of it). (We need to figure out how to work with regular expressions). Reviewed By: iguridi Differential Revision: D48496022 fbshipit-source-id: 5d8113ca014208ab5d1db1750f83792d02462cc5
Documentation says:
That is put mildly. I was investigating why the program was very slow, and noticed, it spends 25% time in kernel in
queued_spin_lock_slowpath
. It turns out it was regex synchronization. In profiler it looks like this:These syscalls are mutexes by
Regex
.It is is not like all the threads are doing regexes, they actually do other things too, and they still managed to create this contention.
It was on the server with 80 hardware thread. In this case contention is much much larger issue than for example on laptop with 8 threads.
What I can suggest:
Anyway, this is just
so feel free to close the issue. I'm rewriting some parts of code without
regex
crate to unblock my work, and thinking about this issue in background.The text was updated successfully, but these errors were encountered: