Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex contention is way too high #1066

Closed
stepancheg opened this issue Aug 19, 2023 · 4 comments
Closed

Regex contention is way too high #1066

stepancheg opened this issue Aug 19, 2023 · 4 comments

Comments

@stepancheg
Copy link

stepancheg commented Aug 19, 2023

Documentation says:

If you share the same Regex across multiple threads, each thread still gets its own mutable space, but accessing that space is slower.

That is put mildly. I was investigating why the program was very slow, and noticed, it spends 25% time in kernel in queued_spin_lock_slowpath. It turns out it was regex synchronization. In profiler it looks like this:

image

These syscalls are mutexes by Regex.

It is is not like all the threads are doing regexes, they actually do other things too, and they still managed to create this contention.

It was on the server with 80 hardware thread. In this case contention is much much larger issue than for example on laptop with 8 threads.

What I can suggest:

  • add a big warning that contention is big issue when the concurrent access is high, especially for simple regexes
  • use different synchronization primitives (some kind of lock-free array for example, so each thread would be scanning for a pooled entry from different position)
  • I would say, implement an option to not use caches at all. Other regexes seems to do fine without mutable state. This should also decrease memory usage. But I don't understand caching in regexes well.

Anyway, this is just

image

so feel free to close the issue. I'm rewriting some parts of code without regex crate to unblock my work, and thinking about this issue in background.

@BurntSushi
Copy link
Member

Duplicate of #934.

add a big warning that contention is big issue when the concurrent access is high, especially for simple regexes

It already has a couple of paragraphs dedicated to it in the performance section of the docs: https://docs.rs/regex/latest/regex/#sharing-a-regex-across-threads-can-result-in-contention

It even provides a suggestion for how to work around it. It's unclear whether that work-around is suitable for your case.

use different synchronization primitives (some kind of lock-free array for example, so each thread would be scanning for a pooled entry from different position)

There are ideas discussed in #934. Needless to say, the problem is not so simple that a single sentence can appropriately describe a solution for it.

I would say, implement an option to not use caches at all. Other regexes seems to do fine without mutable state. This should also decrease memory usage. But I don't understand caching in regexes well.

What "other regexes" work without mutable state? PCRE2 doesn't. RE2 doesn't. Most general purpose regex engines require some kind of mutable space to write to during searches.

It is possible to implement a subset of the regex crate functionality (i.e., no capture groups) using fully compiled DFAs (and you could even do this by using the regex-automata crate directly), but it would be wildly inappropriate to do. It would technically eliminate contention, but memory usage and regex compile times would increase by multiple orders of magnitude. If went that route, I would get immediate and justified bug reports talking about the latest regex release led to major regressions and that they have to pin to an older version until it's fixed.

You aren't yelling at clouds. It's a real issue. It's just tricky to solve and hasn't bubbled to the top of my priority list. Also because the crate docs explicitly mention a work-around. You can also drop down to using the meta regex engine directly, which then permits you to use APIs that require explicitly passing the mutable scratch space. That way, you can decide how synchronization happens.

@BurntSushi BurntSushi closed this as not planned Won't fix, can't repro, duplicate, stale Aug 19, 2023
@stepancheg
Copy link
Author

What "other regexes" work without mutable state?

I mean without mutable state shared between invocations. I used regexes in Java previously, and AFAIU they don't do that. And they seem to work fine, at least for common use cases.

java.util.regex.Pattern

I'm not 100% it does not share mutable state between invocations, but we used regexes extensively and I think we would have noticed if regexes permanently allocated too much memory.

@BurntSushi
Copy link
Member

Java uses backtracking and is subject to ReDoS: https://github.com/BurntSushi/rebar#cloud-flare-redos

The entire point of this crate is that it doesn't use backtracking and is generally not susceptible to ReDoS: https://docs.rs/regex/latest/regex/#untrusted-input

And if Java isn't using any mutable space during a search, then since it's a backtracker, that likely implies it's using recursion. Which now means you're susceptible to stack overflow. Just because you haven't hit it doesn't mean it isn't an issue. Lots of folks haven't hit the contention issue you hit here for example.

Have you tried any of the work-arounds I've suggested?

@stepancheg
Copy link
Author

Which now means you're susceptible to stack overflow.

Previously in Java, and now in Rust I work on internal products, so I didn't care at all. But I understand how this could be an issue for other uses.

Lots of folks haven't hit the contention issue you hit here for example.

It depends on what machine you are on. I had another similar issue (with our internal library) which worked fine on laptop, but on a large server the program spent 7% of time spinning the locks.

The issue with contention is that it usually either insignificant or very significant, no middle state. Because when there's no contention, lock/unlock is 10ns. But if there is contention, lock/unlock is roughly 1us, 100 times slower. So when the number of thread increases, at some point overhead of contention increases dramatically. Like the program works fine with 25 threads, and on 26 threads it starts working slower by 30% (these are not real numbers). But you probably know that.

Have you tried any of the work-arounds I've suggested?

Not yet. In the place where the problem is severe, regex is just too simple, split the string by whitespaces, easier to rewrite C style. But there are other places.

Anyway, thank you for the help and detailed explanations!

facebook-github-bot pushed a commit to facebook/buck2 that referenced this issue Aug 23, 2023
Summary:
I was doing typechecker, and since some recent changes (likely after D48295588), typechecker became much slower.

Instead of 7% overhead, it is 50% overhead.

I started looking at profiler.

```
    25.45%  buck2-rt         [kernel.kallsyms]     [k] queued_spin_lock_slowpath
```

Or the same from strobelight:

{F1072524306}

([1](https://fburl.com/strobelight/fihizjz9) [2](https://fburl.com/strobelight/8a55kesp))

Obtaining types using documentation is suboptimal. But I need to make it fast now to continue working on typechecker.

(Also reported to [regex issue tracker](rust-lang/regex#1066), but they are probably aware of it).

(We need to figure out how to work with regular expressions).

Reviewed By: iguridi

Differential Revision: D48496022

fbshipit-source-id: 5d8113ca014208ab5d1db1750f83792d02462cc5
facebook-github-bot pushed a commit to facebook/starlark-rust that referenced this issue Aug 23, 2023
Summary:
I was doing typechecker, and since some recent changes (likely after D48295588), typechecker became much slower.

Instead of 7% overhead, it is 50% overhead.

I started looking at profiler.

```
    25.45%  buck2-rt         [kernel.kallsyms]     [k] queued_spin_lock_slowpath
```

Or the same from strobelight:

{F1072524306}

([1](https://fburl.com/strobelight/fihizjz9) [2](https://fburl.com/strobelight/8a55kesp))

Obtaining types using documentation is suboptimal. But I need to make it fast now to continue working on typechecker.

(Also reported to [regex issue tracker](rust-lang/regex#1066), but they are probably aware of it).

(We need to figure out how to work with regular expressions).

Reviewed By: iguridi

Differential Revision: D48496022

fbshipit-source-id: 5d8113ca014208ab5d1db1750f83792d02462cc5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants