-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simple pattern blows out JIT stack #156
Comments
This semantically equivalent regex seems to behave OK:
|
Which release of PCRE? And what is the content of the strings you are matching? (Do they match or not match?) Using the current 10.40 release I do not see a slowdown with a string consisting of "abcde" repeated many times, but I do see the JIT stack overflow when the repeat is more than 272. The JIT maintainer may wish to comment. |
Many releases of PCRE, including HEAD. The content matches. Please see this following commit, to PCRE2’s |
This looks like a * * test, where every character can be matched. The |
I don't think there's anything special about this in the interpreter. (But my memory isn't what it used to be.) |
Regex engines have different engine types: PCRE2 is more like a classic programming language, where you define the operations. For example, if you implement a bubble sort in C, and quicksort in JS, you cannot say C is slow just because JS is faster in this case. |
@zherczeg, is there no special case in the backtracking engine for a pattern like If there is no such special case currently, do you think adding one would be interesting, and do you have any advice about how to do so? |
In particular, call the case I care about
Given this, as soon as the |
These are static compiler optimizations, and there is a project for that: If you implement your suggested optimization there, it can rewrite: The key thing: it does not matter which engine you use, it runs faster, because, as you said, the engine can throw away the backtracking info. The generic use case: you have your patter set, you use repan at compile time to generate their optimized version, and use them at runtime. In this case doing complex regex optimizations is basically free. |
I have simplified the problem reproduction method. I can also reproduce the problem by using the following method.
In addition, I found that when the string length is 1023, no error is reported, but when the string length is 1024, the error is reported. It looks like 1024 is the threshold. |
The default stack is 32K, you should use the jit stack to allocate more. But the pattern is still inefficient. If you make a bubble sort in C, and it is slow, don't blame the compiler. |
Yes, I tried to use the jitstack parameter and it didn't report an error.
I tried the kohler/pcre2@c142ad0 case and added jitstack to the regular expression. No error was reported. Can we consider adding a reminder log: the stack space used by default is not enough, consider using heap space through jitstack. |
Hi! Thanks so much for this project.
The following relatively simple pattern has unexpected bad behavior on JIT:
On long strings, non-JIT PCRE performs much faster than JIT, and on an ~8192-character string, PCRE overflows the default JIT stack size.
Failing test available here: kohler@c142ad0
The text was updated successfully, but these errors were encountered: