PCRE2 JIT crash in 10.38 and 10.42 #180
Comments
when using pcre2 >= 10.34, it is recommended to add |
Thanks. A useful suggestion indeed. But I don't recall reading the PCRE2 documentation that |
Agreed, it is not obvious, but the documentation at least implies it: it notes that the code expects the subject to be valid UTF, warns of the possibility of crashes, and points to that flag as the only way to search within binary files safely (see the comment at the end of the page). |
Can we close this issue? |
Note: |
I believe others will eventually run into this problem too, especially if this changed in 10.34 (as I understand from the comments). This is a situation where the default settings should be safe, rather than the other way round. Better safe than sorry, as they say. If people want to search UTF-encoded data they fully understand and control/own, then it makes sense NOT to use
For the ugrep search tool with option PS. if adding a flag is something to contemplate, then the name of the flag is critical to avoid confusion. So something like |
Historically, PCRE2 only supported valid UTF strings. Supporting invalid UTF is a relatively new feature, hence it is hidden behind a flag for compatibility. |
Also, note that you're using When using |
Understood. But I have to nitpick a bit about this though: note that
I understand. But if this pass finds invalid UTF then this returns an error code, which is not what we want. Grepping is more of a "brute force thing" perhaps. |
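To make the "validity pass rejects binary input" point concrete, here is a minimal, self-contained sketch of what a UTF-8 validity check looks like. This is not PCRE2's implementation, and `is_valid_utf8` is a hypothetical helper name used only for illustration; it simply shows why arbitrary binary data such as a gzip archive tends to fail such a check almost immediately.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of PCRE2): returns true when `s` is
// well-formed UTF-8. Rejects stray continuation bytes, truncated
// sequences, overlong forms, surrogates, and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;    // total bytes in this sequence
        unsigned long cp;   // decoded code point so far
        unsigned long min;  // smallest code point allowed for this length
        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (len > n - i) return false;  // sequence runs past the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

For example, a gzip stream starts with the bytes 0x1F 0x8B; the second byte is a continuation byte with no lead byte before it, so a check like this fails at offset 1. That is why a grep-style tool scanning archives cannot rely on a validity pass that reports an error for the whole subject.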
Indeed, the pcre2_jit_match page only says:
...but the consequence of that isn't mentioned. The pcre2jit page gives more details:
The pcre2api page is the most explicit about it:
I suppose this scary warning should be added to the other pages as well. |
I have made a note to update the documentation in due course. |
I have increased the scary level in several pages. |
The regex pattern
[\w-]+@([\w-]+.)+[\w-]+
causes JIT to crash when matching the pattern against some "binary" data as input. This was first reported here: Genivia/ugrep#241
To assist, I isolated the relevant code in a small C++ file; see the attached poc.cpp in poc.zip. The input file with "binary" data, archive2.tgz, is also included in poc.zip.
The crash is observed with PCRE2 versions 10.38 and 10.42 using UTF-8 matching with code unit width 8. It appears that
PCRE2_JIT_PARTIAL_HARD
is a potential cause. I've tested on macOS M1, Android ARM, and macOS Intel. All crash in the JIT code. Matching without JIT works fine. Hopefully this can be fixed.
poc.zip