First, thank you for taking the time to make ARM support possible :)
Second, I have found a case where vectorscan reports a false positive match on ARM aarch64. The same input does not produce a false positive in the original hyperscan on x64.
I have isolated a very small reproducible example with two input regexes and a few bytes of corpus text. The text that is scanned is:
xxxxxxxxxx?y\nTEXT12345xxxxxxxxxxxx
whereas the two regexes are:
^x\\z*x
y\\z*TEXT12345
The single match is reported as follows:
Match id: 1
Ending position of match: 23
Matched pattern: y\\z*TEXT12345
Input from 0 to 23: xxxxxxxxxx?y\nTEXT12345
As far as I know, this should not match.
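As a cross-check on that expectation, here is a minimal sketch that runs both patterns against the corpus with `std::regex` (ECMAScript grammar). This is a different engine from Hyperscan/Vectorscan and is not the original reproducer. It assumes the `\n` in the corpus is a literal backslash followed by `n` (two bytes), which is what puts the end of `TEXT12345` at offset 23. The names `kCorpus`, `kPatterns`, `reference_match` and `run_reference_checks` are made up for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Corpus from the report. ASSUMPTION: "\n" is a literal backslash followed
// by 'n' (two bytes), not a newline character.
const std::string kCorpus = R"(xxxxxxxxxx?y\nTEXT12345xxxxxxxxxxxx)";

// The two patterns as the regex engine sees them. Raw string literals are
// used so that no extra C++ escaping is needed.
const std::string kPatterns[] = {
    R"(^x\\z*x)",         // 'x' at start, a literal backslash, zero or more 'z', 'x'
    R"(y\\z*TEXT12345)",  // 'y', a literal backslash, zero or more 'z', "TEXT12345"
};

// Reference check with std::regex instead of Hyperscan: returns true if the
// pattern matches anywhere in the text.
bool reference_match(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: asserts that neither pattern matches the corpus.
void run_reference_checks() {
    for (const std::string& pattern : kPatterns) {
        assert(!reference_match(pattern, kCorpus));
    }
}
```

In the corpus, the backslash after `y` is followed by `n`, which `z*` cannot consume, so the second pattern has nothing to match. The first pattern needs a backslash right after an `x` at the start of the input, which is not there either.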
One detail that may help narrow this down: the two regexes only produce the match when compiled without the flag HS_FLAG_SOM_LEFTMOST (this is why I only report the ending position of the match). In my tests I was using the flags HS_FLAG_DOTALL | HS_FLAG_MULTILINE, but as soon as HS_FLAG_SOM_LEFTMOST is included, the false match is no longer reported.
Furthermore, if I remove one or more x chars from the end of the input string (even though these are not part of the match), the match is no longer reported. The same happens with the x chars at the beginning. I know this is a strange example, but it comes from a much larger dataset of inputs and this is the smallest case I could pinpoint. Also note that if I compile the regexes individually, neither of them produces a match.
The self-contained code of the example (notice the multiple backslashes for the escaping character \\\\):
I compiled with g++-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0 on x64 and gcc10-g++ (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1) on aarch64. The Ragel version is Ragel State Machine Compiler version 6.10 March 2017 on both.
I noticed there was also a recent post with a similar problem here and that maybe this PR fixes the problem. I can try rerunning the test when the PR is merged.
Let me know if there is anything else I can provide. Thank you for your time.