-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 3 structure-aware fuzzers #848
Conversation
I was using fuzz_target!(|data: (&str, &str, [bool; 7])| {
let (
pattern,
input,
[case_insensitive, multi_line, dot_matches_new_line, swap_greed, ignore_whitespace, unicode, octal],
) = data;
let r = regex::RegexBuilder::new(pattern)
.case_insensitive(case_insensitive)
.multi_line(multi_line)
.dot_matches_new_line(dot_matches_new_line)
.swap_greed(swap_greed)
.ignore_whitespace(ignore_whitespace)
.unicode(unicode)
.octal(octal)
.dfa_size_limit(1024)
.size_limit(1024)
.build();
if let Ok(re) = r {
re.is_match(input);
}
}); Looks like the only option that shows up as an option in the RegexBuilder that doesn't show up in the flags is octal. Might be worth fuzzing that option as well? |
Thanks so much! I am not sure if I am a fan of adding dependencies to I'd really strongly prefer that we don't introduce new dependencies to |
@5225225, I saw your PR earlier (it was actually what inspired me to write this one!) and would be happy to add some of the testing capabilities yours demonstrates.
@BurntSushi, let me see what I can do. 🙂 |
@BurntSushi seems that this option isn't really available: rust-fuzz/cargo-fuzz#256 Unfortunately, it would be necessary to make it a feature. However, it is possible to test if it's being used in a fuzzer environment, so we can set up something like the following: #[cfg(any(all(feature = "fuzzer", not(fuzzing)), all(feature = "arbitrary", not(fuzzing))))]
compile_error!("You cannot enable the fuzzer feature without fuzzing!"); This prevents people from pulling in the fuzzer features or arbitrary without fuzzing. Does this suffice? |
@VTCAKAVSMoACE with this fuzzers how long it takes to find cve-2022-24713? I'm tested locally from your repo I can't trigger the crash. I have 50ish corpus and using regex dictionary |
Sorry for the late response. You need to set a timeout of a second or two, and you do not want to use a corpus. |
I've given this some thought, and I'm just not okay with adding a public dependency on |
I'm going to close this out for largely similar reasons as stated here: #959 (comment) |
Hm, this seems a bit of a shame to me. Grammar fuzzing discovers hard-to-find bugs very fast in regex (e.g. reproducing the ReDoS in <1 minute); it seems a bit more than convenient testing. I'll think about alternative implementation strategies here since I think it's important, but I don't have much time to do this at the moment. I will try to nerd snipe some my friends into this 😉 |
Yeah I agree it's a shame. And honestly, my suggested work-around sucks. It blows chunks. I don't like it and I don't want it, haha. This really does feel like a flaw in the I'll see if I can open up an issue with them and see if I can get anywhere that way. |
I'm also reconsidering the other work-around suggested, which was mirroring the full |
I've added a request here: rust-fuzz/cargo-fuzz#338 |
I closed the cargo-fuzz issue as unimplementable, but repeating some of my suggestions from there: For this specific case, I'd really just recommend adding an The hack for getting this to work is to instead manually define fuzz targets under |
@Manishearth gotya, thanks. I think the |
Regarding OSS-Fuzz: this is something that you would put into the config file. Quite possible 🙂 |
Ah! There's a lot that I don't know apparently. Haha. |
OK, so I'm going to move forward with just adding
|
So, what do you need from me? 🙂 Rebase and test? |
Ah right, nope, you'll all set. I'm rolling this into #978. (I have other fuzz changes in that PR too, in order to improve coverage across various things.) |
Okay. I think potentially something worthwhile is to also perform differential testing between the rewrite and the old version. This should be fairly quick to set up and I'll create a separate repository for this since it doesn't seem appropriate here. |
@addisoncrump Sounds great. What does it entail? (Or feel free to avoid indulging my curiosity and just show me whenever the new repo is ready. :-)) |
I should be done in like 5 minutes? I'll just push it here since this PR is encapsulated by other changes. |
Aha, yup, this immediately finds some differences. Test with |
Note that currently, it uses generated ASTs from the old implementation. If there are new features present in the rewrite, this won't test them, or it will test them wrong. |
Yeah unfortunately not only are there new features, but there are bug fixes in match semantics too. I'll take a look at some of the failing cases to confirm they fall in those buckets, but this might make differential testing in this way tricky. (I've done my own form of differential non-fuzz testing by ensuring that the old implementation passes the new test suite and that the new implementation passes the old test suite.) The new test suite is much bigger than the old one, and tests each individual internal engine much more thoroughly. It's not fuzz testing, but it's not nothing either. :) |
And the differential fuzzer also just repro'd the old CVE on the new implementation with: |
@addisoncrump Yeah I caught that one a few days ago. I might not have pushed my branch since then. But it's in the new NFA compiler, not parser. Happened for a different reason. |
@addisoncrump Hmmm. Probably the simplest thing is to copy the properties from this section of But that doesn't include each of the general category or script values. For example, neither Otherwise there really isn't anything in One thing worth mentioning is that the AST parser doesn't check whether the thing inside a
|
I'll find a way 😁 But not this evening. |
So, I looked a little further. I believe that using @Manishearth you might also wish to know this; maybe worth documenting in the rust fuzz book. I believe |
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
oops, I just obliterated this while making another PR by accident. Closing, since these changes are already added. |
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
This makes uses of the new 'arbitrary' feature in 'regex-syntax' to make fuzzing much more targeted and complete. Closes #848
It turns out that the way we were dealing with quit states in the DFA was not quite right. Basically, if we entered a quit state and a match had been found, then we were returning the match instead of the error. But the match might not be the correct leftmost-first match, and so, we really shouldn't return it. Otherwise a regex like '\B.*' could match much less than it should. This was caught by a differential fuzzer developed in #848.
These are the fuzzers used to find CVE-2022-24713.
Effectively, it just marks all AST members as Arbitrary when the fuzzer feature is enabled, then generates ASTs which are then converted back to statements in regex. Then, the fuzzer parses the generated string and executes against an input.