How do I test different code branches using CIFuzz? #7479
I believe CIFuzz will always use the branch cloned in the Dockerfile for fuzzing (i.e. the main branch). @jonathanmetzman is this an area where ClusterFuzzLite may help? |
As far as I can remember CFLite isn't fully compatible with multiple branches due to some GitHub limitations: #7146 (comment) |
Having said that I think it's possible to make CFLite use different branches on push events (up until the point where it should download the latest builds, coverage reports and so on) so it seems it can replace CIFuzz in this particular case. |
Well, it would also be desirable for us to run full OSS-Fuzz on multiple branches simultaneously, so perhaps a more general enhancement to OSS-Fuzz would be warranted. I was told that CIFuzz would allow me to fuzz-test PRs and arbitrary branches, which is the main reason why I integrated with it, but now it appears that that is not the case. libjpeg-turbo, which is used in Chrome and Android and many other places, employs a “main-is-always-stable” branch model, so next-gen releases are developed in the dev branch. Without the ability to fuzz-test that branch, new features will not ever get fuzz-tested until the next beta, which may be months or years from when the feature was pushed. I also maintain stable branches for past release series, so it would be nice to be able to fuzz-test one or more of those. I would be satisfied with using CIFuzz on the dev branch until it is stabilized and merged down into the main branch, but something with “Lite” in the name is not exactly confidence-inspiring for a project with billions of users. All I really need is a way to instruct OSS-Fuzz via CIFuzz to use a particular code directory or branch when building the fuzzers. |
Yeah evgeny I'm not sure why the limitation you pointed out can't be worked around by putting the name of the target branch in the artifact name so that only artifacts from the desired branch are downloaded. |
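The branch-in-the-artifact-name idea can be sketched as follows. This is only an illustration, not how CIFuzz or CFLite names artifacts: the `cifuzz-<kind>-<branch>` scheme and both helper functions are hypothetical.

```python
import re


def artifact_name(kind: str, branch: str) -> str:
    """Build an artifact name that embeds the branch (hypothetical scheme)."""
    # Replace characters that are awkward in artifact names, such as the
    # "/" in "release/2.0.x". Note that two different branch names could
    # collapse to the same sanitized name.
    safe_branch = re.sub(r"[^A-Za-z0-9._-]", "-", branch)
    return f"cifuzz-{kind}-{safe_branch}"


def artifacts_for_branch(names, kind, branch):
    """Keep only artifacts whose name matches the desired kind and branch."""
    wanted = artifact_name(kind, branch)
    return [name for name in names if name == wanted]
```

With names built this way, a job running on `dev` would only ever pick up `cifuzz-corpus-dev` and would ignore artifacts uploaded from `main`.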
We can consider it, but this creates complexity for us that we'd prefer to avoid.
Let me do some testing today and tomorrow and figure out how to get this working. |
I think OSS-Fuzz kind of supports that in the sense that it's possible to build fuzz targets using various branches and put them in
I think it should do it. My artifacts tend to be huge though so I avoid uploading them generally anyway.
FWIW I think CIFuzz is easier to use because it's better integrated with OSS-Fuzz. For example, to make CFLite look like CIFuzz I had to download the public OSS-Fuzz corpora manually in systemd/systemd#22302. CFLite is more configurable though. |
From a project maintainer's perspective, this stuff is too much of a black box. Rather than rebranding, why don't you just address the limitations in the original tool? From my point of view, CIFuzz works fine but has one limitation. I would strongly prefer to fix or work around that limitation rather than adopt something new that may have drawbacks or other issues. |
Agreed.
I'm not a member of the OSS-Fuzz team so I can't comment on that but as far as I can remember I ended up using CFLite because CIFuzz is tightly coupled with OSS-Fuzz (where multiple branches aren't generally supported and it's expected that fuzz targets and coverage data point to the latest development branches rolling forward). If OSS-Fuzz supported that natively CIFuzz would probably support that too. Until then I use both CIFuzz and CFLite. |
I'll explain better: ClusterFuzzLite is more than just marketing. I said it was marketing because they use the same codebase, but a better description is that they are different interfaces.
Unless it's very easy and requires no maintenance, I don't plan on making the CIFuzz interface more flexible. CIFuzz is rigid by design and trying to add in flexibility is too hacky and hard to maintain. When I wrote ClusterFuzzLite, I didn't think the ClusterFuzzLite interface would be used by OSS-Fuzz projects, so there are a few features OSS-Fuzz users will lose by using ClusterFuzzLite:
I think fixing this shouldn't be so hard, and I am happy to do so as long as people will use it. I'll see what I can do for this now. |
Unfortunately this oss-fuzz "mode" will lack these features (corpuses can probably be used though) because oss-fuzz doesn't support branches. |
At some point we should probably deprecate the CIFuzz interface. I'm not sure when/if we will do that, but it would reduce maintenance burden and more importantly clear up some confusion for users. |
@jonathanmetzman Thanks for all of the info. I don't mind not having coverage information, because I get that periodically from OSS-Fuzz anyhow. As someone without your experience, it is very difficult for me to evaluate whether the loss of Items 2 and 3 would outweigh the advantages of ClusterFuzzLite. Can you give me more details on the tradeoff? |
Sorry, I described feature 1 very poorly. And now that I've thought more it's probably a non-issue. I'll explain though so you can make an informed decision.
Sure.
If you use CFL in OSS-Fuzz mode, these builds will be of master/HEAD which completely breaks this feature. I think the solution here is to simply turn off this feature for your use case with a flag.
(worth mentioning that for security reasons, the corpus downloaded from OSS-Fuzz is actually 90 days old (longer than our disclosure window) so that we don't release any unfixed crashes). tl;dr I think we can turn off feature 2 and keep features 1 and 3 for you |
@evverx rather than me implementing and you using branch-specific artifacts, I think the above comment explains why it's more reasonable IMO to just use master's artifacts. Only bug novelty checking is really broken by doing so, and it's debatable how useful this feature is. But I can still do so if you want. |
@jonathanmetzman I think I hit all the corner cases with new fuzz targets, broken coverage and so on so I don't use any artifacts apart from public OSS-Fuzz corpora and I think it would be great if it was possible to download them separately.
As long as google/clusterfuzzlite#85 isn't implemented I think it would be useful. |
Great I think #7491 should work for you then as the corpus will be the only feature used for you (since coverage is broken for you, which we should try fixing). I'll add my thoughts on google/clusterfuzzlite#85 |
If I understand this correctly, I could foresee a problem. As I mentioned above, the "killer app" for CIFuzz/CFL in the context of libjpeg-turbo is being able to fuzz-test the dev branch, which is the evolving code base for the next major release. One of the big pushes for future releases of libjpeg-turbo is a complete refactor of the TurboJPEG API, which will make it possible to improve fuzzer coverage and optimize some of the existing fuzzers. (We're currently only covering 60% of lines and 70% of functions, although that's about a 3x improvement relative to a year ago.) So as the dev branch evolves, the coverage of fuzzers in that branch will diverge somewhat from main. I could foresee an issue whereby CFL assumes that the coverage from main applies to dev and decides not to run a fuzzer, when in fact code has been pushed that would affect that fuzzer. It seems to me that there is at least a finite chance that a bug could slip through in that manner, but maybe I'm misunderstanding something. Also, what would happen if the fuzzers change names in the evolving code base?
So one thing that is confusing me is that you seem to be referring to both CIFuzz and CFL in the context of these descriptions, but didn't you say above that these three features aren't available in CFL? That being said, I do not need or want novelty checking. If there is still a crash, I want to be bothered about it until the crash is fixed.
When you say OSS-Fuzz mode, do you mean batch mode?
I am still fuzzy (pun intended) regarding the operational differences between CIFuzz and CFL. CIFuzz can only run on the main branch at the moment, but will I lose some fuzzing functionality for that branch if I switch to CFL? Maybe it would be helpful to compare the modes. How does "code change mode" in CFL compare to the existing CIFuzz functionality? How does "batch mode" in CFL compare to the existing OSS-Fuzz functionality? How do those answers change when we're no longer talking about the main branch? I don't intend to replace OSS-Fuzz for the main branch. I'm just trying to get as much fuzzing as I can get on the other branches. So I really need to have a better understanding of the limitations of CIFuzz compared to OSS-Fuzz and whether CFL has the same or different limitations. |
@dcommander just out of curiosity I wonder how the branches are maintained in the libjpeg-turbo repository? If commits are pushed directly to those branches feature 1 and feature 2 aren't useful there (since they target PRs for the most part in its current form). It was briefly discussed in google/clusterfuzzlite#93 (comment) but there I was just guessing. |
main is always stable, so commits are pushed directly to it (mostly bug fixes.) dev is evolving, so commits are also pushed directly to it (mostly new features) as well as merged from main (the aforementioned bug fixes.) Commits for legacy branches are mostly cherry picked from main. |
Thanks. As far as I can tell the "oss-fuzz" setting @jonathanmetzman proposed in that PR should make it easier to switch to CFLite (because it brings feature 3 with it) but considering that the name of fuzz targets can change (as far as I understand) that feature can't be relied on so it seems the only way to make it all work would be to use "code-change" on commits, download the corpora manually and keep track of the public corpora and their names manually and run it for a few hours instead of 10 minutes. I have to say that it's not exactly maintainable but I did something like that with my kludgy script in evverx/elfutils@0329554 up until the point where that corpus got removed. Anyway it could be I'm missing something so I'd wait for @jonathanmetzman to weigh in here. |
@jonathanmetzman I would also like more information on this. Does this mean that, potentially, a bug that OSS-Fuzz is capable of detecting may go undetected by CIFuzz/CFL for three months? If so, then that seemingly undercuts one of the major arguments of using CIFuzz/CFL (i.e. that using those tools would potentially allow for more timely discovery of bugs than waiting a day or two for a full OSS-Fuzz run.) |
Ah, the changing names will indeed present an issue.
They are currently available in CFL but for users who set up their own continuous builds, coverage jobs, corpuses.
Great.
No sorry, by OSS-Fuzz mode, I mean what I am implementing in #7491 where OSS-Fuzz provides the continuous builds, corpuses and coverage jobs. So instead of needing batch fuzzing (for corpus), continuous builds, and coverage runs, OSS-Fuzz users using CIFuzz will only need a code-change fuzzing workflow and not the many other ones CFL offers.
There are two ways you will lose features:
The tradeoff here is between ease of use and soundness/flexibility. It's easiest if we use OSS-Fuzz but fuzzing other branches won't work perfectly. If we want things to work perfectly CFL provides the same functionality that CIFuzz uses from OSS-Fuzz.
code-change fuzzing in CFL is similar to CIFuzz. The only difference is where the corpus/coverage/old builds come from. In CIFuzz we assume they come from OSS-Fuzz, in CFL we currently assume they come from CFL, but I can change it to get them from OSS-Fuzz too.
CFL does not have these limitations of CIFuzz. With CFL you can set up an OSS-Fuzz(+CIFuzz)-style process for any branch you want, with coverage reports, corpus generation/pruning, and code change fuzzing. I'm going to think about how OSS-Fuzz might be able to accommodate your use case of multiple branches. Maybe we can allow something like putting fuzzers from different branches in the same build. |
Yes, this is absolutely possible. CIFuzz (not CFL, which has batch fuzzing to solve this problem) cannot detect the same bugs as OSS-Fuzz because it runs for a much shorter time. But the benefit of using CIFuzz is that some of the bugs you would have to
It does somewhat undercut it, but it's an acceptable tradeoff I think. In exchange for being less thorough, CIFuzz finds bugs when they are easier to fix (before/right after they are merged). |
Anyway, I think I will look into solutions to this branch problem. Maybe OSS-Fuzz can allow it, though I'm concerned that it sets a bad precedent, and about some other issues. |
Can you comment on my other concern? Even if the fuzzer names remain the same, could a bug potentially slip through if different files/functions are covered by "fuzzerX" in dev vs. "fuzzerX" in main? If the worst-case scenario is that CFL runs fuzzers that it doesn't really need to run (i.e. a false positive), then that's probably OK (although, in a time-limited fuzzing scenario, that could cause bugs to slip through as well.) But if the worst-case scenario is that CFL doesn't run fuzzers that it needs to run (i.e. a false negative), then that's a problem.
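The false-negative scenario can be made concrete with a small sketch. It assumes affected-fuzzer selection works by intersecting changed files with per-fuzzer coverage; that is a simplification, not the actual CIFuzz/CFL code, and the real tools may fall back to running every fuzzer when they cannot decide. All fuzzer and file names are hypothetical.

```python
def affected_fuzzers(coverage, changed_files):
    """Return fuzzers whose recorded coverage touches any changed file."""
    changed = set(changed_files)
    return sorted(
        name for name, files in coverage.items() if changed & set(files)
    )


# Hypothetical coverage data recorded on main.
main_coverage = {
    "compress_fuzzer": {"encoder.c", "common.c"},
    "decompress_fuzzer": {"decoder.c", "common.c"},
}
# On dev, suppose compress_fuzzer now also reaches a newly added file.
dev_coverage = {
    "compress_fuzzer": {"encoder.c", "common.c", "new_feature.c"},
    "decompress_fuzzer": {"decoder.c", "common.c"},
}
changed = ["new_feature.c"]  # files touched by a push to dev

stale = affected_fuzzers(main_coverage, changed)  # decision using main's data
fresh = affected_fuzzers(dev_coverage, changed)   # decision using dev's data
missed = sorted(set(fresh) - set(stale))          # false negatives
```

Here `missed` contains `compress_fuzzer`: with coverage taken from main, the selection would not know that the fuzzer exercises the new file on dev.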
But, in my case, wouldn't that also mean that the coverage and corpora were tied to the main branch? So wouldn't it be better for me to set up my own coverage jobs and corpora for each branch to avoid problems such as the ones previously discussed? Would doing that mitigate the potential issues caused by coverage and corpus drift from branch to branch?
OK, so let's say I set up a full CFL implementation, with separate continuous builds, coverage reports, and corpora for each branch. Would this effectively cover all of the functionality that I'm currently getting from OSS-Fuzz on the main branch? If not, what functionality would still be missing? (Mainly I'm trying to figure out whether it makes sense to only use CFL for non-main branches.)
Does CFL store this information so that it can refine over time, as OSS-Fuzz does? If so, then it might actually be an advantage to have separate corpus/coverage/build data for CFL vs. OSS-Fuzz, especially if it eliminates the aforementioned 90-day disclosure limitation of CIFuzz (or CFL in OSS-Fuzz mode.) Where does CFL store this information?
At the end of the day, I'm perfectly happy with how OSS-Fuzz works, and I only went down the CIFuzz path because I was told (refer to thread here: libjpeg-turbo/libjpeg-turbo#559) that CIFuzz could fuzz arbitrary branches and PRs (apparently not.) The submitter of that PR also suggested that I might be able to support multiple branches in OSS-Fuzz by setting up our Dockerfile so that it pulls from multiple branches and renames the fuzzers according to branch, but I don't see how that could work either. As far as deduplication, as long as the bug is present in multiple branches, I'm fine with it only being reported in one branch. However, if the bug gets fixed in one branch, I would expect OSS-Fuzz to notify me if it doesn't get fixed in the other branches. It seems like CFL would get me up and running a lot more quickly than waiting for OSS-Fuzz to support multiple branches, even if CFL does require more work on my part. But in order for it to be useful for our project, I ideally need it to work as closely as possible to OSS-Fuzz, i.e.:
Now that I understand more about the limitations of CIFuzz (particularly the inability to work with multiple branches and the 90-day corpus delay), I realize that CIFuzz was never a good fit for libjpeg-turbo. Whether or not CFL is a good fit depends on whether I can achieve the four goals stated above. |
For that to work, CFLite would have to run periodic tasks like "corpus pruning" and "code coverage" using GHActions, and there only the default branch is supported: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onschedule. Without that, CFLite can reasonably support multiple branches on commits using pre-built corpora (from OSS-Fuzz, for example). That is based on the assumption that those corpora are good enough for all the branches. There are projects where that assumption isn't correct, but most parsers, compressors/decompressors and so on can be covered reasonably well with those corpora and a few hours of fuzzing on every push. |
I think that sharing corpora among all branches would be tolerable, but only if they aren't subject to a 90-day delay relative to OSS-Fuzz. The 90-day delay is a deal breaker. |
Agreed. As far as I can remember due to that lag CFLite missed a couple of bugs that could have been easily caught with the latest corpora. But over time it gets better and as long as fuzz targets don't change drastically it should work in most cases. FWIW since all the fuzz targets are public anyway and they are actually run outside of OSS-Fuzz (for example https://sourceware.org/bugzilla/show_bug.cgi?id=28666) I'm not sure whether it always makes sense to hide the latest corpora due to the disclosure policy. I think one option would be to let projects willing to accept that risk make their latest corpora public. |
Yes it's definitely possible to miss bugs if CIFuzz/CFL is wrong about a change. This is acceptable to me because CIFuzz/code change fuzzing isn't intended to replace use of proper batch fuzzing.
If you want to set up batch fuzzing/corpus pruning/coverage reports with CFL on the dev branch, then there will be none of the branch-related issues discussed above for dev. If you don't like the downsides of using OSS-Fuzz's artifacts from master, then this will be the best approach.
Yes. If you set up CFL for a branch other than master, you get most of the features of OSS-Fuzz that we mentioned above for that branch (though maybe less CPU time), without any of the bugginess related to branches I talked about earlier. I think this does make sense for you.
CFL has no 90-day disclosure policy. CFL's data (e.g. corpuses) are stored in GitHub Actions artifacts. These have a retention policy that is 90 days long, but I'm working on making this adjustable.
Sorry about that, maybe there was confusion with CFL.
I think this should also work. First, you build your fuzzers from master, and then you build your fuzzers from the dev codebase and append a
It would notify you that the bug exists in the other branch after you fix it. I think there's an issue, though: if you fix a bug in dev and then OSS-Fuzz finds that it exists in master, the bug could get disclosed before it's fixed in master.
CFL can currently provide all of the features you want except the last one. I can easily add that feature though. Right now CFL supports fuzzing arbitrary branches perfectly well, but not multiple branches without hiccups; I can fix this. Honestly though, the approach of renaming the fuzzers is probably the easiest. OSS-Fuzz supporting multiple branches is a policy question, not a technical one; we'd allow it either by integrating a new project or by the rename-by-branch method. You're technically free to do the latter now, so that might be the best option.
This is a fair point, but we think the 90-day policy makes projects much more comfortable using OSS-Fuzz. In fact, it's probably shorter than most would like, even though a determined adversary only needs to spend CPU to get the same bugs. |
OSS-Fuzz would refer to the "default" branch everywhere though so for example bug reports with commit ranges where bugs were supposedly fixed would be misleading at best because OSS-Fuzz couldn't bisect the relevant branches. Separate projects like #7371 would be a cleaner approach but it would give projects with multiple branches much more resources (which wouldn't be fair I think).
I agree if it somehow makes most projects more comfortable it should be turned on by default. I'd opt-out though because in my opinion once fuzz targets are public all the bugs they can discover can be considered public as well. |
Conceptually that makes sense, but I'm tripping up a bit on the mechanics. The Dockerfile clones the libjpeg-turbo source code into the Docker container, and OSS-Fuzz then runs the container in order to build the fuzzers, but the fuzzers are built using the code that is already in the container. Presumably I would need to modify the Dockerfile (via a PR against OSS-Fuzz) so that it clones the various libjpeg-turbo branches into different directories under /src. However, it seems like I would need a multi-branch-aware build.sh script that iterates through each branch directory in the container, calling down to a subordinate build.sh script in that directory (fuzz/build.sh in the libjpeg-turbo repository) and passing it an appropriate argument or variable so that it knows whether or not to suffix the fuzzer names. Does that sound right? I would also be interested to hear your comment regarding @evverx's concern above.
Because of our branching strategy, dev will always be a superset of main, so any bug affecting both branches will be fixed in main first and then merged into dev.
I don't understand your distinction between arbitrary branches and multiple branches.
I would definitely opt out, because the reality is that bugs in libjpeg-turbo will get publicly disclosed rather quickly unless they are discovered by OSS-Fuzz or the Mozilla or Chrome developers. Everyone else just posts a GitHub issue whenever they find something, so it's better for me to find it first. Ideally, we're talking about bugs in pre-release code, not officially released code. I consider the risk of not finding a bug and thus officially releasing it to be much greater than the risk of disclosing a bug in pre-release code. |
This was my intuition as well. But I don't think it's really true. OSS-Fuzz uses the builds for bisection. For each build OSS-Fuzz knows the commit of each git repo in /src. So the regression/progression range will be correct I think. |
But how would it know that if multiple branches are being built? That ties into my question above regarding how to modify the Dockerfile. |
Exactly, clone two repos in the Dockerfile and then have a multi-branch-aware build.sh.
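A rough sketch of what such a multi-branch-aware build driver could look like is given below. It assumes each branch has already been cloned into its own directory under /src and that each checkout's `fuzz/build.sh` reads a suffix from the environment. The directory names and the `FUZZER_SUFFIX` variable are hypothetical, not part of OSS-Fuzz or libjpeg-turbo.

```python
import os
import subprocess


def plan_builds(src_root, branch_dirs):
    """Map each existing branch checkout to the suffix for its fuzzer names."""
    plan = []
    for dirname, suffix in branch_dirs:
        path = os.path.join(src_root, dirname)
        # A missing checkout (for example, no dev branch at the moment)
        # is skipped rather than treated as an error.
        if os.path.isdir(path):
            plan.append((path, suffix))
    return plan


def build_all(plan):
    """Run each branch's own build script, passing the suffix via the environment."""
    for path, suffix in plan:
        env = dict(os.environ, FUZZER_SUFFIX=suffix)  # hypothetical variable name
        subprocess.run(["bash", "fuzz/build.sh"], cwd=path, env=env, check=True)
```

A top-level build script might then call `build_all(plan_builds("/src", [("libjpeg-turbo.main", ""), ("libjpeg-turbo.dev", "_dev")]))`, so that fuzzers built from main keep their names and those built from dev get a `_dev` suffix.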
Meaning CFL can currently handle one branch well, and that branch can be any branch. But if you use multiple branches, then there will be the problems with irrelevant corpus/coverage and broken novelty checking.
You can actually opt to have all your bugs be made public like LLVM does: https://github.com/google/oss-fuzz/blob/master/projects/llvm/project.yaml#L28 but this doesn't affect the corpus (though it probably should). |
It won't know. Which I think is fine. It will see the other repo and think it's a dependency or whatever. So it might report a bug as appearing in the range of commits in the other branch, but you will be able to see the range for the actual branch yourself. |
Good to know. Thanks. I still think it would be fragile, because quite often OSS-Fuzz can't get the commit ranges right even with one branch, or marks 100% reproducible bugs as flaky for no apparent reason. Anyway, assuming multiple commit ranges would be correct, I'm not sure bug reports like that can be consumed by the OSV database and so on. At least it isn't clear to me what the OSV database would refer to. |
You're right, this would mess up OSV, but I don't know if that matters to every maintainer. |
@jonathanmetzman I think ideally it shouldn't matter to maintainers, but as long as the database is used to automatically file bugs against packages (including upstream projects) that maintainers should react to, I don't think it should be messed up, to avoid things like https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2022q1/016161.html. |
Referring to the conversation in google/oss-fuzz#7479 and #559, there was a misunderstanding regarding how CIFuzz works. It cannot be used to fuzz arbitrary PRs or code branches, and it has a 90-day delay in downloading corpora from OSS-Fuzz. That makes it unsuitable for libjpeg-turbo.
Question: The libjpeg-turbo dev branch will not exist between the time that the current release series enters beta and the first feature is developed for the next-gen release series. I am handling that possibility by detecting that the branch doesn't exist and not including it in the Docker container or fuzzer builds. However, will OSS-Fuzz handle this gracefully? In other words, if a fuzzer that OSS-Fuzz once knew suddenly disappears, will OSS-Fuzz pick up where it left off if that fuzzer later re-appears? |
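One way to detect whether a branch currently exists on the remote is to inspect the output of `git ls-remote --heads`. The sketch below is only an illustration of that check, not the script actually used for libjpeg-turbo; both function names are hypothetical.

```python
import subprocess


def branch_listed(ls_remote_output: str, branch: str) -> bool:
    """Return True if `git ls-remote --heads` output lists refs/heads/<branch>."""
    wanted = f"refs/heads/{branch}"
    for line in ls_remote_output.splitlines():
        parts = line.split()
        # Each line has the form "<sha>\t<ref>". Require an exact ref match,
        # because the ls-remote pattern "dev" also matches "feature/dev".
        if len(parts) == 2 and parts[1] == wanted:
            return True
    return False


def remote_branch_exists(repo_url: str, branch: str) -> bool:
    """Ask the remote which heads match; requires git and network access."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", repo_url, branch],
        capture_output=True, text=True, check=True,
    )
    return branch_listed(result.stdout, branch)
```

A Dockerfile or build script could use such a check to decide whether to clone and build dev at all.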
As far as I can remember when fuzz targets disappear issues on Monorail are closed and the corpora and coverage are removed so OSS-Fuzz starts afresh if they are brought back. Last time I removed a fuzz target it took CF a couple of months to delete a corpus though so in theory it probably could have been picked up if it had re-appeared. |
That's probably OK, because I won't merge dev down into main until all issues are resolved. (If I didn't resolve them, they would reappear in main anyhow.) Also, when dev reappears, it will have new features, so starting fresh is probably the right thing to do anyhow. |
libjpeg-turbo uses a stable mainline branch model, so the main branch is always stable and feeds into the current release series. The next-gen evolving release series is developed in the dev branch, and bug fixes are cherry-picked into stable branches for past release series. It is desirable to fuzz the dev branch to ensure that bugs are caught before the evolving code is merged down into main (which generally occurs in conjunction with a beta release) and also to allow for the fuzzers themselves to evolve along with the libjpeg-turbo feature set. It is also desirable to fuzz the stable branch from the most recent release series (2.0.x at the moment) to ensure that the same quality is maintained from when that code occupied the main branch. Note that both the Dockerfile and multi-branch build script included in this commit accommodate the fact that the dev branch may not exist. The dev branch will not exist between the time that the current release series enters beta and the first feature for the next-gen release series is developed. Closes #7479
I have integrated CIFuzz with my project (libjpeg-turbo), but I cannot figure out how to make it test the actual code that is being passed to GitHub Actions. Because of
oss-fuzz/projects/libjpeg-turbo/Dockerfile
Line 19 in 9f236c1
This log file is from fuzzing the dev branch, which should be v2.1.80, but CIFuzz is testing v2.1.4 from the main branch instead:
https://github.com/libjpeg-turbo/libjpeg-turbo/runs/5518437997
This log file is from fuzzing the 2.0.x branch, which should be v2.0.7, but CIFuzz is testing v2.1.4 from the main branch instead:
https://github.com/libjpeg-turbo/libjpeg-turbo/runs/5728485608