
multiple corpus locations as input to fuzzing, and change default output corpus location #7

Merged: 7 commits merged on Aug 3, 2019

@thepudds (Owner) commented Aug 3, 2019

This adds support for multiple corpus locations as input to fuzzing.

Destinations for new corpus entries

There are three possible destinations for writing new corpus entries, in decreasing order of sophistication or work needed on the part of the user:

  • -fuzzdir=/some/path: Corpus is written to the user-specified directory (which might be outside of VCS, might ultimately be stored as a tar.gz in blob storage, or might be a separate foo-corpus repo).
  • -fuzzdir=testdata: Corpus is written to <pkg-path>/testdata/fuzz/<func> for all matching fuzz functions (and is therefore most often in the same VCS repo as the code under test).
  • -fuzzdir not set: Corpus is written under GOPATH/pkg/fuzz/corpus/... by default.

Three rules for what corpus gets read and what corpus gets written

There are three simple rules for what corpus gets read, and what gets written:

  1. fzgo always reads from all known corpus sources that exist.
  2. fzgo always writes only to the user's requested destination (with a reasonable default if -fuzzdir is not specified).
  3. If the destination does not already exist for a particular function being fuzzed, fzgo seeds it with any matching corpus elements present in the other known corpus locations for that function.

The last rule helps with transitioning from one location to another without needing to hunt around manually or write a cp script, which is especially important if you have many fuzz functions or many packages being fuzzed.

Additional details and rationale

GOPATH/pkg/fuzz/corpus/... is a convenient default location, with some nice attributes:

  • It means you don't end up with a dirty VCS status by default if you are fuzzing your own code. (It is fairly annoying to dirty your VCS status unintentionally, especially if you are only fuzzing for a brief period of time.)
  • It works even if the code under test is stored in a read-only <pkg-path> (the common case for a dependency when using modules, since modules are stored in a read-only module cache).
  • The user doesn't need to think about where to store their corpus, which is convenient if you are starting out or just doing brief amounts of fuzzing.
  • The user doesn't need to set up anything else (e.g., no separate repo and no other shared storage).
  • The corpus is saved somewhere, and it will be re-used when you fuzz again on that machine.

That said, defaulting to GOPATH/pkg/fuzz/corpus/... means the corpus is typically local to the current machine and not shared with anyone else. That's fine if you have been fuzzing for "minutes" or "hours", because even if you lose the machine you can mostly regain that CPU time by kicking off a fuzz run overnight or over the weekend. However, if you have been fuzzing for "days" or "weeks", you probably want to store your corpus somewhere more permanent and more shareable, at which point you need to decide between storing it in VCS alongside the code under test (-fuzzdir=testdata) or storing it somewhere else under your control (-fuzzdir=/some/path), however you see fit.

To ease the transition from "I've been fuzzing locally and haven't worried much about sharing my corpus" to "I want a more permanent or more shareable location for my corpus", the first time you pick a non-default location for the corpus (via -fuzzdir=/some/path or -fuzzdir=testdata), fzgo will seed that new location with whatever it finds in GOPATH/pkg/fuzz/corpus/... for the particular fuzz functions being run. This means you don't lose what you have found so far if you have been using GOPATH/pkg/fuzz/corpus/... for convenience.

In addition, whenever you fuzz, fzgo uses the corpus from all known locations as input. For example, if you fuzz via go test ./... -fuzz=. -fuzzdir=/some/path, then any unique corpus elements found in <pkg-path>/testdata/fuzz/<func>, GOPATH/pkg/fuzz/corpus/..., or /some/path are all used as input corpus for the matching fuzz functions.

(There is an asterisk on the last two paragraphs: they describe what I am currently proposing as the behavior, but because dvyukov/go-fuzz does not actually support reading from multiple input corpus locations, fzgo currently approximates the proposed behavior, as described in the copyCachedCorpus comment in main.go. In short, it currently always copies any unique corpus files to whatever the destination corpus location is, which arguably might be better behavior anyway.)

@thepudds thepudds merged commit 709a788 into master Aug 3, 2019
@thepudds thepudds deleted the dev-multi-corpus-location branch August 3, 2019 21:58