multiple corpus locations as input to fuzzing, and change default output corpus location #7
This adds support for multiple corpus locations as input to fuzzing.
Destinations for new corpus entries
There are three possible destinations for writing new corpus entries, in decreasing order of sophistication or work needed on the part of the user:
- `-fuzzdir=/some/path`: Corpus is written to the user-specified directory (perhaps outside of VCS, perhaps ultimately stored as a tar.gz in blob storage, perhaps a separate `foo-corpus` repo).
- `-fuzzdir=testdata`: Corpus is written to `<pkg-path>/testdata/fuzz/<func>` for all matching fuzz functions (and therefore, most often in the same VCS repo as the code under test).
- `-fuzzdir` not set: Default to writing corpus under `GOPATH/pkg/fuzz/corpus/...`.
Three rules for what corpus gets read, what corpus gets written
There are three simple rules for what corpus gets read, and what gets written:
`-fuzzdir` is not specified)

The last rule helps with transitioning from one location to another without needing to manually hunt around or write a `cp` script, which is especially important if you have many fuzz functions or packages being fuzzed.

Additional details and rationale
`GOPATH/pkg/fuzz/corpus/...` is a convenient default location, with some nice attributes:

`<pkg-path>` (which is the common case for a dependency under modules, which are stored in a read-only module cache).

That said, defaulting to `GOPATH/pkg/fuzz/corpus/...` means that location is typically local to the current machine and not shared with anyone else. That's fine if you are fuzzing for "minutes" or "hours", because even if you lose a machine you can mostly regain that CPU time by kicking off a fuzz run overnight, over the weekend, etc. However, if you have been fuzzing for "days" or "weeks", you probably want to store your corpus somewhere more permanent and more shareable, at which point you need to make a decision: store it in VCS along with your code under test (`-fuzzdir=testdata`), or store it somewhere else under your control (`-fuzzdir=/some/path`) however you see fit.

To ease the transition from "I've been fuzzing locally and not been too worried about sharing my corpus" to "I want a more permanent or more shareable location for my corpus", when you first pick a non-default location for the corpus (via `-fuzzdir=/some/path` or `-fuzzdir=testdata`), fzgo will seed that new non-default location with whatever it finds in `GOPATH/pkg/fuzz/corpus/...` for the particular fuzz functions being run. This means you don't lose whatever you have found so far if you have been using `GOPATH/pkg/fuzz/corpus/...` for convenience.

In addition, whenever you fuzz, the corpus from all known locations is used as input. For example, if you fuzz via `go test ./... -fuzz=. -fuzzdir=/some/path`, then any unique corpus elements found in `<pkg-path>/testdata/fuzz/<func>`, `GOPATH/pkg/fuzz/corpus/...`, or `/some/path` will all be used as input corpus for any matching fuzz functions.

(There is an asterisk on the last two paragraphs: they describe what I am currently proposing as the behavior, but because dvyukov/go-fuzz does not actually support reading from multiple input corpus locations, fzgo currently approximates the proposed behavior, as described in the `copyCachedCorpus` comment in `main.go`. In short, it currently always copies any unique corpus files to whatever the destination corpus location is, which arguably might be better behavior anyway.)