-
Notifications
You must be signed in to change notification settings - Fork 912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuzz testing integration #4129
Fuzz testing integration #4129
Conversation
ffa42e6
to
9fe50b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Bunch of small nits. I think my biggest complaint is that there's no real reason to move all the existing CI tests down a directory. There's no reason we can't have tests/fuzzing
exist while also leaving all the other tests where they are.
Didn't really give the Makefiles a good looksee, as that's not really my particular wheelhouse.
tests/fuzz/Makefile
Outdated
@@ -7,12 +7,7 @@ FUZZ_TARGETS_SRC := $(wildcard tests/fuzz/fuzz-*.c) | |||
FUZZ_TARGETS_OBJS := $(FUZZ_TARGETS_SRC:.c=.o) | |||
FUZZ_TARGETS_BIN := $(FUZZ_TARGETS_SRC:.c=) | |||
|
|||
FUZZ_COMMON_OBJS := \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like it belongs in the previous commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome PR @darosior, I've long wanted to look into fuzzing, and with this I have almost a guided tour to help me learn. Haven't looked too closely yet, but I'll grab a coco and start learning asap.
However, I share @niftynei's feeling re moving all tests down one directory level:
- It breaks almost all PRs
- Those are likely the most edited files in the whole repo, hiding them in ya directory means some more typing 😜
Ok, will remove the move! |
This adds a new configuration, --enable-fuzzing (which is more than welcome to be coupled with --enable-address-sanitizer), to pass the fuzzer sanitizer argument when compiling objects. This allows libfuzzer to actually be able "to fuzz" by detecting coverage and be smart when mutating inputs. As libfuzzer brings its own ~~fees~~ main(), we compile objects with fsanitize=fuzzer-no-link, and special-case the linkage of the fuzz targets. A "lib" is added to abstract out the interface to the fuzzing tool used. This allow us to use the same targets to fuzz using AFL, hongfuzz or w/e by adding their entrypoints into libfuzz. (h/t to practicalswift who introduced this for bitcoin-core, which i mimiced) Signed-off-by: Antoine Poinsot <[email protected]>
4af47e6
to
442099e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor performance tips :-)
Signed-off-by: Antoine Poinsot <[email protected]>
This: - Allows `.*btc` amounts (without post-decimal) - Avoids creating decimals when amount is 0 btc - Corrects our handling of the suffixes (memeqstr would sometimes return false because of null-termination) Changelog-Fixed: We are now able to parse any amount string (XXXmsat, XX.XXXbtc, ..) we create. Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
We don't use it, and it's buggy (will always return NULL) Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
My bad, i comitted them as part of the LSAN suppressions while this data race could have and had been fixed. Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
Thanks for the reviews so far! Amended as per latest one. (Will check about the process pool soon ™️) |
Signed-off-by: Antoine Poinsot <[email protected]>
Signed-off-by: Antoine Poinsot <[email protected]>
What do you think about having a repo in lightningd/ to store corpora ? |
Sounds like something we could consider. From what I've seen so far the corpus is not going to be huge (<= 1GB by my estimate) so that can work. I wonder however if the version control brings anything new to the table, given that the corpus consists mostly of binary files, so maybe a linear storage model can work as well? The big advantage of using git + github is the reuse of infra and processes to collaborate on building up a corpus. (edit: what I'm saying is that I would have no idea on how to judge a pull request xD) |
Yeah, really just thinking of copying the workflow at github.com/bitcoin-core/qa-assets :-) EDIT:
:) I'd just ask (trust?) the contributor to have ran
Been running one for almost a week it's 8.3M atm. But we really only have few simple targets.
A place which we can easily contribute to, but you still have control over since it's pulled in your CI. I'd say that the platform, Github, is more of interest than the version control tool, git, for this one :) |
@darosior that qa-assets repo looks ugly to me. Exactly because of what @cdecker wrote earlier. Here is my offer: If you agree to put the binary data out of git, i will happily help with any and all automation required (which will be itself properly kept in a git repository; e.g. building the corpuses, uploading them somewhere (or sharing them somehow), etc.). Yes, i learned a bit about Torrent in last weeks and am strongly biased against any binary blob in repositories designed for source code. https://git-annex.branchable.com/ or https://git-lfs.github.com/ can be used to track the binary blobs. EDIT: This offer is valid also for serving the binary data over BitTorrent. No cloud needed. |
Just for reference here is one repository that uses binary fuzz files, even inside its main repository: https://github.com/rust-bitcoin/rust-bitcoin/tree/master/fuzz. The proposed separate repository (like they have in https://github.com/bitcoin-core/qa-assets) is a better idea compared to in-tree blobs. |
I think the main issue is collecting and managing the corpus, the serving is secondary (happy to host the corpus on GCS for a couple of cents each month). For the code review and addition we have a really good process, that's no longer true if we start managing things outside the git tree. I absolutely share your dislike of binary artifacts in git, however it might be easiest to get started with an in-tree solution, and move that out at a later point in time. Or the separate corpus repo could be a cleaner solution that doesn't taint the main repo with these binaries. Going a bit back and forth between the solutions tbh. |
Yeah, management on a server would be awful, discussions around it would be mixed up. I can already imagine having to upload my new corpus to (a VPS / a torrent seeder / whatever) and bug cdecker on IRC for him to pull it and eventually merge it with the existing one on its GCS for it to be pulled by our CI... Wow. Let's just use Github? |
Ok, let's use a corpus repo then on the |
ACK acef7ea |
ACK acef7ea (have learned a lot about Fuzzing in the process, thank you!) Recommended reading (because the word fuzzing is quite fuzzy until the real meaning of it is seen): |
@rustyrussell could you look at the amount parsing commit ? iirc i stumbled upon something like that in the past and you corrected my fix so just would like your input on this one. |
@darosior This PR has been merged. Is your comment relevant or is it here by mistake? Any link to that commit you mention? |
This introduces coverage-guided fuzz testing using libfuzzer.
Why
We've been talking about it for some time now, and i'm sure it could help us to uncover trivial bugs with minimal overhead, more sneaky ones more easily than with unit tests, and finally to avoid regressions by maintaining a seed corpus to ran patches against.
Going further, structure-aware fuzzing could help us with Bitcoin Script creation or even with the state machine which while more intrusive because crypto seems plausible because of the daemons separation and messages.
For now, this PR only introduces the basis for integration with primitives and rudimentary targets (i focused on
common/
). Still, this found some bugs (eg amounts string parsing / formatting) and new targets could help with e.g. our JSON parser or the gossip store.Integration
This extends our
tests/
directory by moving the Python functional tests to thetests/functional/
subdirectory (sorry for the conflicts..) and creates a new sub:tests/fuzz/
.Inspired by the unit tests, all targets are named
fuzz-[target_name]
.Not inspired by the unit tests, all targets are built in
tests/fuzz/
. Even though there are onlycommon/
-specific targets for now, i think it makes sense to keep them contained in a single directory.A
libfuzz.h
abstracts out the interface to libFuzzer for potential future support for other fuzzers, andrun.py
allows to manage them in batches. This was inspired by the integration in bitcoin-core but i guess it's a pretty common pattern.What next
I propose to create a new repository (eg in https://github.com/lightningd/ ?) for tracking the seed corpus that we could update if we generate new coverage locally.
This could be pulled in a new CI job that checks PRs against each [target + seeds] once to check for regression. What do you think about using Github Action for this ?
Write more targets
Generate more coverage
Go to b) :)