Speeding up instrumentation #25
From what I understand, the trace function determines which files are covered, and the current implementation tracks coverage for all Python code, including installed libraries and the standard library. Restricting it to the project's own code should speed up each run and give afl a smaller attack surface.
This is correct.
There's a TODO for this: `# TODO: make it configurable which modules are instrumented, and which are not`

But I'm afraid that the cost of the extra check could easily exceed the savings from skipping instrumentation.
I'm running some tests with a sample project; this is basically what I changed in the trace function:

Here `module_path` is a global variable whose value is passed to `_init`; it is expected to be something like `/home/user/projroot`. Note that I strip this prefix from the filename: I noticed that the filename sometimes uses the full path and sometimes `./`, which would make the traces differ. I haven't really debugged when or why this happens. `fuzzer.py` is the fuzzer.py file passed to py-afl-fuzz; I don't really care about coverage for the wrapper, but if it isn't traced, afl thinks the binary has no instrumentation.

I'm getting exec speeds of up to ~5k, but the stability is very low (less than 5%), and afl reports "no new instrumentation output" for a lot of the initial seed corpus. Maybe I'm doing something wrong here. I also changed the trace to something more naive:

The project has ~90k LOC, so I'm thinking I should increase the map size; I see python-afl uses a 32-bit uint (a lot less than 90k). From what I understand, afl expects to map blocks of code, not individual lines, so could we use a deterministic way to map each filename:lineno instead of hashing and truncating the hash?

Maybe I'm thinking about all this wrong. I'm currently fuzzing the whole project; should I be fuzzing each function separately?
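To make the filename:lineno question concrete, here is an illustrative sketch of AFL-style edge indexing (not python-afl's actual code): each location hashes to a slot, and the bitmap records the XOR of the current slot with the previous one, i.e. transitions between blocks rather than single lines. Note also that Python's built-in `hash` of strings is randomized per process unless `PYTHONHASHSEED` is fixed, so hash-based indices are only stable within a single run, which by itself can hurt stability.

```python
MAP_SIZE = 1 << 16  # afl's default map: 65536 slots

def location_id(filename, lineno):
    # Deterministic within one process; across processes Python's
    # string hashing is salted unless PYTHONHASHSEED is set.
    return hash((filename, lineno)) % MAP_SIZE

class EdgeMap:
    """Toy model of the shared-memory bitmap that afl reads."""

    def __init__(self):
        self.map = bytearray(MAP_SIZE)
        self.prev = 0

    def hit(self, filename, lineno):
        cur = location_id(filename, lineno)
        slot = cur ^ self.prev          # record the edge, not the line
        self.map[slot] = min(255, self.map[slot] + 1)
        self.prev = cur >> 1            # shift so A->B and B->A differ
```

With ~90k distinct locations squeezed into 65536 slots, collisions are guaranteed by the pigeonhole principle, which is one reason a larger map (or coarser, block-level locations) could help.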
As indicated in the README, the instrumentation is slow at the moment.
Here are some rough ideas for how to speed it up:

- Replace `sys.settrace()` with the lower-level `PyEval_SetTrace`.
- Rewrite bytecode to inject instrumentation. (Perhaps use the bytecode module?)
- Rewrite the AST to inject instrumentation. (See how it's done in pytest.)
(I don't plan to work on any of these, unless there's funding for the work.)
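To make the AST idea concrete, here is a toy sketch: an `ast.NodeTransformer` injects a call to a hypothetical `_afl_hit(lineno)` callback before every statement, so no per-line trace callback is needed at run time. pytest's assertion rewriting works on the same principle but is far more involved.

```python
import ast

class InstrumentStatements(ast.NodeTransformer):
    """Toy instrumenter: prepend _afl_hit(lineno) to every statement list."""

    def _hit(self, lineno):
        call = ast.Expr(ast.Call(
            func=ast.Name("_afl_hit", ast.Load()),
            args=[ast.Constant(lineno)], keywords=[]))
        return ast.fix_missing_locations(call)

    def generic_visit(self, node):
        node = super().generic_visit(node)  # transform children first
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if isinstance(stmts, list) and stmts:
                injected = []
                for stmt in stmts:
                    injected.append(self._hit(stmt.lineno))
                    injected.append(stmt)
                setattr(node, field, injected)
        return node

source = "x = 1\nif x:\n    y = x + 1\n"
tree = InstrumentStatements().visit(ast.parse(source))
hits = []  # line numbers executed, in order
exec(compile(tree, "<demo>", "exec"), {"_afl_hit": hits.append})
# hits is now [1, 2, 3]
```

The rewritten module pays only one extra function call per executed statement, instead of a Python-level trace callback for every frame and line, at the cost of hooking the import machinery to rewrite modules as they load.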