Does it make sense to use JaCoCo for coverage? #78

Closed
vlsi opened this issue Jan 16, 2020 · 10 comments

@vlsi
Contributor

vlsi commented Jan 16, 2020

Do you think it makes sense to use https://github.com/jacoco/jacoco for capturing coverage?

@rohanpadhye
Owner

In the Zest paper, we used JaCoCo to evaluate the code coverage of the inputs generated by JQF after fuzzing had stopped. This gave us an objective measure for comparing Zest with other fuzzing approaches. However, JQF does not use JaCoCo during the fuzzing loop itself to get coverage feedback.

JQF currently uses custom instrumentation (borrowed from a project called Janala) in order to have full control over which points in the bytecode are instrumented and how much overhead they add. My current understanding is that JaCoCo is not optimized for performance and will therefore have higher overhead per test execution, but I have not actually evaluated this hypothesis.

Contributions are always welcome! If you would like to create a replacement for InstrumentingClassLoader that uses JaCoCo instead of Janala to collect coverage, please open a PR!

@vlsi
Contributor Author

vlsi commented Jan 16, 2020

> My current understanding is that JaCoCo is not optimized for performance and will therefore have higher overhead per test execution, but I have not actually evaluated this hypothesis.

That depends. Technically speaking, JaCoCo seems to be one of the first (if not the first) users of JEP 309, dynamic class-file constants (https://openjdk.java.net/jeps/309). I'm not sure which features JQF requires, so I asked.

One advantage of JaCoCo is that there are well-established ways to integrate it with build systems.

At the moment, I'm thinking along the lines of applying JQF to fuzz Apache Calcite: https://github.com/apache/calcite
For instance, making https://github.com/apache/calcite/blob/master/core/src/test/java/org/apache/calcite/test/fuzzer/RexFuzzer.java coverage-guided, or even fuzzing SQL statements.

@rohanpadhye
Owner

That's interesting. Can you elaborate on integration with build systems?

I will look into the performance of JaCoCo when I get time, but I am not familiar with their API. All we need is a way to get a hook into every branch, call, and return being executed in the test program.
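To make the requirement concrete, here is a minimal sketch of such hooks. The `TraceListener` interface, its method names, and the hand-instrumented `abs` method are hypothetical illustrations, not JQF's actual instrumentation API; they only show where branch, call, and return callbacks would fire once an instrumenter has inserted them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface (not JQF's real API): one hook per kind of
// trace event mentioned above.
interface TraceListener {
    void onBranch(int branchId, boolean taken);
    void onCall(String method);
    void onReturn(String method);
}

// A listener that records every event, in order, as a string.
class RecordingListener implements TraceListener {
    final List<String> events = new ArrayList<>();

    public void onBranch(int branchId, boolean taken) { events.add("branch " + branchId + " " + taken); }
    public void onCall(String method) { events.add("call " + method); }
    public void onReturn(String method) { events.add("return " + method); }
}

// What a method under test might look like after callbacks have been inserted
// (written by hand here purely for illustration).
class InstrumentedAbs {
    static TraceListener listener;

    static int abs(int x) {
        listener.onCall("abs");
        boolean negative = x < 0;
        listener.onBranch(1, negative); // branch id 1: the `x < 0` check
        int result = negative ? -x : x;
        listener.onReturn("abs");
        return result;
    }
}
```

Because every event goes through a method call, the fuzzing engine can plug in arbitrary listeners; that is where both the flexibility and the per-event overhead come from.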

@rohanpadhye
Owner

Aside: the RexFuzzer looks like a great example of a generator function that should work well with Zest. Please let me know if you find any bugs while fuzzing Apache Calcite (you can send a PR to the README)!

Contributions of generators to the examples package are also welcome if you are inclined to share them :-)

@vlsi
Contributor Author

vlsi commented Jan 19, 2020

> I will look into the performance of JaCoCo when I get time, but I am not familiar with their API. All we need is a way to get a hook into every branch, call, and return being executed in the test program.

Is a per-branch execution counter enough, or do you need a callback for each branch execution?

@rohanpadhye
Owner

We need a callback for each trace event (i.e., bytecode instruction execution), where the event needs to be customizable. Some of the non-trivial fuzzing algorithms do more than just look at the aggregate count of branches, e.g. maintaining a shadow call stack by pushing/popping at calls/returns respectively, as well as tracking the heap by handling memory allocations and loads/stores. Some of these are research projects still in progress.
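As an illustration of such a non-trivial consumer, the following hypothetical sketch maintains a shadow call stack from call and return events alone. The class `ShadowCallStack` and its methods are invented for this example; it assumes the instrumentation delivers properly matched call/return events (stack unwinding due to exceptions is ignored).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical trace consumer: pushes on call events, pops on return events.
// Method names are plain strings here; a real engine would likely use compact IDs.
class ShadowCallStack {
    private final Deque<String> frames = new ArrayDeque<>();
    private int maxDepth = 0;

    // Call event: push the callee onto the shadow stack.
    void onCall(String method) {
        frames.push(method);
        maxDepth = Math.max(maxDepth, frames.size());
    }

    // Return event: pop the callee (pop() throws if the stack is empty)
    // and check that calls and returns are balanced.
    void onReturn(String method) {
        String top = frames.pop();
        if (!top.equals(method)) {
            throw new IllegalStateException("expected return from " + top + ", got " + method);
        }
    }

    // Current calling context, outermost frame first, e.g. "main > parse > lex".
    String context() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = frames.descendingIterator();
        while (it.hasNext()) {
            if (sb.length() > 0) sb.append(" > ");
            sb.append(it.next());
        }
        return sb.toString();
    }

    int depth() { return frames.size(); }
    int maxDepth() { return maxDepth; }
}
```

A per-branch counter alone could not provide this calling context, which is why the callback design matters for these algorithms.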

If JaCoCo has an API for requesting callbacks from arbitrary bytecode instruction executions, it would be a good starting point.

@vlsi
Contributor Author

vlsi commented Jan 19, 2020

> If JaCoCo has an API for requesting callbacks from arbitrary bytecode instruction executions

Oh, it does not seem to be in line with JaCoCo's goals :-/

@vlsi
Contributor Author

vlsi commented Jan 19, 2020

> That's interesting. Can you elaborate on integration with build systems?

Most major Java build systems have JaCoCo integration, which means they know how to download JaCoCo's Java agent and attach it to the test JVM.

For instance, in Gradle one can add `plugins { jacoco }`, and Gradle will automatically attach the JaCoCo agent to all test tasks.

Then, if JQF could consume the coverage data it needs from JaCoCo's APIs, it would simplify life for end users.
For instance, they already know how to configure JaCoCo instrumentation (e.g. the set of included classes).

@rohanpadhye
Owner

I see the point. We can probably mimic JaCoCo's configuration options instead of using Janala's if that makes it any easier to use.
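For reference, a minimal sketch of a class filter that accepts configuration in the style of JaCoCo's agent `includes`/`excludes` options (colon-separated class-name patterns using the wildcards `*` and `?`). The `ClassFilter` class is hypothetical and is not JaCoCo's own implementation; it only illustrates how a fuzzer could reuse a syntax users already know.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter modeled on JaCoCo-style include/exclude patterns:
// entries separated by ':', '*' matches any sequence, '?' matches one character.
class ClassFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    ClassFilter(String includes, String excludes) {
        this.includes = compile(includes);
        this.excludes = compile(excludes);
    }

    // Translate each wildcard entry into an anchored regular expression.
    private static List<Pattern> compile(String spec) {
        List<Pattern> patterns = new ArrayList<>();
        if (spec == null || spec.isEmpty()) return patterns;
        for (String entry : spec.split(":")) {
            StringBuilder regex = new StringBuilder();
            for (char c : entry.toCharArray()) {
                switch (c) {
                    case '*': regex.append(".*"); break;
                    case '?': regex.append('.'); break;
                    default: regex.append(Pattern.quote(String.valueOf(c))); // literal, e.g. '.'
                }
            }
            patterns.add(Pattern.compile(regex.toString()));
        }
        return patterns;
    }

    private static boolean anyMatch(List<Pattern> patterns, String className) {
        for (Pattern p : patterns) {
            if (p.matcher(className).matches()) return true;
        }
        return false;
    }

    // A class is instrumented if it matches some include and no exclude.
    boolean shouldInstrument(String className) {
        return anyMatch(includes, className) && !anyMatch(excludes, className);
    }
}
```

For example, `new ClassFilter("org.apache.calcite.*", "org.apache.calcite.test.*")` would instrument `org.apache.calcite.rex.RexBuilder` but skip the test classes and everything outside Calcite.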

Unfortunately, JaCoCo's CoverageTransformer holds a private final Instrumenter instance, which makes it hard to extend (otherwise, we could have reused JaCoCo's agent and configuration and simply dropped in JQF's own instrumenter classes).

I browsed through JaCoCo's MethodInstrumenter, and its main logic seems to be simply inserting probes that set elements of a boolean array. So, the only information JaCoCo seems to collect is whether a program location was visited.
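A hand-written sketch of what such probe-based instrumentation amounts to at the source level. The probe placement and the `probes` array name are assumptions for illustration only; JaCoCo inserts its probes at the bytecode level and manages its probe arrays itself, so its generated code looks different.

```java
// Hypothetical illustration of boolean probes: each probe only records
// "this location was reached at least once".
class ProbedClassify {
    static final boolean[] probes = new boolean[3];

    static String classify(int x) {
        if (x < 0) {
            probes[0] = true; // probe 0: negative branch reached
            return "negative";
        } else if (x == 0) {
            probes[1] = true; // probe 1: zero branch reached
            return "zero";
        }
        probes[2] = true;     // probe 2: positive branch reached
        return "positive";
    }

    // Number of distinct probes that have fired so far.
    static int probesHit() {
        int n = 0;
        for (boolean b : probes) if (b) n++;
        return n;
    }
}
```

Each probe is a single array store with no method call, which is why this style is plausibly cheaper than invoking a callback handler per event.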

This seems like it could be faster than JQF's current instrumentation -- writes to a local boolean array will likely be faster than invoking a callback handler. However, it also means that the amount of information is very limited. I'm not sure if it is worth the effort to implement a separate fuzzing engine in JQF that uses only boolean visit information.
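To illustrate the information loss, the following hypothetical sketch instruments one loop both ways: a boolean visited array (as with the probes described above) and a per-probe hit counter. The method and probe layout are invented for this example, and hit counts are only one kind of richer feedback; a callback-based design can also see event order and calling context, which neither array captures.

```java
// Hypothetical illustration: the same loop instrumented with boolean probes
// (`visited`) and with hit counters (`counts`).
class CoverageComparison {
    // Probe 0 sits in the loop body, probe 1 after the loop exits.
    static int sumTo(int n, boolean[] visited, int[] counts) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            visited[0] = true;
            counts[0]++;
            sum += i;
        }
        visited[1] = true;
        counts[1]++;
        return sum;
    }

    // Boolean coverage observed for a single run with input n.
    static boolean[] visitedFor(int n) {
        boolean[] visited = new boolean[2];
        sumTo(n, visited, new int[2]);
        return visited;
    }

    // Hit counts observed for a single run with input n.
    static int[] countsFor(int n) {
        int[] counts = new int[2];
        sumTo(n, new boolean[2], counts);
        return counts;
    }
}
```

Running with `n = 1` and `n = 10` yields identical visited arrays (`[true, true]`) but different counts (`[1, 1]` vs `[10, 1]`), so feedback built only on visited flags cannot distinguish an input that exercises the loop more.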

For those who really want a lightweight fuzzer that uses JaCoCo instrumentation and does not support all the bells and whistles of QuickCheck generators and fancy fuzzing algorithms, there is https://github.com/fuzzitdev/javafuzz. cc: @yevgenypats

@rohanpadhye
Owner

After playing around with JaCoCo a bit, I've learned that it is not very programmable via APIs. Its approach may well give better performance, but it is difficult to integrate JaCoCo into a complex framework such as JQF. Closing this issue for now, but thanks for the suggestion!


2 participants