Fast entanglement detection based on entanglement candidates (suspects) #154
Implements a heap-local candidate (a.k.a. suspect) set of objects that might contain a down-pointer. Candidates have a bit in their header, which the write barrier sets when a down-pointer is created. This bit is then used to accelerate the read barrier for entanglement detection.
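The marking step might look like the following sketch (hypothetical names and header layout for illustration only, not MPL's actual runtime code; `depth` stands in for the position of an object's heap in the heap hierarchy):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header layout: one spare bit records candidate status. */
#define CANDIDATE_BIT (UINT64_C(1) << 63)

typedef struct {
  uint64_t header;
  int depth; /* depth of the heap this object belongs to (illustrative) */
} Object;

/* Write-barrier sketch: storing a pointer to `src` into `dst` creates a
 * down-pointer when dst's heap is shallower than src's. In that case,
 * mark dst as a candidate: it may now contain a down-pointer. */
static inline void write_barrier(Object *dst, Object *src) {
  if (dst->depth < src->depth) {
    dst->header |= CANDIDATE_BIT;
  }
}

static inline bool is_candidate(const Object *o) {
  return (o->header & CANDIDATE_BIT) != 0;
}
```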
The fast path is supported by the compiler: in `ssa2-to-rssa`, we generate the code for the fast path, avoiding a runtime call in the case of a non-candidate.

Whenever a heap becomes a leaf, we clear the candidates within that heap (because at this point, those objects are guaranteed to no longer have down-pointers).
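The read-barrier fast path and the leaf-heap clearing could be sketched as follows (again with hypothetical names; in the actual implementation the candidate test is inlined by `ssa2-to-rssa` rather than written as a C function):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout, as in the write-barrier sketch. */
#define CANDIDATE_BIT (UINT64_C(1) << 63)

typedef struct {
  uint64_t header;
} Object;

static bool is_candidate(const Object *o) {
  return (o->header & CANDIDATE_BIT) != 0;
}

/* Stub standing in for the runtime's full entanglement check. */
static void runtime_entanglement_check(Object *o) {
  (void)o; /* ... inspect heap hierarchy, report entanglement, etc. ... */
}

/* Read-barrier sketch: only candidates fall through to the runtime
 * call; for a non-candidate, the compiler-generated fast path is just
 * this single bit test. */
static void read_barrier(Object *o) {
  if (is_candidate(o)) {
    runtime_entanglement_check(o); /* slow path */
  }
  /* fast path: nothing to do */
}

/* When a heap becomes a leaf, its objects can no longer contain
 * down-pointers, so their candidate bits are cleared. */
static void clear_candidates(Object **objs, size_t n) {
  for (size_t i = 0; i < n; i++) {
    objs[i]->header &= ~CANDIDATE_BIT;
  }
}
```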
Performance Improvement
The fast path works remarkably well. In our recent experiments measuring the performance improvement due to the fast path, running-time improvements are as much as 4x at scale.
Overall performance
Due to the fast-path improvements, the overall cost of entanglement detection is now essentially zero. Space overhead appears negligible across the board. In terms of time overhead, we measured approximately 1% on average across 23 benchmarks, with a maximum of 7%; the majority of benchmarks (18 of 23) incur less than 2% time overhead.