Remove stale implementation details of coverage instrumentation (#2179)
This level of detail in the dev guide is a maintenance burden; better to leave
this sort of thing to in-tree comments.
Zalathar authored Dec 30, 2024
1 parent 04a5a98 commit 9677beb
Showing 1 changed file with 2 additions and 354 deletions.
356 changes: 2 additions & 354 deletions src/llvm-coverage-instrumentation.md
@@ -73,9 +73,7 @@ important benefits:
out the coverage counts of each unique instantiation of a generic function,
if invoked with multiple type substitution variations.

## Components of LLVM Coverage Instrumentation in `rustc`

### LLVM Runtime Dependency
## The LLVM profiler runtime

Coverage data is only generated by running the executable Rust program. `rustc`
statically links coverage-instrumented binaries with LLVM runtime code
@@ -94,209 +92,7 @@ When compiling with `-C instrument-coverage`,
[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/main/compiler-rt/lib/profile
[crate-loader-postprocess]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html#method.postprocess

### MIR Pass: `InstrumentCoverage`

Coverage instrumentation is performed on the MIR with a [MIR pass][mir-passes]
called [`InstrumentCoverage`][mir-instrument-coverage]. This MIR pass analyzes
the control flow graph (CFG)--represented by MIR `BasicBlock`s--to identify
code branches, attaches [`FunctionCoverageInfo`] to the function's body,
and injects additional [`Coverage`][coverage-statement] statements into the
`BasicBlock`s.

A MIR `Coverage` statement is a virtual instruction that indicates a counter
should be incremented when its adjacent statements are executed, to count
a span of code ([`CodeRegion`][code-region]). It counts the number of times a
branch is executed, and is referred to by coverage mappings in the function's
coverage-info struct.

Note that many coverage counters will _not_ be converted into
physical counters (or any other executable instructions) in the final binary.
Some of them will be (see [`CoverageKind::CounterIncrement`]),
but other counters can be computed on the fly, when generating a coverage
report, by mapping a `CodeRegion` to a coverage-counter _expression_.

As an example:

```rust
fn some_func(flag: bool) {
    // increment Counter(1)
    ...
    if flag {
        // increment Counter(2)
        ...
    } else {
        // count = Expression(1) = Counter(1) - Counter(2)
        ...
    }
    // count = Expression(2) = Counter(1) + Zero
    //     or, alternatively, Expression(2) = Counter(2) + Expression(1)
    ...
}
```

In this example, four contiguous code regions are counted while only
incrementing two counters.
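
To make the arithmetic concrete, here is a minimal sketch of how a report
generator could recover all four region counts from the two physical counters.
The `Term`, `Op`, and `Expr` types and the `eval` and `some_func_region_counts`
functions are hypothetical names invented for this illustration; they are not
the compiler's actual data structures.

```rust
/// Hypothetical model of a coverage term; not the compiler's actual types.
#[derive(Clone, Copy)]
enum Term {
    Zero,
    Counter(usize),
    Expression(usize),
}

#[derive(Clone, Copy)]
enum Op {
    Add,
    Subtract,
}

/// `lhs op rhs`, where each operand is itself a term.
#[derive(Clone, Copy)]
struct Expr {
    lhs: Term,
    op: Op,
    rhs: Term,
}

/// Resolves a term to an execution count using only the physical counter values.
/// Assumes subtractions never go negative, which holds when the counts are consistent.
fn eval(term: Term, counters: &[u64], exprs: &[Expr]) -> u64 {
    match term {
        Term::Zero => 0,
        Term::Counter(id) => counters[id],
        Term::Expression(id) => {
            let Expr { lhs, op, rhs } = exprs[id];
            let (l, r) = (eval(lhs, counters, exprs), eval(rhs, counters, exprs));
            match op {
                Op::Add => l + r,
                Op::Subtract => l - r,
            }
        }
    }
}

/// Counts for the four regions of `some_func`, given the two physical counters.
fn some_func_region_counts(counter1: u64, counter2: u64) -> [u64; 4] {
    // Index 0 is an unused placeholder so that IDs match the example (1-based).
    let counters = [0, counter1, counter2];
    let placeholder = Expr { lhs: Term::Zero, op: Op::Add, rhs: Term::Zero };
    let exprs = [
        placeholder,
        // Expression(1) = Counter(1) - Counter(2)
        Expr { lhs: Term::Counter(1), op: Op::Subtract, rhs: Term::Counter(2) },
        // Expression(2) = Counter(1) + Zero
        Expr { lhs: Term::Counter(1), op: Op::Add, rhs: Term::Zero },
    ];
    [
        eval(Term::Counter(1), &counters, &exprs),
        eval(Term::Counter(2), &counters, &exprs),
        eval(Term::Expression(1), &counters, &exprs),
        eval(Term::Expression(2), &counters, &exprs),
    ]
}
```

If the function ran five times and took the `if` branch twice, the `else`
region count (three) and the trailing region count (five) fall out of the
expressions without any extra runtime increments.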

CFG analysis is used not only to determine _where_ the branches are (for
conditional constructs like `if`, `else`, `match`, and `loop`), but also to
determine where expressions can be used in place of physical counters.

The advantages of optimizing coverage through expressions are more pronounced
with loops. Loops generally include at least one conditional branch that
determines when to break out of a loop (a `while` condition, or an `if` or
`match` with a `break`). In MIR, this is typically lowered to a `SwitchInt`,
with one branch to stay in the loop, and another branch to break out of the
loop. The branch that breaks out will almost always execute less often,
so `InstrumentCoverage` chooses to add a `CounterIncrement` to that branch, and
uses an expression (`Counter(loop) - Counter(break)`) for the branch that
continues.

The `InstrumentCoverage` MIR pass is documented in
[more detail below][instrument-coverage-pass-details].

[mir-passes]: mir/passes.md
[mir-instrument-coverage]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_mir_transform/src/coverage
[`FunctionCoverageInfo`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/struct.FunctionCoverageInfo.html
[code-region]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/ffi/struct.CodeRegion.html
[`CoverageKind::CounterIncrement`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html#variant.CounterIncrement
[coverage-statement]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.StatementKind.html#variant.Coverage
[instrument-coverage-pass-details]: #implementation-details-of-the-instrumentcoverage-mir-pass

### Counter Injection and Coverage Map Pre-staging

When the compiler enters [the Codegen phase][backend-lowering-mir], with a
coverage-enabled MIR, [`codegen_statement()`][codegen-statement] converts each
MIR `Statement` into some backend-specific action or instruction.
`codegen_statement()` forwards `Coverage` statements to
[`codegen_coverage()`][codegen-coverage]:

```rust
pub fn codegen_statement(&mut self, mut bx: Bx, statement: &mir::Statement<'tcx>) -> Bx {
    ...
    match statement.kind {
        ...
        mir::StatementKind::Coverage(box ref coverage) => {
            self.codegen_coverage(bx, coverage, statement.source_info.scope);
        }
```

`codegen_coverage()` handles inlined statements and then forwards the coverage
statement to [`Builder::add_coverage`], which handles each `CoverageKind` as
follows:


- For both `CounterIncrement` and `ExpressionUsed`, the underlying counter or
expression ID is passed through to the corresponding [`FunctionCoverage`]
struct to indicate that the corresponding regions of code were not removed
by MIR optimizations.
- For `CoverageKind::CounterIncrement`s, an instruction is injected in the backend
IR to increment the physical counter, by calling the `BuilderMethod`
[`instrprof_increment()`][instrprof-increment].

```rust
fn add_coverage(&mut self, instance: Instance<'tcx>, coverage: &Coverage) {
    ...
    let Coverage { kind } = coverage;
    match *kind {
        CoverageKind::CounterIncrement { id } => {
            func_coverage.mark_counter_id_seen(id);
            ...
            bx.instrprof_increment(fn_name, hash, num_counters, index);
        }
        CoverageKind::ExpressionUsed { id } => {
            func_coverage.mark_expression_id_seen(id);
        }
    }
}
```

> The function name `instrprof_increment()` is taken from the LLVM intrinsic
call of the same name ([`llvm.instrprof.increment`][llvm-instrprof-increment]),
and uses the same arguments and types; but note that, up to and through this
stage (even though modeled after LLVM's implementation for code coverage
instrumentation), the data and instructions are not strictly LLVM-specific.
>
> But since LLVM is the only Rust-supported backend with the tooling to
process this form of coverage instrumentation, the backend for `Coverage`
statements is only implemented for LLVM, at this time.

[backend-lowering-mir]: backend/lowering-mir.md
[codegen-statement]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_statement
[codegen-coverage]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_coverage
[`Builder::add_coverage`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/builder/struct.Builder.html#method.add_coverage
[`FunctionCoverage`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/map_data/struct.FunctionCoverage.html
[instrprof-increment]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/trait.BuilderMethods.html#tymethod.instrprof_increment

### Coverage Map Generation

With the instructions to increment counters now implemented in LLVM IR,
the last remaining step is to inject the LLVM IR variables that hold the
static data for the coverage map.

`rustc_codegen_llvm`'s [`compile_codegen_unit()`][compile-codegen-unit] calls
[`coverageinfo_finalize()`][coverageinfo-finalize],
which delegates its implementation to the
[`rustc_codegen_llvm::coverageinfo::mapgen`][mapgen-finalize] module.

For each function `Instance` (code-generated from MIR, including multiple
instances of the same MIR for generic functions that have different type
substitution combinations), `mapgen`'s `finalize()` method queries the
`Instance`-associated `FunctionCoverage` for its `Counter`s, `Expression`s,
and `CodeRegion`s; and calls LLVM codegen APIs to generate
properly-configured variables in LLVM IR, according to very specific
details of the [_LLVM Coverage Mapping Format_][coverage-mapping-format]
(Version 6).[^llvm-and-covmap-versions]

[^llvm-and-covmap-versions]: The Rust compiler (as of <!-- date-check: --> Nov 2024) supports _LLVM Coverage Mapping Format_ 6.
The Rust compiler will automatically use the most up-to-date coverage mapping format
version that is compatible with the compiler's built-in version of LLVM.

```rust
pub fn finalize<'ll, 'tcx>(cx: &CodegenCx<'ll, 'tcx>) {
    ...
    if !tcx.sess.instrument_coverage_except_unused_functions() {
        add_unused_functions(cx);
    }

    let mut function_coverage_map = match cx.coverage_context() {
        Some(ctx) => ctx.take_function_coverage_map(),
        None => return,
    };
    ...
    let mut mapgen = CoverageMapGenerator::new();

    for (instance, function_coverage) in function_coverage_map {
        ...
        let coverage_mapping_buffer = llvm::build_byte_buffer(|coverage_mapping_buffer| {
            mapgen.write_coverage_mapping(expressions, counter_regions, coverage_mapping_buffer);
        });
```
_code snippet trimmed for brevity_

One notable first step performed by `mapgen::finalize()` is the call to
[`add_unused_functions()`][add-unused-functions]:

When finalizing the coverage map, `FunctionCoverage` only has the `CodeRegion`s
and counters for the functions that went through codegen, such as public
functions and "used" functions (functions referenced by other "used" or public
items). Any other functions (considered unused) were still parsed and processed
through the MIR stage.

The set of unused functions is computed as the set difference of all MIR
`DefId`s (`tcx` query `mir_keys`) minus the codegenned `DefId`s (`tcx` query
`codegened_and_inlined_items`). For each unused MIR, `add_unused_functions()`
queries the `tcx` for the previously computed `CodeRegion`s, synthesizes an
LLVM function (with no internal statements, since it will not be called), and
adds a new `FunctionCoverage` with `Unreachable` code regions.
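
The set-difference step can be sketched in isolation. This is only an
illustration under simplifying assumptions: `DefId`s are modeled as plain `u32`
values, the two `tcx` query results are passed in as slices, and
`unused_def_ids` is a hypothetical helper name, not a function in `rustc`.

```rust
use std::collections::BTreeSet;

/// Returns the IDs that have MIR but did not go through codegen, in ascending order.
/// `mir_keys` stands in for the result of the `mir_keys` query, and `codegenned`
/// for the result of `codegened_and_inlined_items` (both modeled as `u32` IDs).
fn unused_def_ids(mir_keys: &[u32], codegenned: &[u32]) -> Vec<u32> {
    let all: BTreeSet<u32> = mir_keys.iter().copied().collect();
    let used: BTreeSet<u32> = codegenned.iter().copied().collect();
    // Set difference: everything with MIR, minus everything that was codegenned.
    all.difference(&used).copied().collect()
}
```

A `BTreeSet` is used here only so that the result order is deterministic, which
keeps the sketch easy to check.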

[compile-codegen-unit]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/base/fn.compile_codegen_unit.html
[coverageinfo-finalize]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/context/struct.CodegenCx.html#method.coverageinfo_finalize
[mapgen-finalize]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/mapgen/fn.finalize.html
[coverage-mapping-format]: https://llvm.org/docs/CoverageMappingFormat.html
[add-unused-functions]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/mapgen/fn.add_unused_functions.html

## Testing LLVM Coverage
## Testing coverage instrumentation

[(See also the compiletest documentation for the `tests/coverage`
test suite.)](./tests/compiletest.md#coverage-tests)
@@ -341,151 +137,3 @@ and `mir-opt` tests can be refreshed by running:
[`src/tools/coverage-dump`]: https://github.com/rust-lang/rust/tree/master/src/tools/coverage-dump
[`tests/coverage-run-rustdoc`]: https://github.com/rust-lang/rust/tree/master/tests/coverage-run-rustdoc
[`tests/codegen/instrument-coverage/testprog.rs`]: https://github.com/rust-lang/rust/blob/master/tests/mir-opt/coverage/instrument_coverage.rs

## Implementation Details of the `InstrumentCoverage` MIR Pass

The bulk of the implementation of the `InstrumentCoverage` MIR pass is performed
by [`instrument_function_for_coverage`]. For each eligible MIR body, the instrumentor:

- Prepares a [coverage graph]
- Extracts mapping information from MIR
- Prepares counters for each relevant node/edge in the coverage graph
- Creates mapping data to be embedded in side-tables attached to the MIR body
- Injects counters and other coverage statements into MIR

The [coverage graph] is a coverage-specific simplification of the MIR control
flow graph (CFG). Its nodes are [`BasicCoverageBlock`s][bcb], which
encompass one or more sequentially-executed MIR `BasicBlock`s
(with no internal branching).

Nodes and edges in the graph can have associated [`BcbCounter`]s, which are
stored in [`CoverageCounters`].

[`instrument_function_for_coverage`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/fn.instrument_function_for_coverage.html
[coverage graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.CoverageGraph.html
[bcb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.BasicCoverageBlock.html
[`BcbCounter`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/enum.BcbCounter.html
[`CoverageCounters`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/struct.CoverageCounters.html

### The `CoverageGraph`

The [`CoverageGraph`][coverage graph] is derived from the MIR (`mir::Body`).

```rust
let basic_coverage_blocks = CoverageGraph::from_mir(mir_body);
```

Like `mir::Body`, the `CoverageGraph` is also a
[`DirectedGraph`][directed-graph]. Both graphs represent the function's
fundamental control flow and implement many of the same
[graph traits][graph-traits], supporting `start_node()`, `num_nodes()`,
`successors()`, `predecessors()`, and `is_dominated_by()`.

For anyone who knows how to work with the [MIR, as a CFG][mir-dev-guide], the
`CoverageGraph` will be familiar, and can be used in much the same way.
The nodes of the `CoverageGraph` are `BasicCoverageBlock`s (BCBs), which
index into an `IndexVec` of `BasicCoverageBlockData`. This is analogous
to the MIR CFG of `BasicBlock`s that index `BasicBlockData`.

Each `BasicCoverageBlockData` captures one or more MIR `BasicBlock`s,
exclusively, and represents the maximal-length sequence of `BasicBlock`s
without conditional branches.

[`compute_basic_coverage_blocks()`][compute-basic-coverage-blocks] builds the
`CoverageGraph` as a coverage-specific simplification of the MIR CFG. In
contrast with the [`SimplifyCfg`][simplify-cfg] MIR pass, this step does
not alter the MIR itself, because the `CoverageGraph` aggressively simplifies
the CFG, and ignores nodes that are not relevant to coverage. For example:

- The BCB CFG ignores (excludes) branches considered not relevant
to the current coverage solution. It excludes unwind-related code[^78544]
that is injected by the Rust compiler but has no physical source
code to count, which allows a `Call`-terminated `BasicBlock`
to be merged with its successor, within a single BCB.
- A `Goto`-terminated `BasicBlock` can be merged with its successor
**_as long as_** it has the only incoming edge to the successor
`BasicBlock`.
- Some `BasicBlock` terminators support Rust-specific concerns--like
borrow-checking--that are not relevant to coverage analysis. `FalseUnwind`,
for example, can be treated the same as a `Goto` (potentially merged with
its successor into the same BCB).
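
The `Goto` merging rule above can be illustrated with a small sketch. It is a
deliberately simplified model, not the logic of
`compute_basic_coverage_blocks()`: blocks are plain indices, `successors[b]`
lists the successors of block `b`, block `0` is assumed to be the entry block,
every block is assumed to be reachable from it, and unwind edges are assumed to
have been dropped already. The name `merge_into_chains` is hypothetical.

```rust
/// Groups block indices into chains of sequentially executed blocks.
/// A block joins its predecessor's chain only when that predecessor has exactly
/// one successor and the block has exactly one incoming edge.
fn merge_into_chains(successors: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = successors.len();

    // Count incoming edges for every block.
    let mut pred_count = vec![0usize; n];
    for succs in successors {
        for &s in succs {
            pred_count[s] += 1;
        }
    }

    // The block that can be appended after `from`, if any. The entry block (0)
    // always starts its own chain.
    let mergeable = |from: usize| -> Option<usize> {
        match successors[from].as_slice() {
            [only] if pred_count[*only] == 1 && *only != 0 => Some(*only),
            _ => None,
        }
    };

    // Mark the blocks that are absorbed into some predecessor's chain.
    let mut is_merged = vec![false; n];
    for b in 0..n {
        if let Some(s) = mergeable(b) {
            is_merged[s] = true;
        }
    }

    // Every block that was not absorbed starts a chain; extend it greedily.
    let mut chains = Vec::new();
    for start in 0..n {
        if is_merged[start] {
            continue;
        }
        let mut chain = vec![start];
        let mut current = start;
        while let Some(next) = mergeable(current) {
            chain.push(next);
            current = next;
        }
        chains.push(chain);
    }
    chains
}
```

For an `if`/`else` diamond, the blocks leading up to the branch collapse into
one chain, each arm stays separate, and the join block (which has two incoming
edges) starts a new chain.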

[^78544]: (Note, however, that Issue [#78544][rust-lang/rust#78544] considers
providing future support for coverage of programs that intentionally
`panic`, as an option, with some non-trivial cost.)

The BCB CFG is critical to simplifying the coverage analysis by ensuring graph path-based
queries (`is_dominated_by()`, `predecessors`, `successors`, etc.) have branch (control flow)
significance.

[directed-graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/graph/trait.DirectedGraph.html
[graph-traits]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/graph/index.html#traits
[mir-dev-guide]: mir/index.md
[compute-basic-coverage-blocks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.CoverageGraph.html#method.compute_basic_coverage_blocks
[simplify-cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/simplify/enum.SimplifyCfg.html
[rust-lang/rust#78544]: https://github.com/rust-lang/rust/issues/78544

### `make_bcb_counters()`

[`make_bcb_counters`] traverses the `CoverageGraph` and adds a
`Counter` or `Expression` to every BCB. It uses _Control Flow Analysis_
to determine where an `Expression` can be used in place of a `Counter`.
`Expression`s have no runtime overhead, so if a viable expression (adding or
subtracting two other counters or expressions) can compute the same result as
an embedded counter, an `Expression` is preferred.

[`TraverseCoverageGraphWithLoops`][traverse-coverage-graph-with-loops]
provides a traversal order that ensures all `BasicCoverageBlock` nodes in a
loop are visited before visiting any node outside that loop. The traversal
state includes a `context_stack`, with the current loop's context information
(if in a loop), as well as context for nested loops.

Within loops, nodes with multiple outgoing edges (generally speaking, these
are BCBs terminated in a `SwitchInt`) can be optimized when at least one
branch exits the loop and at least one branch stays within the loop. (For an
`if` or `while`, there are only two branches, but a `match` may have more.)

A branch that does not exit the loop should be counted by `Expression`, if
possible. Note that some situations require assigning counters to BCBs before
they are visited by traversal, so the `counter_kind` (`CoverageKind` for
a `Counter` or `Expression`) may have already been assigned, in which case
one of the other branches should get the `Expression`.

For a node with more than two branches (such as for more than two
`match` patterns), only one branch can be optimized by `Expression`. All
others require a `Counter` (unless the `counter_kind` of the branch's BCB was
previously assigned).

A branch expression is derived from the equation:

```text
Counter(branching_node) = SUM(Counter(branches))
```

It's important to
be aware that the `branches` in this equation are the outgoing _edges_
from the `branching_node`, but a `branch`'s target node may have other
incoming edges. Given the following graph, for example, the count for
`B` is the sum of its two incoming edges:

<img alt="Example graph with multiple incoming edges to a branch node"
src="img/coverage-branch-counting-01.png" class="center" style="width: 25%">
<br/>

In this situation, BCB node `B` may require an edge counter for its
"edge from A", and that edge might be computed from an `Expression`,
`Counter(A) - Counter(C)`. But an expression for the BCB _node_ `B`
would be the sum of all incoming edges:

```text
Expression((Counter(A) - Counter(C)) + SUM(Counter(remaining_edges)))
```

Note that this is only one possible configuration. The actual choice
of `Counter` vs. `Expression` also depends on the order of counter
assignments, and whether a BCB or incoming edge counter already has
its `Counter` or `Expression`.
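
As a worked illustration of the two formulas above, the following sketch
derives one outgoing edge count from the branching node's count, and then sums
incoming edges to get a node count. The function names are hypothetical and the
counts are plain integers; this models only the arithmetic, not how
`make_bcb_counters` assigns counters.

```rust
/// Count of the one outgoing edge that has no physical counter, derived from
/// `Counter(branching_node) = SUM(Counter(branches))`.
/// Returns `None` if the supplied counts are inconsistent.
fn derived_edge_count(branching_node: u64, counted_edges: &[u64]) -> Option<u64> {
    let counted: u64 = counted_edges.iter().sum();
    branching_node.checked_sub(counted)
}

/// Count of a node with several incoming edges: the sum of all of them.
fn node_count(incoming_edges: &[u64]) -> u64 {
    incoming_edges.iter().sum()
}
```

For example, if `A` executed 10 times and its edge to `C` executed 4 times, the
edge from `A` to `B` executed 6 times. If `B` has one other incoming edge that
executed twice, then `B` executed 8 times.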

[`make_bcb_counters`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/struct.CoverageCounters.html#method.make_bcb_counters
[bcb-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/struct.BcbCounters.html
[traverse-coverage-graph-with-loops]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.TraverseCoverageGraphWithLoops.html
