From a08c5b25bccf1edb0c9f297ab3c5c2594e260b56 Mon Sep 17 00:00:00 2001
From: Zalathar
Date: Mon, 30 Dec 2024 11:51:53 +1100
Subject: [PATCH] Remove stale implementation details of coverage instrumentation

This level of detail in the dev guide is a maintenance burden; better to leave
this sort of thing to in-tree comments.
---
 src/llvm-coverage-instrumentation.md | 356 +--------------------------
 1 file changed, 2 insertions(+), 354 deletions(-)

diff --git a/src/llvm-coverage-instrumentation.md b/src/llvm-coverage-instrumentation.md
index 97dc0921d..3078ae094 100644
--- a/src/llvm-coverage-instrumentation.md
+++ b/src/llvm-coverage-instrumentation.md
@@ -73,9 +73,7 @@ important benefits:
   out the coverage counts of each unique instantiation of a generic function,
   if invoked with multiple type substitution variations.
 
-## Components of LLVM Coverage Instrumentation in `rustc`
-
-### LLVM Runtime Dependency
+## The LLVM profiler runtime
 
 Coverage data is only generated by running the executable Rust program. `rustc`
 statically links coverage-instrumented binaries with LLVM runtime code
@@ -94,209 +92,7 @@ When compiling with `-C instrument-coverage`,
 [compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/main/compiler-rt/lib/profile
 [crate-loader-postprocess]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html#method.postprocess
 
-### MIR Pass: `InstrumentCoverage`
-
-Coverage instrumentation is performed on the MIR with a [MIR pass][mir-passes]
-called [`InstrumentCoverage`][mir-instrument-coverage]. This MIR pass analyzes
-the control flow graph (CFG)--represented by MIR `BasicBlock`s--to identify
-code branches, attaches [`FunctionCoverageInfo`] to the function's body,
-and injects additional [`Coverage`][coverage-statement] statements into the
-`BasicBlock`s.
-
-A MIR `Coverage` statement is a virtual instruction that indicates a counter
-should be incremented when its adjacent statements are executed, to count
-a span of code ([`CodeRegion`][code-region]). It counts the number of times a
-branch is executed, and is referred to by coverage mappings in the function's
-coverage-info struct.
-
-Note that many coverage counters will _not_ be converted into
-physical counters (or any other executable instructions) in the final binary.
-Some of them will be (see [`CoverageKind::CounterIncrement`]),
-but other counters can be computed on the fly, when generating a coverage
-report, by mapping a `CodeRegion` to a coverage-counter _expression_.
-
-As an example:
-
-```rust
-fn some_func(flag: bool) {
-    // increment Counter(1)
-    ...
-    if flag {
-        // increment Counter(2)
-        ...
-    } else {
-        // count = Expression(1) = Counter(1) - Counter(2)
-        ...
-    }
-    // count = Expression(2) = Counter(1) + Zero
-    // or, alternatively, Expression(2) = Counter(2) + Expression(1)
-    ...
-}
-```
-
-In this example, four contiguous code regions are counted while only
-incrementing two counters.
-
-CFG analysis is used to not only determine _where_ the branches are, for
-conditional expressions like `if`, `else`, `match`, and `loop`, but also to
-determine where expressions can be used in place of physical counters.
-
-The advantages of optimizing coverage through expressions are more pronounced
-with loops. Loops generally include at least one conditional branch that
-determines when to break out of a loop (a `while` condition, or an `if` or
-`match` with a `break`). In MIR, this is typically lowered to a `SwitchInt`,
-with one branch to stay in the loop, and another branch to break out of the
-loop. The branch that breaks out will almost always execute less often,
-so `InstrumentCoverage` chooses to add a `CounterIncrement` to that branch, and
-uses an expression (`Counter(loop) - Counter(break)`) for the branch that
-continues.
-
-The `InstrumentCoverage` MIR pass is documented in
-[more detail below][instrument-coverage-pass-details].
-
-[mir-passes]: mir/passes.md
-[mir-instrument-coverage]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_mir_transform/src/coverage
-[`FunctionCoverageInfo`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/struct.FunctionCoverageInfo.html
-[code-region]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/ffi/struct.CodeRegion.html
-[`CoverageKind::CounterIncrement`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/coverage/enum.CoverageKind.html#variant.CounterIncrement
-[coverage-statement]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.StatementKind.html#variant.Coverage
-[instrument-coverage-pass-details]: #implementation-details-of-the-instrumentcoverage-mir-pass
-
-### Counter Injection and Coverage Map Pre-staging
-
-When the compiler enters [the Codegen phase][backend-lowering-mir], with a
-coverage-enabled MIR, [`codegen_statement()`][codegen-statement] converts each
-MIR `Statement` into some backend-specific action or instruction.
-`codegen_statement()` forwards `Coverage` statements to
-[`codegen_coverage()`][codegen-coverage]:
-
-```rust
-    pub fn codegen_statement(&mut self, mut bx: Bx, statement: &mir::Statement<'tcx>) -> Bx {
-        ...
-        match statement.kind {
-            ...
-            mir::StatementKind::Coverage(box ref coverage) => {
-                self.codegen_coverage(bx, coverage, statement.source_info.scope);
-            }
-```
-
-`codegen_coverage()` handles inlined statements and then forwards the coverage
-statement to [`Builder::add_coverage`], which handles each `CoverageKind` as
-follows:
-
-
-- For both `CounterIncrement` and `ExpressionUsed`, the underlying counter or
-  expression ID is passed through to the corresponding [`FunctionCoverage`]
-  struct to indicate that the corresponding regions of code were not removed
-  by MIR optimizations.
-- For `CoverageKind::CounterIncrement`s, an instruction is injected in the backend
-  IR to increment the physical counter, by calling the `BuilderMethod`
-  [`instrprof_increment()`][instrprof-increment].
-
-```rust
-    fn add_coverage(&mut self, instance: Instance<'tcx>, coverage: &Coverage) {
-        ...
-        let Coverage { kind } = coverage;
-        match *kind {
-            CoverageKind::CounterIncrement { id } => {
-                func_coverage.mark_counter_id_seen(id);
-                ...
-                bx.instrprof_increment(fn_name, hash, num_counters, index);
-            }
-            CoverageKind::ExpressionUsed { id } => {
-                func_coverage.mark_expression_id_seen(id);
-            }
-        }
-    }
-```
-
-> The function name `instrprof_increment()` is taken from the LLVM intrinsic
-call of the same name ([`llvm.instrprof.increment`][llvm-instrprof-increment]),
-and uses the same arguments and types; but note that, up to and through this
-stage (even though modeled after LLVM's implementation for code coverage
-instrumentation), the data and instructions are not strictly LLVM-specific.
->
-> But since LLVM is the only Rust-supported backend with the tooling to
-process this form of coverage instrumentation, the backend for `Coverage`
-statements is only implemented for LLVM, at this time.
-
-[backend-lowering-mir]: backend/lowering-mir.md
-[codegen-statement]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_statement
-[codegen-coverage]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.codegen_coverage
-[`Builder::add_coverage`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/builder/struct.Builder.html#method.add_coverage
-[`FunctionCoverage`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/map_data/struct.FunctionCoverage.html
-[instrprof-increment]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/trait.BuilderMethods.html#tymethod.instrprof_increment
-
-### Coverage Map Generation
-
-With the instructions to increment counters now implemented in LLVM IR,
-the last remaining step is to inject the LLVM IR variables that hold the
-static data for the coverage map.
-
-`rustc_codegen_llvm`'s [`compile_codegen_unit()`][compile-codegen-unit] calls
-[`coverageinfo_finalize()`][coverageinfo-finalize],
-which delegates its implementation to the
-[`rustc_codegen_llvm::coverageinfo::mapgen`][mapgen-finalize] module.
-
-For each function `Instance` (code-generated from MIR, including multiple
-instances of the same MIR for generic functions that have different type
-substitution combinations), `mapgen`'s `finalize()` method queries the
-`Instance`-associated `FunctionCoverage` for its `Counter`s, `Expression`s,
-and `CodeRegion`s; and calls LLVM codegen APIs to generate
-properly-configured variables in LLVM IR, according to very specific
-details of the [_LLVM Coverage Mapping Format_][coverage-mapping-format]
-(Version 6).[^llvm-and-covmap-versions]
-
-[^llvm-and-covmap-versions]: The Rust compiler (as of Nov 2024) supports _LLVM Coverage Mapping Format_ 6.
-    The Rust compiler will automatically use the most up-to-date coverage mapping format
-    version that is compatible with the compiler's built-in version of LLVM.
-
-```rust
-pub fn finalize<'ll, 'tcx>(cx: &CodegenCx<'ll, 'tcx>) {
-    ...
-    if !tcx.sess.instrument_coverage_except_unused_functions() {
-        add_unused_functions(cx);
-    }
-
-    let mut function_coverage_map = match cx.coverage_context() {
-        Some(ctx) => ctx.take_function_coverage_map(),
-        None => return,
-    };
-    ...
-    let mut mapgen = CoverageMapGenerator::new();
-
-    for (instance, function_coverage) in function_coverage_map {
-        ...
-        let coverage_mapping_buffer = llvm::build_byte_buffer(|coverage_mapping_buffer| {
-            mapgen.write_coverage_mapping(expressions, counter_regions, coverage_mapping_buffer);
-        });
-```
-_code snippet trimmed for brevity_
-
-One notable first step performed by `mapgen::finalize()` is the call to
-[`add_unused_functions()`][add-unused-functions]:
-
-When finalizing the coverage map, `FunctionCoverage` only has the `CodeRegion`s
-and counters for the functions that went through codegen; such as public
-functions and "used" functions (functions referenced by other "used" or public
-items). Any other functions (considered unused) were still parsed and processed
-through the MIR stage.
-
-The set of unused functions is computed via the set difference of all MIR
-`DefId`s (`tcx` query `mir_keys`) minus the codegenned `DefId`s (`tcx` query
-`codegened_and_inlined_items`). `add_unused_functions()` computes the set of
-unused functions, queries the `tcx` for the previously-computed `CodeRegions`,
-for each unused MIR, synthesizes an LLVM function (with no internal statements,
-since it will not be called), and adds a new `FunctionCoverage`, with
-`Unreachable` code regions.
-
-[compile-codegen-unit]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/base/fn.compile_codegen_unit.html
-[coverageinfo-finalize]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/context/struct.CodegenCx.html#method.coverageinfo_finalize
-[mapgen-finalize]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/mapgen/fn.finalize.html
-[coverage-mapping-format]: https://llvm.org/docs/CoverageMappingFormat.html
-[add-unused-functions]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/coverageinfo/mapgen/fn.add_unused_functions.html
-
-## Testing LLVM Coverage
+## Testing coverage instrumentation
 
 [(See also the compiletest documentation for the `tests/coverage` test suite.)](./tests/compiletest.md#coverage-tests)
 
@@ -341,151 +137,3 @@ and `mir-opt` tests can be refreshed by running:
 [`src/tools/coverage-dump`]: https://github.com/rust-lang/rust/tree/master/src/tools/coverage-dump
 [`tests/coverage-run-rustdoc`]: https://github.com/rust-lang/rust/tree/master/tests/coverage-run-rustdoc
 [`tests/codegen/instrument-coverage/testprog.rs`]: https://github.com/rust-lang/rust/blob/master/tests/mir-opt/coverage/instrument_coverage.rs
-
-## Implementation Details of the `InstrumentCoverage` MIR Pass
-
-The bulk of the implementation of the `InstrumentCoverage` MIR pass is performed
-by [`instrument_function_for_coverage`]. For each eligible MIR body, the instrumentor:
-
-- Prepares a [coverage graph]
-- Extracts mapping information from MIR
-- Prepares counters for each relevant node/edge in the coverage graph
-- Creates mapping data to be embedded in side-tables attached to the MIR body
-- Injects counters and other coverage statements into MIR
-
-The [coverage graph] is a coverage-specific simplification of the MIR control
-flow graph (CFG). Its nodes are [`BasicCoverageBlock`s][bcb], which
-encompass one or more sequentially-executed MIR `BasicBlock`s
-(with no internal branching).
-
-Nodes and edges in the graph can have associated [`BcbCounter`]s, which are
-stored in [`CoverageCounters`].
-
-[`instrument_function_for_coverage`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/fn.instrument_function_for_coverage.html
-[coverage graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.CoverageGraph.html
-[bcb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.BasicCoverageBlock.html
-[`BcbCounter`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/enum.BcbCounter.html
-[`CoverageCounters`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/struct.CoverageCounters.html
-
-### The `CoverageGraph`
-
-The [`CoverageGraph`][coverage graph] is derived from the MIR (`mir::Body`).
-
-```rust
-    let basic_coverage_blocks = CoverageGraph::from_mir(mir_body);
-```
-
-Like `mir::Body`, the `CoverageGraph` is also a
-[`DirectedGraph`][directed-graph]. Both graphs represent the function's
-fundamental control flow, with many of the same
-[`graph trait`][graph-traits]s, supporting `start_node()`, `num_nodes()`,
-`successors()`, `predecessors()`, and `is_dominated_by()`.
-
-For anyone that knows how to work with the [MIR, as a CFG][mir-dev-guide], the
-`CoverageGraph` will be familiar, and can be used in much the same way.
-The nodes of the `CoverageGraph` are `BasicCoverageBlock`s (BCBs), which
-index into an `IndexVec` of `BasicCoverageBlockData`. This is analogous
-to the MIR CFG of `BasicBlock`s that index `BasicBlockData`.
-
-Each `BasicCoverageBlockData` captures one or more MIR `BasicBlock`s,
-exclusively, and represents the maximal-length sequence of `BasicBlocks`
-without conditional branches.
-
-[`compute_basic_coverage_blocks()`][compute-basic-coverage-blocks] builds the
-`CoverageGraph` as a coverage-specific simplification of the MIR CFG. In
-contrast with the [`SimplifyCfg`][simplify-cfg] MIR pass, this step does
-not alter the MIR itself, because the `CoverageGraph` aggressively simplifies
-the CFG, and ignores nodes that are not relevant to coverage. For example:
-
-  - The BCB CFG ignores (excludes) branches considered not relevant
-    to the current coverage solution. It excludes unwind-related code[^78544]
-    that is injected by the Rust compiler but has no physical source
-    code to count, which allows a `Call`-terminated BasicBlock
-    to be merged with its successor, within a single BCB.
-  - A `Goto`-terminated `BasicBlock` can be merged with its successor
-    **_as long as_** it has the only incoming edge to the successor
-    `BasicBlock`.
-  - Some BasicBlock terminators support Rust-specific concerns--like
-    borrow-checking--that are not relevant to coverage analysis. `FalseUnwind`,
-    for example, can be treated the same as a `Goto` (potentially merged with
-    its successor into the same BCB).
-
-[^78544]: (Note, however, that Issue [#78544][rust-lang/rust#78544] considers
-providing future support for coverage of programs that intentionally
-`panic`, as an option, with some non-trivial cost.)
-
-The BCB CFG is critical to simplifying the coverage analysis by ensuring graph path-based
-queries (`is_dominated_by()`, `predecessors`, `successors`, etc.) have branch (control flow)
-significance.
-
-[directed-graph]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/graph/trait.DirectedGraph.html
-[graph-traits]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/graph/index.html#traits
-[mir-dev-guide]: mir/index.md
-[compute-basic-coverage-blocks]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.CoverageGraph.html#method.compute_basic_coverage_blocks
-[simplify-cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/simplify/enum.SimplifyCfg.html
-[rust-lang/rust#78544]: https://github.com/rust-lang/rust/issues/78544
-
-### `make_bcb_counters()`
-
-[`make_bcb_counters`] traverses the `CoverageGraph` and adds a
-`Counter` or `Expression` to every BCB. It uses _Control Flow Analysis_
-to determine where an `Expression` can be used in place of a `Counter`.
-`Expressions` have no runtime overhead, so if a viable expression (adding or
-subtracting two other counters or expressions) can compute the same result as
-an embedded counter, an `Expression` is preferred.
-
-[`TraverseCoverageGraphWithLoops`][traverse-coverage-graph-with-loops]
-provides a traversal order that ensures all `BasicCoverageBlock` nodes in a
-loop are visited before visiting any node outside that loop. The traversal
-state includes a `context_stack`, with the current loop's context information
-(if in a loop), as well as context for nested loops.
-
-Within loops, nodes with multiple outgoing edges (generally speaking, these
-are BCBs terminated in a `SwitchInt`) can be optimized when at least one
-branch exits the loop and at least one branch stays within the loop. (For an
-`if` or `while`, there are only two branches, but a `match` may have more.)
-
-A branch that does not exit the loop should be counted by `Expression`, if
-possible. Note that some situations require assigning counters to BCBs before
-they are visited by traversal, so the `counter_kind` (`CoverageKind` for
-a `Counter` or `Expression`) may have already been assigned, in which case
-one of the other branches should get the `Expression`.
-
-For a node with more than two branches (such as for more than two
-`match` patterns), only one branch can be optimized by `Expression`. All
-others require a `Counter` (unless its BCB `counter_kind` was previously
-assigned).
-
-A branch expression is derived from the equation:
-
-```text
-Counter(branching_node) = SUM(Counter(branches))
-```
-
-It's important to
-be aware that the `branches` in this equation are the outgoing _edges_
-from the `branching_node`, but a `branch`'s target node may have other
-incoming edges. Given the following graph, for example, the count for
-`B` is the sum of its two incoming edges:
-
-[Figure: example graph with multiple incoming edges to a branch node]
-
-In this situation, BCB node `B` may require an edge counter for its
-"edge from A", and that edge might be computed from an `Expression`,
-`Counter(A) - Counter(C)`. But an expression for the BCB _node_ `B`
-would be the sum of all incoming edges:
-
-```text
-Expression((Counter(A) - Counter(C)) + SUM(Counter(remaining_edges)))
-```
-
-Note that this is only one possible configuration. The actual choice
-of `Counter` vs. `Expression` also depends on the order of counter
-assignments, and whether a BCB or incoming edge counter already has
-its `Counter` or `Expression`.
-
-[`make_bcb_counters`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/struct.CoverageCounters.html#method.make_bcb_counters
-[bcb-counters]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/counters/struct.BcbCounters.html
-[traverse-coverage-graph-with-loops]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/coverage/graph/struct.TraverseCoverageGraphWithLoops.html