- Feature Name: benchmarking
- Start Date: 2018-01-11
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
This RFC aims to stabilize basic benchmarking tools so that `cargo bench` can be used on stable Rust.
Benchmarks are important for maintaining good libraries. They give us a clear idea of performance tradeoffs and make it easier to pick the best library for the job. They also help people keep track of performance regressions, and aid in finding and fixing performance bottlenecks.
You can write benchmarks much like tests: using a `#[bench]` annotation in your library code or in a dedicated file under `benches/`. You can also use `[[bench]]` entries in your `Cargo.toml` to place them in a custom location.
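For illustration, a custom `[[bench]]` entry might look like this (the target name and path are made up for the example):

```toml
# Hypothetical bench target whose source lives outside the default benches/ directory
[[bench]]
name = "end_to_end"
path = "perf/end_to_end.rs"
```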
A benchmarking function looks like this:
```rust
use std::test::{Bencher, BenchResult};

#[bench]
fn my_benchmark(bench: Bencher) -> BenchResult {
    let x = do_some_setup();
    let result = bench.iter(|| x.compute_thing());
    x.teardown();
    result
}
```
`Bencher::iter` is where the actual code being benchmarked is placed. It runs the closure multiple times, until it has a clear idea of the average time taken and of the variance.
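The following is a very rough sketch of that idea, not the actual libtest implementation; the fixed sample and iteration counts and the simple median/spread summary are assumptions for illustration (the real bencher adapts its iteration counts and black-boxes the closure's result):

```rust
use std::time::Instant;

// Illustrative only: time the closure in batches and summarize the samples.
fn measure<R>(mut f: impl FnMut() -> R) -> (f64, f64) {
    let mut samples: Vec<f64> = Vec::new();
    for _ in 0..50 {
        let iters = 1_000;
        let start = Instant::now();
        for _ in 0..iters {
            f(); // the real bencher black-boxes this result
        }
        samples.push(start.elapsed().as_nanos() as f64 / iters as f64);
    }
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = samples[samples.len() / 2];
    let spread = samples[samples.len() - 1] - samples[0];
    (median, spread) // roughly the "ns/iter (+/- ...)" that gets reported
}
```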
The benchmark can be run with `cargo bench`.

To ensure that the compiler doesn't optimize things away, use `mem::black_box`.
The following code will show very little time taken per iteration, because the optimizer knows the input at compile time and can do some of the computation beforehand:
```rust
use std::test::{Bencher, BenchResult};

fn pow(x: u32, y: u32) -> u32 {
    if y == 0 {
        1
    } else {
        x * pow(x, y - 1)
    }
}

#[bench]
fn my_benchmark(bench: Bencher) -> BenchResult {
    bench.iter(|| pow(4, 30))
}
```
```text
running 1 test
test my_benchmark ... bench:           4 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out
```
However, via `mem::black_box`, we can blind the optimizer to the input values, so that it does not attempt to use them to optimize the code:
```rust
use std::mem;
use std::test::{Bencher, BenchResult};

#[bench]
fn my_benchmark(bench: Bencher) -> BenchResult {
    let x = mem::black_box(4);
    let y = mem::black_box(30);
    bench.iter(|| pow(x, y))
}
```
```text
running 1 test
test my_benchmark ... bench:          11 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out
```
Any result yielded from the callback for `Bencher::iter()` is also black-boxed; otherwise, the compiler might notice that the result is unused and optimize out the entire computation.

If you are generating unused values that do not get returned from the callback, use `mem::black_box()` on them as well:
```rust
use std::mem;
use std::test::{Bencher, BenchResult};

#[bench]
fn my_benchmark(bench: Bencher) -> BenchResult {
    let x = mem::black_box(4);
    let y = mem::black_box(30);
    bench.iter(|| {
        mem::black_box(pow(y, x));
        pow(x, y)
    })
}
```
If you want the benchmark to run a predetermined number of times, use `iter_n`:
```rust
#[bench]
fn my_benchmark(bench: Bencher) -> BenchResult {
    bench.iter_n(1000, || do_some_stuff())
}
```
The bencher reports the median value and the deviation (the difference between the minimum and maximum sample). Samples are winsorized, so extreme outliers are clamped rather than skewing the result.
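As a rough illustration of what winsorizing means (the percentile-based cutoff and the function below are assumptions for illustration, not the actual libtest algorithm):

```rust
/// Clamp samples into the range spanned by the given lower/upper percentiles.
/// Illustrative only; the real bencher's percentile handling differs.
fn winsorize(samples: &mut [f64], pct: f64) {
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = |p: f64| ((p / 100.0) * (sorted.len() - 1) as f64) as usize;
    let (lo, hi) = (sorted[idx(pct)], sorted[idx(100.0 - pct)]);
    for s in samples.iter_mut() {
        // Extreme outliers are pulled back to the cutoff instead of discarded.
        *s = s.clamp(lo, hi);
    }
}
```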
`cargo bench` essentially takes the same flags as `cargo test`, except it has a `--bench foo` flag to select a single benchmark target.
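For example (assuming a package with a bench target named `foo` and the benchmark functions above):

```sh
# Run every benchmark in the package
cargo bench

# Run only the `foo` bench target
cargo bench --bench foo

# As with `cargo test`, a trailing filter selects benchmarks by name
cargo bench my_benchmark
```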
The reason we haven't stabilized this so far is basically that we're hoping to have a custom test framework system, so that the bencher can be written as a crate. That is still an alternative, though there has been no movement on this front in years.
This design works. It doesn't give you fine-grained tools for analyzing results, but it's a basic building block that lets one do most benchmarking tasks. The alternatives include a custom test/bench framework, which is much more holistic, or exposing more fundamental building blocks.
Another possible API would be one which implicitly handles the black-boxing, something like:

```rust
let input1 = foo();
let input2 = bar();
bencher.iter(|(input1, input2)| baz(input1, input2), (input1, input2))
```

This has problems when the input types are not `Copy`, and it feels a bit less flexible.
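A hypothetical shape for such an API, sketched under the assumption that the input is black-boxed once and then copied into every iteration (the function name and bounds below are not part of the proposal):

```rust
use std::mem;
use std::test::{Bencher, BenchResult};

// Sketch: black-box the input once, then hand a copy to the closure on each
// iteration. The `Copy` bound is exactly where non-Copy inputs become awkward.
fn iter_with_input<T: Copy, R>(
    bench: Bencher,
    input: T,
    f: impl Fn(T) -> R,
) -> BenchResult {
    let input = mem::black_box(input);
    bench.iter(|| f(input))
}
```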
- Should stuff be in `std::test` or a partially-stabilized `libtest`?
- Should we stabilize any other `Bencher` methods (like `run_once`)?
- Stable machine-readable output for this would be nice, but can be done in a separate RFC.