
Dynamic number of benchmark repetitions #379

Open
ggwpez opened this issue Feb 16, 2023 · 3 comments

Comments

@ggwpez
Member

ggwpez commented Feb 16, 2023

Problem

The benchmark pallet command is invoked with --steps and --repeat arguments, which decide how often a benchmark is repeated.
This can cause issues for very short benchmarks, since the measurement window on the machine is too short to get consistent performance.
It especially applies to VMs, which can suffer CPU steal and other kinds of short-lived disturbances that do not impair average performance but ruin the bench results.

Proposed Solution

Instead of running each bench with a fixed number of repetitions, we could look to criterion for more statistically driven results.
For example, running the benchmark until the average results stabilize, and erroring otherwise with "inconsistent hardware detected".

Ideally we could also get nicer prints to directly compare with past results 😄

Benchmarking DAG 1k/1k: Collecting 100 samples in estimated 5.0000 s (133M iter)
DAG 1k/1k               time:   [37.627 ns 37.723 ns 37.834 ns]
                        change: [-29.326% -28.527% -27.771%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  7 (7.00%) high severe
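A minimal sketch of what such an adaptive repetition loop could look like, assuming a stopping rule based on the relative standard error of the mean (the function name `run_until_stable` and the `tolerance`/`min_repeats`/`max_repeats` parameters are hypothetical illustrations, not part of the benchmarking pallet or criterion):

```rust
use std::time::{Duration, Instant};

/// Error returned when the timings never stabilize, i.e. the
/// "inconsistent hardware detected" case from the proposal.
#[derive(Debug)]
pub struct InconsistentHardware;

/// Repeat `bench` until the relative standard error of the mean falls
/// below `tolerance`, or give up after `max_repeats` repetitions.
/// Returns the mean execution time on success.
pub fn run_until_stable<F: FnMut()>(
    mut bench: F,
    min_repeats: usize,
    max_repeats: usize,
    tolerance: f64,
) -> Result<Duration, InconsistentHardware> {
    let mut samples: Vec<f64> = Vec::new();

    for _ in 0..max_repeats {
        let start = Instant::now();
        bench();
        samples.push(start.elapsed().as_secs_f64());

        // Require a minimum sample count before judging stability.
        if samples.len() < min_repeats {
            continue;
        }
        let n = samples.len() as f64;
        let mean = samples.iter().sum::<f64>() / n;
        // Unbiased sample variance.
        let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
        // Relative standard error of the mean: shrinks as more
        // repetitions are collected, so transient noise averages out.
        let rse = (var / n).sqrt() / mean;
        if rse < tolerance {
            return Ok(Duration::from_secs_f64(mean));
        }
    }
    Err(InconsistentHardware)
}
```

Short-lived disturbances such as CPU steal inflate the variance, which keeps the loop running longer; if the noise never settles within `max_repeats`, the caller gets an error instead of a silently skewed weight.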

@bkchr
Member

bkchr commented Feb 16, 2023

For example running the benchmark until we get consistent average results and error otherwise with "inconsistent hardware detected".

When we have long running benchmarks, we will have problems :P

What I just thought about for this issue, could we maybe not use criterion? Or would that be too crazy?

@koute
Contributor

koute commented Feb 20, 2023

What I just thought about for this issue, could we maybe not use criterion? Or would that be too crazy?

Unfortunately criterion doesn't seem to expose APIs that are low-level enough for us to use as-is. However, it is relatively small (~10k lines of code, most of which we shouldn't need anyway) and liberally licensed, so we could just copy the relevant code verbatim into a separate helper crate and use that; it should be relatively straightforward from what I can see.

@ggwpez
Member Author

ggwpez commented Feb 22, 2023

What I just thought about for this issue, could we maybe not use criterion? Or would that be too crazy?

Criterion requires nightly, but we could maybe work around that. Anyway, for a first version I would just hack it together instead of doing a larger refactor to integrate Criterion.
