Provide finer-grained statistics #22

jonhoo · 2018-01-22T01:04:18Z

It'd be nice to be able to show other statistics such as the 95th percentile or median runtime. HdrHistogram can record this with relatively little overhead, and there's a pretty good official Rust implementation here (I'm one of the maintainers).

josephglanville · 2018-01-22T02:41:18Z

p95/p99/p99.9 values would of course be great, but an option to output the entire histogram, both in a machine-consumable format (JSON/CSV) and/or graphically, would be amazing.

jonhoo · 2018-01-22T02:55:32Z

@josephglanville You should be able to do all of that with a HdrHistogram. Of course, it does pay some cost of accuracy in order to remain compact, but it's unlikely that users will notice. The exact trade-offs are also easy to control.

sharkdp · 2018-01-22T18:56:32Z

This sounds really interesting.

Is it possible to get reasonable estimates from HdrHistogram even if we only have very few samples (like 10 time measurements)?

jonhoo · 2018-01-22T20:34:57Z

HdrHistogram doesn't actually do any statistical analysis, it just gathers up recorded values in an efficient way. Think of it as keeping a count per bin (e.g., "there have been 3 samples in the range 1-3ms, 2 in 3-5, etc."), but in a "smart" way such that you generally always have good resolution, and the histogram doesn't grow unbounded if you have large discrepancies in the values.

jonhoo · 2018-01-25T01:25:27Z

Which then in turn can give you whatever percentile you want, though obviously not with higher fidelity than what the number of samples provides. There will be a small inaccuracy due to the binning, but it should be marginal.
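To make the "count per bin" idea concrete, here is a minimal Python sketch of a histogram whose bins grow geometrically, so the relative error stays roughly constant and the number of bins grows only with the logarithm of the value range. It is a toy illustration, not HdrHistogram's actual bucket layout or API; the class name `ToyLogHistogram` and its parameters (`lowest`, `relative_error`) are made up for this example.

```python
import math
from collections import Counter


class ToyLogHistogram:
    """Toy histogram with geometrically growing bins (NOT the real HdrHistogram layout)."""

    def __init__(self, lowest=1e-6, relative_error=0.01):
        self.lowest = lowest                  # smallest value we distinguish, in seconds
        self.growth = 1.0 + relative_error    # each bin is 1% wider than the previous one
        self.counts = Counter()               # bin index -> number of samples
        self.total = 0

    def _bin_index(self, value):
        value = max(value, self.lowest)
        return int(math.log(value / self.lowest, self.growth))

    def record(self, value):
        self.counts[self._bin_index(value)] += 1
        self.total += 1

    def value_at_percentile(self, percentile):
        """Return the upper edge of the bin that holds the requested percentile."""
        if self.total == 0:
            raise ValueError("no samples recorded")
        target = max(1, math.ceil(self.total * percentile / 100.0))
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen >= target:
                return self.lowest * self.growth ** (index + 1)
        raise AssertionError("unreachable: counts and total disagree")


# Example: 100 runtimes from 1 ms to 100 ms.
hist = ToyLogHistogram()
for ms in range(1, 101):
    hist.record(ms / 1000.0)
median_estimate = hist.value_at_percentile(50)
```

The reported percentile is a bin edge rather than an exact sample, which is the small binning inaccuracy mentioned above; shrinking `relative_error` trades memory for precision.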

XANi · 2018-03-26T10:13:10Z

@jonhoo The thing is that (I think) in most cases CLI testing will be used for fewer than 100,000 iterations, so you will be saving barely any memory while introducing inaccuracies. It might make sense if that was something that activates above a certain threshold, but then 1 million floats take 8 MB if you just wanted to use the "dumb" algorithm, so I doubt it is worth it.
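For comparison, the "dumb" approach can be sketched as follows: keep every sample, sort once, and read percentiles off by rank. This is a sketch under assumptions, not hyperfine code; the helper name `exact_percentile` and the synthetic Gaussian timings are hypothetical. It also spells out the arithmetic: one million 8-byte doubles occupy 8,000,000 bytes.

```python
import math
import random
from array import array


def exact_percentile(sorted_samples, percentile):
    """Nearest-rank percentile on an already sorted sequence."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(len(sorted_samples) * percentile / 100.0))
    return sorted_samples[rank - 1]


# Store one million samples compactly; typecode 'd' is a C double (8 bytes each).
rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
samples = array("d", (rng.gauss(0.5, 0.05) for _ in range(1_000_000)))
payload_bytes = len(samples) * samples.itemsize  # 1_000_000 * 8 bytes

ordered = sorted(samples)  # one O(n log n) pass, then each percentile is a lookup
p50 = exact_percentile(ordered, 50)
p95 = exact_percentile(ordered, 95)
```

The percentiles here are exact sample values, at the cost of memory that grows linearly with the number of runs.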

jonhoo · 2018-03-26T15:12:25Z

@XANi you're totally right that with few samples the advantages of using HdrHistogram aren't as compelling. That said, the inaccuracies will also likely be very small, and HdrHistogram does present a nice interface for getting percentile values. It would also be (marginally) faster than scanning all the recorded samples after the fact to compute the percentiles.

psteinb · 2018-12-03T14:58:58Z

I personally would love it if the --output-* options stored the timing results of each individual run instead of the summary statistics. This way, creating histograms and such can be handled downstream by other tools/languages.

sharkdp · 2018-12-03T17:15:09Z

> I personally would love it if the --output-* options stored the timing results of each individual run instead of the summary statistics. This way, creating histograms and such can be handled downstream by other tools/languages.

This is exactly what the JSON-output option is for. Downstream tools will get all of the benchmarking information from that output, right?
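As an illustration of what a downstream tool might do with that output, here is a minimal Python sketch. It assumes the JSON export has a top-level `results` list whose entries carry a `command` string and a `times` list of per-run durations in seconds; the embedded sample is hand-written for this example, and the helper name `times_by_command` is hypothetical.

```python
import json

# Hand-written sample that mimics the ASSUMED export layout; not real benchmark data.
SAMPLE_EXPORT = """
{
  "results": [
    {"command": "sleep 0.1", "mean": 0.102, "times": [0.101, 0.103, 0.102]},
    {"command": "sleep 0.2", "mean": 0.202, "times": [0.201, 0.203, 0.202]}
  ]
}
"""


def times_by_command(export_text):
    """Map each benchmarked command to its list of individual run times (seconds)."""
    document = json.loads(export_text)
    return {entry["command"]: list(entry["times"]) for entry in document["results"]}


runs = times_by_command(SAMPLE_EXPORT)
for command, times in runs.items():
    print(f"{command}: {len(times)} runs, fastest {min(times):.3f} s")
```

In practice the text would come from the file passed to `--export-json` rather than from an inline string.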

psteinb · 2018-12-04T08:30:50Z

Yes, but then (see #110) I suggest renaming the flag, e.g. to `--summ-as-csv` or so. From the interface I cannot infer that the JSON contains ALL the measurements and the CSV doesn't. The help text also doesn't disclose this:

```
--export-csv <FILE> Export the timing results as CSV to the given FILE.

…

--export-json <FILE> Export the timing results as JSON to the given FILE.
```

sharkdp · 2018-12-05T07:58:08Z

Ok, good point. Let's clarify things in the --help text. I'd like to keep the current names for the command-line options.

sharkdp · 2018-12-12T19:33:15Z

In the spirit of the last few comments, I have created a folder with example Python scripts that can be used to further analyze benchmarks performed with hyperfine:

https://github.com/sharkdp/hyperfine/tree/master/scripts

The advanced_statistics.py script, for example, shows median, percentiles and interquartile range:

> ./advanced_statistics.py result-gaussian-distribution-mean-0.5-stddev-0.05.json
Command './gaussian.py'
  mean:      0.507 s
  stddev:    0.050 s
  median:    0.509 s

  percentiles:
     P_05 .. P_95:    0.424 s .. 0.583 s
     P_25 .. P_75:    0.472 s .. 0.543 s  (IQR = 0.071 s)

With this, I'd like to close this ticket, but I'd be really glad to get some feedback on this. If someone has ideas on how to improve or extend these scripts, I'd also be happy to take pull requests.
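For readers who want to reproduce this kind of summary without the script, the following is a minimal sketch using only the Python standard library (Python 3.8+ for `statistics.quantiles`). It is not the code of `advanced_statistics.py`; the function name `summarize` and the choice of the "inclusive" interpolation method are assumptions made for this example, so the numbers may differ slightly from the script's.

```python
import statistics


def summarize(times):
    """Return mean, stddev, median, selected percentiles and the IQR of run times."""
    # 99 cut points: cuts[0] is the 1st percentile, cuts[94] the 95th.
    cuts = statistics.quantiles(times, n=100, method="inclusive")
    p05, p25, p75, p95 = cuts[4], cuts[24], cuts[74], cuts[94]
    return {
        "mean": statistics.mean(times),
        "stddev": statistics.stdev(times),
        "median": statistics.median(times),
        "p05": p05,
        "p25": p25,
        "p75": p75,
        "p95": p95,
        "iqr": p75 - p25,  # interquartile range
    }


# Example with made-up timings: 101 evenly spaced values from 1.0 s to 101.0 s.
summary = summarize([float(v) for v in range(1, 102)])
print(f"median: {summary['median']:.3f} s, IQR = {summary['iqr']:.3f} s")
```

Feeding it the per-run `times` list from a JSON export gives the same kind of report as shown above.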

sharkdp added the feature-request and question labels Jan 22, 2018

sharkdp mentioned this issue Sep 10, 2018

Option to output the percentiles #82

Closed

psteinb mentioned this issue Dec 5, 2018

refined help text of export flags to separate summary results from su… #112

Merged

sharkdp closed this as completed Dec 12, 2018
