Include histogram in JSON output #395

fxkr · 2019-04-24T07:39:13Z

Background

Interesting cases to test:

# Main use case
% vegeta report -type=json -hist='[0,1ms,2ms,3ms,4ms,5ms]' vegeta.bin
{"latencies": ..., "hist": [0,0,0,56,44,400], ...}

# If -hist is not given, "hist" is omitted
% vegeta report -type=json vegeta.bin

# -type=hist supports -hist too (for consistency)
% vegeta report -type=hist -hist='[0,1ms,2ms,3ms,4ms,5ms]' vegeta.bin

# Legacy -type=hist invocation continues to work unchanged
% vegeta report -type=hist'[0,1ms,2ms,3ms,4ms,5ms]' vegeta.bin

# Help should mention the new parameter
% vegeta --help

Checklist

Git commit messages conform to community standards.
Each Git commit represents meaningful milestones or atomic units of work.
Changed or added code is covered by appropriate tests.

tsenart · 2019-04-28T12:00:52Z

Thanks for the contribution! I have some thoughts that I'd like to share and get your input on.

I've had the feeling for a long time that the histogram report is under-used and its UX and utility could be substantially improved. Your use-case hints at the same conclusion.

Similarly to how vegeta plot became its own sub command when it started to deviate significantly from the other "reports", I think that the introduction of this --hist command line argument suggests that a vegeta hist(ogram) sub-command is up for promotion.

My vision for this sub-command could be captured like this in its help output:

Usage: vegeta histogram [options] [<file>...]

Outputs a histogram of request latencies in HTML, JSON or text format.

Arguments:
  <file>  A file with vegeta attack results encoded with one of
          the supported encodings (gob | json | csv) [default: stdin]

Options:
  --bins    How to bin request latencies. Defaults to auto
            logarithmic binning. Specific bins can also be specified
            with as a string of comma separated durations (e.g "0,1ms,2ms,3ms,4ms,5ms")
            [default: "auto"]

  --format  Format to output histogram in. Supported formats are
            html, json and text. [default: "text"]

Examples:
  echo "GET http://:80" | vegeta attack -name=50qps -rate=50 -duration=5s > results.50qps.bin
  cat results.50qps.bin | vegeta histogram

What are your thoughts on this?

fxkr · 2019-04-29T01:09:42Z

I feel that histograms are underused in general, and I would like to see their use promoted.

I would go the opposite way and eventually include a histogram in the default report output. Having it in a separate command kind of implies it's something not normally needed. Whereas in reality averages alone can be misleading, so imo the histogram should always be shown at least for a quick look. As a user, I would also like to see a good enough summary with one command and no options, ie vegeta report.

The limit is that I also don't want to have to scroll the output. But the current text output is 7 lines plus errors, so some space is available. And actually I'd rather have the histogram than the errors, which are excessive for me due to #399, but that can be fixed with better aggregation (let's discuss that in the other issue).

I'm not against having subcommands for the individual reports. But:

a) I'd still want a combined report for ease of use and to have a standard, easily recognizable output text format (e.g. if I see iperf output I know what to look for right away. I do that a lot.). It's not mutually exclusive with subcommands, it's enough if it just calls all the individual commands with default params for me. But defaults matter, and good defaults can save a lot of time.

b) I'm fine with having a flexible report command with many options for individual "sections" of the report. For histograms, I don't foresee more than --bins. That can easily stay in report. We shouldn't even have too many parameters. Users with very specific requirements will build their own scripts around the JSON or API anyway.

Also I think it's very important to have a single command that outputs all the data in one JSON. This is kind of important for easy integration. It doesn't have to be report -format=json, but it can be.

tsenart · 2019-05-05T11:45:41Z

I feel that histograms are underused in general, and I would like to see their use promoted.

I echo this sentiment.

I would go the opposite way and eventually include a histogram in the default report output. Having it in a separate command kind of implies it's something not normally needed. Whereas in reality averages alone can be misleading, so imo the histogram should always be shown at least for a quick look.

It's is because averages can be misleading that the default text report includes percentiles. Do you think that many users fail to interpret percentiles, but would instead succeed with a histogram?

As a user, I would also like to see a good enough summary with one command and no options, ie vegeta report. The limit is that I also don't want to have to scroll the output. But the current text output is 7 lines plus errors, so some space is available.

a) I'd still want a combined report for ease of use and to have a standard, easily recognizable output text format (e.g. if I see iperf output I know what to look for right away. I do that a lot.). It's not mutually exclusive with subcommands, it's enough if it just calls all the individual commands with default params for me.

How are you envisioning this standard report output? Could you show me an illustrated example?

But defaults matter, and good defaults can save a lot of time.

Automatic binning, then, seems like the way to go. How would you approach this?

And actually I'd rather have the histogram than the errors, which are excessive for me due to #399, but that can be fixed with better aggregation (let's discuss that in the other issue).

Good call! These errors can should indeed be better aggregated. Thanks for opening #399.

b) I'm fine with having a flexible report command with many options for individual "sections" of the report. For histograms, I don't foresee more than --bins. That can easily stay in report. We shouldn't even have too many parameters. Users with very specific requirements will build their own scripts around the JSON or API anyway.

Also I think it's very important to have a single command that outputs all the data in one JSON. This is kind of important for easy integration. It doesn't have to be report -format=json, but it can be.

Agreed and agreed.

fxkr · 2019-05-06T06:08:00Z

I would go the opposite way and eventually include a histogram in the default report output. Having it in a separate command kind of implies it's something not normally needed. Whereas in reality averages alone can be misleading, so imo the histogram should always be shown at least for a quick look.

It's is because averages can be misleading that the default text report includes percentiles. Do you think that many users fail to interpret percentiles, but would instead succeed with a histogram?

I've read that article before. I was actually wondering why vegeta only outputs the 99th percentile, given that (taking the numbers from there) 18% of users will see worse than the 99 .9th percentile. I suppose it's in the region where we need to be careful with introducing errors (e.g. due to GC).

Just to be clear, I wasn't arguing against percentiles, I was just arguing for adding histograms (and keeping the percentiles). I think percentiles and histograms serve different purposes.

Percentiles are an easier to track metric, and are easier to relate to end user experience. From a blackbox "how good is this system?" perspective, percentiles might even be fine.

Histograms help you understand the system. I know I find it hard to interpret percentiles if I know nothing over the distribution. From a few percentiles I can't even tell a single log-normal distribution from two overlaid ones (e.g. due to a cache that hits in x% cases. more motivation). So the histogram is useful from a whitebox "how do I tune/fix this system?" perspective.

(It would be different if we could look at the CDF, ie "all percentiles as a graph". In terms of information that's equivalent to the histogram. I find them less intuitive than histograms.)

How are you envisioning this standard report output? Could you show me an illustrated example?

Just all the sections one after another.

I wasn't asking to really change much in the format. What I meant was: if we have a single good output that doesn't rely on customization (via subcommands and parameters, e.g. to enable/disable individual sections) then most vegeta reports (incl. those copy/pasted by other people) will look similar. If we encourage users to parametrize the command, the reader will have to visually "parse" the output again every time.

Requests      [total, rate]            10000, 1000.08
Duration      [total, attack, wait]    9.999650299s, 9.999204067s, 446.232µs
Latencies     [mean, 50, 95, 99, max]  508.506µs, 505.419µs, 618.511µs, 733.938µs, 12.052322ms
Bytes In      [total, mean]            5159484, 515.95
Bytes Out     [total, mean]            0, 0.00
Success       [ratio]                  99.99%
Status Codes  [code:count]             0:1  200:9999  

Bucket         #     %       Histogram
[0s,    1ms]   9952  99.52%  ###########################################################################
[1ms,   2ms]   40    0.40%   ##############################
[2ms,   4ms]   6     0.06%   ##############
[4ms,   8ms]   1     0.01%   
[8ms,   +Inf]  1     0.01%   

Error Set:
Get http://localhost:8000: net/http: timeout awaiting response headers

Automatic binning, then, seems like the way to go. How would you approach this?

I have a rough draft in my fork (branch/commit)

Usage: vegeta report -type=hist. It's only extending the hist report for now, I didn't touch the text report yet.

The problem I'm running into is vegeta has structs in the public API, which means I have to resort to conditionals instead of polymorphism if I want to have a Histogram that manages its own buckets. If Histogram was an Interface this would not be a problem. How attached are you to API compatibility (relative to clean code)?

I picked base 2 and didn't add a way to customize the base, which is probably ok.

It also includes another bugfix/feature that I should probably split out and propose separately (I'm rescaling the bars of the histogram so that the largest bar fits the available width.)

I was also wondering if linear search is best to find the bucket, but haven't benchmarked this yet. Quite possible binary search is slower since len(buckets) is smallish. Anyway that's also a pre-existing issue, so a fine micro-optimization project for later.

I would do a separate PR for the auto log binning after we merge this one here, but it's still good that we're discussing both here until we have an acceptable CLI.

tsenart

I like the proposed ideas to extend the default text and JSON report to include a histogram by default. Will you follow up with a PR for that? If we do that, we should deprecate the independent hist report.

tsenart · 2019-05-12T11:32:35Z

README.md

@@ -363,6 +363,8 @@ Options:
  --type    Which report type to generate (text | json | hist[buckets]).
            [default: text]

+  --hist    Histogram buckets, e.g.: '[0,1ms,10ms]'


I would call this flag --buckets instead and not make the output of the actual histogram optional. It should always be included in both the JSON and text report, with the --buckets value determined automatically from the data set when not specified. This, however, would mean that the algorithm for automatic binning must be appropriate for large data sets (i.e. worse-case O(n)).

Ok, I'll change it to --buckets (won't get to it today, but soon).

I'll do the change to enable the histogram by default using automatic bins in a follow up PR once this one is in.

This, however, would mean that the algorithm for automatic binning must be appropriate for large data sets (i.e. worse-case O(n)).

Yes. It's already O(num of samples), and num of bucket is "small" (<=16 in practice, <= 64 technically) so that's ok too.

Your comment did make me realize I can just preallocate 64 buckets and remove the extension logic.

I wanted to benchmark the bucket search code, but with 16 buckets I expect linear search will be close to ideal.

tsenart · 2019-05-12T11:38:58Z

report.go

+			if len(typ) < 6 {
+				return fmt.Errorf("bad buckets: '%s'", typ[4:])
+			}
+			histStr = typ[4:]


Let's return an error saying that specifying the buckets via the --type flag is deprecated and that one should use the --buckets flag instead.

fxkr · 2019-05-12T22:55:40Z

I like the proposed ideas to extend the default text and JSON report to include a histogram by default. Will you follow up with a PR for that? If we do that, we should deprecate the independent hist report.

Yes, I'll do a follow up PR.

fxkr · 2019-05-12T23:21:14Z

Having used the histogram data in the JSON a bit, I now feel that using an array of bucket counts was a huge mistake. I keep second guessing what the bins were in old JSON output. I think I will switch to using a dict. This will also work better with the automatic bins.

fxkr · 2019-05-12T23:24:13Z

Change --hist to --buckets
In JSON output, change array of bucket counts to map[latency]count
Add deprecation warning to --type=hist ---> update: no, will be done only when text report includes histogram

tsenart · 2019-05-13T09:12:31Z

Add deprecation warning to --type=hist

Maybe the deprecation warning should only be added once we have the text report with the histogram included.

ruiaraujo · 2019-05-17T21:14:25Z

Would it be an option to use the HDR histogram log format so that a lot of visualization tooling for HDR can be used?

lib/histogram.go

fxkr · 2019-05-28T05:17:25Z

Would it be an option to use the HDR histogram log format so that a lot of visualization tooling for HDR can be used?

I like the idea, but I would say it's out of scope for this PR. I had a brief look at the source code (it appears there's no actual spec for the format) and it would definitely need to be implemented completely separately from vegeta's proprietary JSON format. I don't know what @tsenart thinks about it, but I would suggest filing a separate feature request issue for this.

fxkr · 2019-05-28T05:20:13Z

@tsenart Apologies for the delay! Here's the new changeset. Functionally I'd say it's complete I would do separate PR's for more features (automatic bins, extending text report etc.). Let me know if that works for you or if you find anything wrong with the code. Thanks!

tsenart · 2019-06-16T11:50:30Z

Would it be an option to use the HDR histogram log format so that a lot of visualization tooling for HDR can be used?

Thanks for bringing this to our attention! I wasn't familiar with it. I think it's a great idea to make Vegeta able to output this.

d it would definitely need to be implemented completely separately from vegeta's proprietary JSON format.

Veget'a JSON format is not proprietary. Did you mean "custom"? :-)

README.md

ruiaraujo · 2019-06-16T18:18:19Z

@tsenart I have a fork that outputs the format I described, I will clean it up and propose a PR later.
The main issue is that the golang implementation of hdr histogram is deprecated with no replacement.

fxkr · 2019-06-17T05:00:56Z

Veget'a JSON format is not proprietary. Did you mean "custom"? :-)

Yes; that was badly worded. I meant "non-standard", to contrast with the hdrhistogram format (which is somewhat less non-standard in the way that some other tools probably use it, even though it's still totally not a standard)

tsenart · 2019-06-17T19:09:19Z

@ruiaraujo, @fxkr: Mind to take a look? #413

Closes tsenart#394

fxkr · 2019-06-24T02:11:10Z

Rebased onto master and extended README with description of bucket field in JSON:

+In the `buckets` field, each key is a nanosecond value representing the lower bound of a bucket.
+The upper bound is implied by the next higher bucket.
+Upper bounds are non-inclusive.
+The highest bucket is the overflow bucket; it has no upper bound.
+The values are counts of how many requests fell into that particular bucket.

fxkr requested review from tsenart and xla as code owners April 24, 2019 07:39

tsenart reviewed May 12, 2019

View reviewed changes

fxkr force-pushed the hist-in-json branch from 471ff84 to 7b2b51e Compare May 28, 2019 05:10

golangcibot reviewed May 28, 2019

View reviewed changes

lib/histogram.go Outdated Show resolved Hide resolved

fxkr force-pushed the hist-in-json branch from 7b2b51e to bcfcc1c Compare May 28, 2019 05:13

tsenart reviewed Jun 16, 2019

View reviewed changes

README.md Show resolved Hide resolved

tsenart previously approved these changes Jun 16, 2019

View reviewed changes

fxkr added 2 commits June 23, 2019 19:08

report: Add -buckets option for -type=json

4e4bc0f

Closes tsenart#394

report: Add support for -buckets option for -type=hist too

9bacac8

fxkr dismissed tsenart’s stale review via 9bacac8 June 24, 2019 02:09

fxkr force-pushed the hist-in-json branch from 36bae01 to 9bacac8 Compare June 24, 2019 02:09

Merge branch 'master' into hist-in-json

37d4c6f

tsenart approved these changes Jun 24, 2019

View reviewed changes

tsenart merged commit 2870712 into tsenart:master Jun 24, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include histogram in JSON output #395

Include histogram in JSON output #395

fxkr commented Apr 24, 2019

tsenart commented Apr 28, 2019

fxkr commented Apr 29, 2019

tsenart commented May 5, 2019

fxkr commented May 6, 2019

tsenart left a comment

tsenart May 12, 2019

fxkr May 12, 2019

tsenart May 12, 2019

fxkr May 12, 2019

fxkr commented May 12, 2019 •

edited

Loading

fxkr commented May 12, 2019

fxkr commented May 12, 2019 •

edited

Loading

tsenart commented May 13, 2019

ruiaraujo commented May 17, 2019

fxkr commented May 28, 2019

fxkr commented May 28, 2019

tsenart commented Jun 16, 2019

ruiaraujo commented Jun 16, 2019

fxkr commented Jun 17, 2019

tsenart commented Jun 17, 2019

fxkr commented Jun 24, 2019

Include histogram in JSON output #395

Include histogram in JSON output #395

Conversation

fxkr commented Apr 24, 2019

Background

Checklist

tsenart commented Apr 28, 2019

fxkr commented Apr 29, 2019

tsenart commented May 5, 2019

fxkr commented May 6, 2019

tsenart left a comment

Choose a reason for hiding this comment

tsenart May 12, 2019

Choose a reason for hiding this comment

fxkr May 12, 2019

Choose a reason for hiding this comment

tsenart May 12, 2019

Choose a reason for hiding this comment

fxkr May 12, 2019

Choose a reason for hiding this comment

fxkr commented May 12, 2019 • edited Loading

fxkr commented May 12, 2019

fxkr commented May 12, 2019 • edited Loading

tsenart commented May 13, 2019

ruiaraujo commented May 17, 2019

fxkr commented May 28, 2019

fxkr commented May 28, 2019

tsenart commented Jun 16, 2019

ruiaraujo commented Jun 16, 2019

fxkr commented Jun 17, 2019

tsenart commented Jun 17, 2019

fxkr commented Jun 24, 2019

fxkr commented May 12, 2019 •

edited

Loading

fxkr commented May 12, 2019 •

edited

Loading