Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a performance benchmark test for the Buffer implementation #5474

Merged
merged 5 commits into from
Jan 11, 2019
Merged

add a performance benchmark test for the Buffer implementation #5474

merged 5 commits into from
Jan 11, 2019

Conversation

brian-pane
Copy link
Contributor

@brian-pane brian-pane commented Jan 4, 2019

Description:
Add a benchmark test for the commonly used operations in the Buffer implementation.

Risk Level: low

Testing:
Test output:

2019-01-10 12:51:22
Running bazel-bin/test/common/buffer/buffer_speed_test
Run on (12 X 3500 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 262K (x6)
  L3 Unified 12582K (x1)
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BufferCreateEmpty                        81 ns         81 ns    8480942
BufferCreate/1                          168 ns        168 ns    4132183
BufferCreate/4096                       252 ns        252 ns    2689329
BufferCreate/16384                      664 ns        664 ns    1083776
BufferCreate/65536                     3886 ns       3883 ns     188950
BufferAddSmallIncrement/1                14 ns         14 ns   49352078
BufferAddSmallIncrement/2                14 ns         14 ns   50098766
BufferAddSmallIncrement/3                14 ns         14 ns   50146140
BufferAddSmallIncrement/4                14 ns         14 ns   50235749
BufferAddSmallIncrement/5                14 ns         14 ns   50178132
BufferAddString/1                        14 ns         14 ns   48137094
BufferAddString/4096                    244 ns        244 ns    2886979
BufferAddString/16384                   808 ns        807 ns     863515
BufferAddString/65536                  3952 ns       3950 ns     177855
BufferAddBuffer/1                        37 ns         37 ns   19130440
BufferAddBuffer/4096                    272 ns        272 ns    2556573
BufferAddBuffer/16384                   850 ns        850 ns     807214
BufferAddBuffer/65536                  4036 ns       4034 ns     174159
BufferPrependString/1                    14 ns         14 ns   49731097
BufferPrependString/4096                307 ns        307 ns    2263475
BufferPrependString/16384               887 ns        887 ns     785784
BufferPrependString/65536              3673 ns       3671 ns     188100
BufferPrependBuffer/1                   727 ns        725 ns    1024125
BufferPrependBuffer/4096                201 ns        201 ns    3436359
BufferPrependBuffer/16384               200 ns        200 ns    3438756
BufferPrependBuffer/65536               208 ns        208 ns    3273904
BufferDrain/1                            41 ns         41 ns   16859467
BufferDrain/4096                        196 ns        196 ns    3588529
BufferDrain/16384                       618 ns        617 ns    1117211
BufferDrain/65536                      3851 ns       3849 ns     181239
BufferDrainSmallIncrement/1              12 ns         12 ns   58796848
BufferDrainSmallIncrement/2              12 ns         12 ns   58485395
BufferDrainSmallIncrement/3              12 ns         12 ns   58747010
BufferDrainSmallIncrement/4              12 ns         12 ns   58175290
BufferDrainSmallIncrement/5              12 ns         12 ns   56670984
BufferMove/1                             48 ns         48 ns   14357885
BufferMove/4096                          49 ns         49 ns   14517972
BufferMove/16384                         48 ns         48 ns   14520562
BufferMove/65536                         48 ns         48 ns   14517219
BufferMovePartial/1                      63 ns         63 ns   11046236
BufferMovePartial/4096               121358 ns     121315 ns       5735
BufferMovePartial/16384              482824 ns     482610 ns       1445
BufferMovePartial/65536             1939180 ns    1938159 ns        359
BufferReserveCommit/1                   174 ns        174 ns    3961158
BufferReserveCommit/4096                162 ns        162 ns    4310823
BufferReserveCommit/16384               159 ns        159 ns    4378585
BufferReserveCommit/65536               952 ns        952 ns     729273
BufferReserveCommitPartial/1             25 ns         25 ns   28028814
BufferReserveCommitPartial/4096          27 ns         27 ns   25523226
BufferReserveCommitPartial/16384         27 ns         27 ns   25601451
BufferReserveCommitPartial/65536         27 ns         27 ns   25651173
BufferLinearizeSimple/1                  97 ns         97 ns    7133903
BufferLinearizeSimple/4096               97 ns         97 ns    7122145
BufferLinearizeSimple/16384              97 ns         97 ns    7073565
BufferLinearizeSimple/65536              97 ns         97 ns    7156147
BufferLinearizeGeneral/1                100 ns        100 ns    6975168
BufferLinearizeGeneral/4096             725 ns        724 ns     945499
BufferLinearizeGeneral/16384           2699 ns       2698 ns     257592
BufferLinearizeGeneral/65536          11826 ns      11817 ns      58621
BufferSearch/1                           35 ns         35 ns   20259964
BufferSearch/4096                       128 ns        128 ns    5480439
BufferSearch/16384                      389 ns        389 ns    1792849
BufferSearch/65536                     1800 ns       1798 ns     385526
BufferSearchPartialMatch/1              887 ns        887 ns     776837
BufferSearchPartialMatch/4096         80482 ns      80460 ns       8597
BufferSearchPartialMatch/16384       324251 ns     324126 ns       2182
BufferSearchPartialMatch/65536      1287507 ns    1286964 ns        529

Docs Changes: N/A
Release Notes: N/A
Related Issues: #4952
Related Pull Requests: 5441

Signed-off-by: Brian Pane [email protected]

@brian-pane
Copy link
Contributor Author

@jmarantz once this passes review, I'll use it to generate before/after performance data for 5441.

@jmarantz jmarantz self-assigned this Jan 4, 2019
@brian-pane
Copy link
Contributor Author

@jmarantz are there any other changes needed in this before it's commit-worthy?

For what it's worth, I've been using this PR in its current form while iterating on my buffer rewrite, and it has been quite effective at detecting regressions so far.

jmarantz
jmarantz previously approved these changes Jan 9, 2019
@jmarantz
Copy link
Contributor

jmarantz commented Jan 9, 2019

@alyssawilk can you take a look as (a) senior committer and (b) someone who knows where the performance gotchas might be around buffer code.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a couple of thoughts on my end:


// On each iteration of the benchmark, add N bytes and drain a multiple of N, as specified
// by DrainCycleRatios. This exercises full-slice, partial-slice, and multi-slice code paths
// in the Buffer's drain implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts around this one is that I think it covers the behaviors you mentioned but I can imagine implementations where it might not tickle perf bugs.

I've seen a bunch of classes where resizing got super inefficient when you cross some arbitrary threshhold (usually powers of 2 but YNK) due to high-low resizing issues. Do you think it might be worthwhile to have an add/remove test which added/removed a large number of "random" (fixed seed for reproducability but try many values tested within a benchmark) adds and drains to make sure there aren't expensive spurious copies / allocations as we add and remove data?

The common case for proxying is we're getting random header and body chunks from the network and parsing some subset as we pass things through the chain, so I think it'd be useful for the tests here and below. Before you do any work I'd be interested in @jmarantz thoughts on useful vs overkill as I think he does more perf testing than I.

// by DrainCycleRatios. This exercises full-slice, partial-slice, and multi-slice code paths
// in the Buffer's drain implementation.
constexpr size_t DrainCycleSize = 7;
constexpr double DrainCycleRatios[DrainCycleSize] = {0.0, 1.5, 1, 1.5, 0, 2.0, 1.0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd definitely like to see trickle-add and trickle-drain tests added - add one byte at a time many many times, remove one byte at a time many many times. Should be easy enough and while I hope we never have a buffer impl that sucks for those trivial use cases I think it's worth covering :-)

Copy link
Contributor Author

@brian-pane brian-pane Jan 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Currently, the BufferMovePartial test exercises the moving of a byte at a time from one buffer to another, which happens to exercise the add/drain code paths today but might not in the future (if we ever add reference-counting to do zero-copy moves between buffers, for example). So I'll add benchmark tests that directly exercise 1-byte adds and drains.

BENCHMARK(BufferSearchPartialMatch)->Arg(1)->Arg(4096)->Arg(16384)->Arg(65536);

} // namespace Envoy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there are some functions we're not testing which are in upstream envoy. linearize, prepend etc. They're not in the common path but it'd be good to still cover them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm having a little trouble understanding github's comment history, but can you update the PR description to include stats from your new tests, you haven't already done that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added the latest benchmark output to the PR description.

Signed-off-by: Brian Pane <[email protected]>
@jmarantz
Copy link
Contributor

Thanks. It's surprising that BufferMovePartial/65536 takes 2ms but this is the existing condition, and maybe your new impl will improve it :)

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, that's all I can think of. Hopefully between the two of us we got it all :-P

Josh, haven't seen your LGTM on the latest so I'll let you merge if you're happy with the updates.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm -- just waiting for CI to finish.

@jmarantz jmarantz merged commit 695f50c into envoyproxy:master Jan 11, 2019
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
…proxy#5474)

* add a performance benchmark test for the Buffer implementation

Signed-off-by: Brian Pane <[email protected]>
Signed-off-by: Fred Douglas <[email protected]>
@brian-pane brian-pane deleted the buffer-benchmark branch May 25, 2019 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants