add a performance benchmark test for the Buffer implementation #5474

brian-pane · 2019-01-04T03:29:46Z

Description:
Add a benchmark test for the commonly used operations in the Buffer implementation.

Risk Level: low

Testing:
Test output:

2019-01-10 12:51:22
Running bazel-bin/test/common/buffer/buffer_speed_test
Run on (12 X 3500 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 262K (x6)
  L3 Unified 12582K (x1)
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BufferCreateEmpty                        81 ns         81 ns    8480942
BufferCreate/1                          168 ns        168 ns    4132183
BufferCreate/4096                       252 ns        252 ns    2689329
BufferCreate/16384                      664 ns        664 ns    1083776
BufferCreate/65536                     3886 ns       3883 ns     188950
BufferAddSmallIncrement/1                14 ns         14 ns   49352078
BufferAddSmallIncrement/2                14 ns         14 ns   50098766
BufferAddSmallIncrement/3                14 ns         14 ns   50146140
BufferAddSmallIncrement/4                14 ns         14 ns   50235749
BufferAddSmallIncrement/5                14 ns         14 ns   50178132
BufferAddString/1                        14 ns         14 ns   48137094
BufferAddString/4096                    244 ns        244 ns    2886979
BufferAddString/16384                   808 ns        807 ns     863515
BufferAddString/65536                  3952 ns       3950 ns     177855
BufferAddBuffer/1                        37 ns         37 ns   19130440
BufferAddBuffer/4096                    272 ns        272 ns    2556573
BufferAddBuffer/16384                   850 ns        850 ns     807214
BufferAddBuffer/65536                  4036 ns       4034 ns     174159
BufferPrependString/1                    14 ns         14 ns   49731097
BufferPrependString/4096                307 ns        307 ns    2263475
BufferPrependString/16384               887 ns        887 ns     785784
BufferPrependString/65536              3673 ns       3671 ns     188100
BufferPrependBuffer/1                   727 ns        725 ns    1024125
BufferPrependBuffer/4096                201 ns        201 ns    3436359
BufferPrependBuffer/16384               200 ns        200 ns    3438756
BufferPrependBuffer/65536               208 ns        208 ns    3273904
BufferDrain/1                            41 ns         41 ns   16859467
BufferDrain/4096                        196 ns        196 ns    3588529
BufferDrain/16384                       618 ns        617 ns    1117211
BufferDrain/65536                      3851 ns       3849 ns     181239
BufferDrainSmallIncrement/1              12 ns         12 ns   58796848
BufferDrainSmallIncrement/2              12 ns         12 ns   58485395
BufferDrainSmallIncrement/3              12 ns         12 ns   58747010
BufferDrainSmallIncrement/4              12 ns         12 ns   58175290
BufferDrainSmallIncrement/5              12 ns         12 ns   56670984
BufferMove/1                             48 ns         48 ns   14357885
BufferMove/4096                          49 ns         49 ns   14517972
BufferMove/16384                         48 ns         48 ns   14520562
BufferMove/65536                         48 ns         48 ns   14517219
BufferMovePartial/1                      63 ns         63 ns   11046236
BufferMovePartial/4096               121358 ns     121315 ns       5735
BufferMovePartial/16384              482824 ns     482610 ns       1445
BufferMovePartial/65536             1939180 ns    1938159 ns        359
BufferReserveCommit/1                   174 ns        174 ns    3961158
BufferReserveCommit/4096                162 ns        162 ns    4310823
BufferReserveCommit/16384               159 ns        159 ns    4378585
BufferReserveCommit/65536               952 ns        952 ns     729273
BufferReserveCommitPartial/1             25 ns         25 ns   28028814
BufferReserveCommitPartial/4096          27 ns         27 ns   25523226
BufferReserveCommitPartial/16384         27 ns         27 ns   25601451
BufferReserveCommitPartial/65536         27 ns         27 ns   25651173
BufferLinearizeSimple/1                  97 ns         97 ns    7133903
BufferLinearizeSimple/4096               97 ns         97 ns    7122145
BufferLinearizeSimple/16384              97 ns         97 ns    7073565
BufferLinearizeSimple/65536              97 ns         97 ns    7156147
BufferLinearizeGeneral/1                100 ns        100 ns    6975168
BufferLinearizeGeneral/4096             725 ns        724 ns     945499
BufferLinearizeGeneral/16384           2699 ns       2698 ns     257592
BufferLinearizeGeneral/65536          11826 ns      11817 ns      58621
BufferSearch/1                           35 ns         35 ns   20259964
BufferSearch/4096                       128 ns        128 ns    5480439
BufferSearch/16384                      389 ns        389 ns    1792849
BufferSearch/65536                     1800 ns       1798 ns     385526
BufferSearchPartialMatch/1              887 ns        887 ns     776837
BufferSearchPartialMatch/4096         80482 ns      80460 ns       8597
BufferSearchPartialMatch/16384       324251 ns     324126 ns       2182
BufferSearchPartialMatch/65536      1287507 ns    1286964 ns        529

Docs Changes: N/A
Release Notes: N/A
Related Issues: #4952
Related Pull Requests: 5441

Signed-off-by: Brian Pane [email protected]

Signed-off-by: Brian Pane <[email protected]>

brian-pane · 2019-01-04T06:31:02Z

@jmarantz once this passes review, I'll use it to generate before/after performance data for 5441.

test/common/buffer/buffer_speed_test.cc

Signed-off-by: Brian Pane <[email protected]>

… drain cases Signed-off-by: Brian Pane <[email protected]>

brian-pane · 2019-01-09T01:05:59Z

@jmarantz are there any other changes needed in this before it's commit-worthy?

For what it's worth, I've been using this PR in its current form while iterating on my buffer rewrite, and it has been quite effective at detecting regressions so far.

jmarantz · 2019-01-09T22:35:45Z

@alyssawilk can you take a look as (a) senior committer and (b) someone who knows where the performance gotchas might be around buffer code.

alyssawilk

Looks great! Just a couple of thoughts on my end:

alyssawilk · 2019-01-10T15:06:01Z

test/common/buffer/buffer_speed_test.cc

+
+  // On each iteration of the benchmark, add N bytes and drain a multiple of N, as specified
+  // by DrainCycleRatios. This exercises full-slice, partial-slice, and multi-slice code paths
+  // in the Buffer's drain implementation.


My thoughts around this one is that I think it covers the behaviors you mentioned but I can imagine implementations where it might not tickle perf bugs.

I've seen a bunch of classes where resizing got super inefficient when you cross some arbitrary threshhold (usually powers of 2 but YNK) due to high-low resizing issues. Do you think it might be worthwhile to have an add/remove test which added/removed a large number of "random" (fixed seed for reproducability but try many values tested within a benchmark) adds and drains to make sure there aren't expensive spurious copies / allocations as we add and remove data?

The common case for proxying is we're getting random header and body chunks from the network and parsing some subset as we pass things through the chain, so I think it'd be useful for the tests here and below. Before you do any work I'd be interested in @jmarantz thoughts on useful vs overkill as I think he does more perf testing than I.

alyssawilk · 2019-01-10T15:06:25Z

test/common/buffer/buffer_speed_test.cc

+  // by DrainCycleRatios. This exercises full-slice, partial-slice, and multi-slice code paths
+  // in the Buffer's drain implementation.
+  constexpr size_t DrainCycleSize = 7;
+  constexpr double DrainCycleRatios[DrainCycleSize] = {0.0, 1.5, 1, 1.5, 0, 2.0, 1.0};


I'd definitely like to see trickle-add and trickle-drain tests added - add one byte at a time many many times, remove one byte at a time many many times. Should be easy enough and while I hope we never have a buffer impl that sucks for those trivial use cases I think it's worth covering :-)

Sounds good to me. Currently, the BufferMovePartial test exercises the moving of a byte at a time from one buffer to another, which happens to exercise the add/drain code paths today but might not in the future (if we ever add reference-counting to do zero-copy moves between buffers, for example). So I'll add benchmark tests that directly exercise 1-byte adds and drains.

alyssawilk · 2019-01-10T15:31:54Z

test/common/buffer/buffer_speed_test.cc

+BENCHMARK(BufferSearchPartialMatch)->Arg(1)->Arg(4096)->Arg(16384)->Arg(65536);
+
+} // namespace Envoy
+


looks like there are some functions we're not testing which are in upstream envoy. linearize, prepend etc. They're not in the common path but it'd be good to still cover them.

i'm having a little trouble understanding github's comment history, but can you update the PR description to include stats from your new tests, you haven't already done that?

Ok, I added the latest benchmark output to the PR description.

Signed-off-by: Brian Pane <[email protected]>

jmarantz · 2019-01-10T21:10:36Z

Thanks. It's surprising that BufferMovePartial/65536 takes 2ms but this is the existing condition, and maybe your new impl will improve it :)

alyssawilk

Cool, that's all I can think of. Hopefully between the two of us we got it all :-P

Josh, haven't seen your LGTM on the latest so I'll let you merge if you're happy with the updates.

jmarantz

lgtm -- just waiting for CI to finish.

…proxy#5474) * add a performance benchmark test for the Buffer implementation Signed-off-by: Brian Pane <[email protected]> Signed-off-by: Fred Douglas <[email protected]>

add a performance benchmark test for the Buffer implementation

1b0fb3e

Signed-off-by: Brian Pane <[email protected]>

jmarantz reviewed Jan 4, 2019

View reviewed changes

test/common/buffer/buffer_speed_test.cc Show resolved Hide resolved

jmarantz self-assigned this Jan 4, 2019

brian-pane added 2 commits January 4, 2019 07:43

add a buffer search benchmark

fb2ced6

Signed-off-by: Brian Pane <[email protected]>

add a benchmark that tests the various aligned and non-aligned buffer…

f8a1e54

… drain cases Signed-off-by: Brian Pane <[email protected]>

jmarantz previously approved these changes Jan 9, 2019

View reviewed changes

alyssawilk reviewed Jan 10, 2019

View reviewed changes

updates based on code review feedback

aec3d9e

Signed-off-by: Brian Pane <[email protected]>

brian-pane dismissed jmarantz’s stale review via aec3d9e January 10, 2019 20:43

one more cleanup

241b4cc

Signed-off-by: Brian Pane <[email protected]>

alyssawilk approved these changes Jan 10, 2019

View reviewed changes

jmarantz approved these changes Jan 10, 2019

View reviewed changes

jmarantz merged commit 695f50c into envoyproxy:master Jan 11, 2019

brian-pane deleted the buffer-benchmark branch May 25, 2019 21:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add a performance benchmark test for the Buffer implementation #5474

add a performance benchmark test for the Buffer implementation #5474

brian-pane commented Jan 4, 2019 •

edited

Loading

brian-pane commented Jan 4, 2019

brian-pane commented Jan 9, 2019

jmarantz commented Jan 9, 2019

alyssawilk left a comment

alyssawilk Jan 10, 2019

alyssawilk Jan 10, 2019

brian-pane Jan 10, 2019 •

edited

Loading

alyssawilk Jan 10, 2019

jmarantz Jan 10, 2019

brian-pane Jan 10, 2019

jmarantz commented Jan 10, 2019

alyssawilk left a comment

jmarantz left a comment

		BENCHMARK(BufferSearchPartialMatch)->Arg(1)->Arg(4096)->Arg(16384)->Arg(65536);

		} // namespace Envoy

add a performance benchmark test for the Buffer implementation #5474

add a performance benchmark test for the Buffer implementation #5474

Conversation

brian-pane commented Jan 4, 2019 • edited Loading

brian-pane commented Jan 4, 2019

brian-pane commented Jan 9, 2019

jmarantz commented Jan 9, 2019

alyssawilk left a comment

Choose a reason for hiding this comment

alyssawilk Jan 10, 2019

Choose a reason for hiding this comment

alyssawilk Jan 10, 2019

Choose a reason for hiding this comment

brian-pane Jan 10, 2019 • edited Loading

Choose a reason for hiding this comment

alyssawilk Jan 10, 2019

Choose a reason for hiding this comment

jmarantz Jan 10, 2019

Choose a reason for hiding this comment

brian-pane Jan 10, 2019

Choose a reason for hiding this comment

jmarantz commented Jan 10, 2019

alyssawilk left a comment

Choose a reason for hiding this comment

jmarantz left a comment

Choose a reason for hiding this comment

brian-pane commented Jan 4, 2019 •

edited

Loading

brian-pane Jan 10, 2019 •

edited

Loading