Improve CKMS performance #21

blt · 2017-12-11T18:24:23Z

This series of commits improves the baseline CKMS insertion performance by about 80%. This work was inspired by the needs of postmates/cernan users who set the Prometheus sink summary window to longer than a few seconds, filling a CKMS with > 100k points. This resulted in hangups from the Prometheus aggregator on account of the long delay in responding to it.

The interface for CKMS has slightly changed: we no longer provide CKMS::sum. This is done because it was not clear what the behaviour of sum should be at the wrapping boundary of the input type and was felt that this is better handled by users requiring sum. We retain the continuous moving average of inputs.

Related reading: https://users.rust-lang.org/t/how-can-i-optimize-this-data-structure/14273/

This commit introduces the [criterion](https://github.com/japaric/criterion.rs) library into quantiles. The intention is to tackle CKMS performance again. Signed-off-by: Brian L. Troutwine <[email protected]>

This commit cuts the size of Entry a by 64 bits, shaving milliseconds off the 10k benchmark run. We're still waaaay too slow for 65k to really notice a good difference there. Signed-off-by: Brian L. Troutwine <[email protected]>

Time to wrap it up. I'll have to shop this around to folks that have a better handle on wacky tricks and/or know of a DS of greater applicability than Vec. I am stuck for sure. Signed-off-by: Brian L. Troutwine <[email protected]>

This commit is inspired by the discussion [here](https://users.rust-lang.org/t/how-can-i-optimize-this-data-structure/14273/7). I'm not sure yet how well it does, benchmark wise. Signed-off-by: Brian L. Troutwine <[email protected]>

In this commit we're set to run comparative benchmarks for Vec and Store. The results were not promising, with Vec mean being 493ms and Store mean being 546. Better SD and MAD on Vec too. The computation of 'r' can be stored at the top of each inner store and we can keep a sorted buffer for insertion. I guess the thing to do is benchmark Vec against Store, see where we end up. Signed-off-by: Brian L. Troutwine <[email protected]>

This commit moves CKMS fully onto store with the understanding that the current implementation is pokier than just plain Vec on account of iteration on insertion. The expectation is that we can resolve this. Benchmarking -- not shown -- indicates that Store can do inserts about twice as fast as Vec _if_ we don't have to seek over the whole thing before inserting. Signed-off-by: Brian L. Troutwine <[email protected]>

This commit goes quite a ways toward making 'stores' a proper backing store for CKMS. We just need AddAssign going again. Signed-off-by: Brian L. Troutwine <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>

I've removed sum from the interface of CKMS because if the type is especially small I don't have a wrapping_add available to me. Oops. Up to the client to keep track of this, as a result. Well, maybe if I look into Wrapping? Anyway! We're now on a good footing to benchmark this thing. Signed-off-by: Brian L. Troutwine <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>

blt · 2017-12-11T18:25:11Z

@tsantero @pulltab this build will fail since I've incorporated an unstable benchmarking library. The code is ready for review, that said. I'll correct the benchmarking build failure.

I've backed off the use of criterion for now since I couldn't figure out how to slot it into our existing macro-based benchmark generation. This _should_ now pass CI. Signed-off-by: Brian L. Troutwine <[email protected]>

This commit adds the last bit of polish to store.rs in terms of commenting on its function. We also introduce fuzz tests for inspecting the CKMS implementation. -max_len=160 ought to be in place for the existing fuzz test. Signed-off-by: Brian L. Troutwine <[email protected]>

blt · 2017-12-11T22:47:07Z

For future reverence:

Benchmark `e027030`

test ckms::f32::bench_insert_100    ... bench:       6,293 ns/iter (+/- 1,816)
test ckms::f32::bench_insert_1000   ... bench:     220,828 ns/iter (+/- 31,953)
test ckms::f32::bench_insert_10000  ... bench:   8,873,920 ns/iter (+/- 929,011)
test ckms::f32::bench_insert_100000 ... bench: 146,735,764 ns/iter (+/- 6,005,790)
test ckms::f32::bench_primed_100    ... bench:       5,842 ns/iter (+/- 1,179)
test ckms::f32::bench_primed_1000   ... bench:       6,151 ns/iter (+/- 1,637)
test ckms::f32::bench_primed_10000  ... bench:       6,382 ns/iter (+/- 1,467)
test ckms::f32::bench_primed_65535  ... bench:       5,478 ns/iter (+/- 1,085)
test ckms::u16::bench_insert_100    ... bench:       6,118 ns/iter (+/- 871)
test ckms::u16::bench_insert_1000   ... bench:     182,911 ns/iter (+/- 27,109)
test ckms::u16::bench_insert_10000  ... bench:   8,028,145 ns/iter (+/- 787,948)
test ckms::u16::bench_insert_65535  ... bench:  82,724,569 ns/iter (+/- 5,650,221)
test ckms::u16::bench_primed_100    ... bench:       4,718 ns/iter (+/- 884)
test ckms::u16::bench_primed_1000   ... bench:       4,657 ns/iter (+/- 1,065)
test ckms::u16::bench_primed_10000  ... bench:       5,817 ns/iter (+/- 1,219)
test ckms::u16::bench_primed_65535  ... bench:       4,740 ns/iter (+/- 831)
test ckms::u32::bench_insert_100    ... bench:       6,041 ns/iter (+/- 1,248)
test ckms::u32::bench_insert_1000   ... bench:     177,669 ns/iter (+/- 24,077)
test ckms::u32::bench_insert_10000  ... bench:   8,035,107 ns/iter (+/- 792,884)
test ckms::u32::bench_insert_100000 ... bench: 135,404,387 ns/iter (+/- 5,621,009)
test ckms::u32::bench_primed_100    ... bench:       4,705 ns/iter (+/- 861)
test ckms::u32::bench_primed_1000   ... bench:       4,610 ns/iter (+/- 1,245)
test ckms::u32::bench_primed_10000  ... bench:       5,895 ns/iter (+/- 1,469)
test ckms::u32::bench_primed_65535  ... bench:       4,698 ns/iter (+/- 1,205)

Benchmark `0c6563c` (0.6.3)

test ckms::u16::bench_insert_100    ... bench:       7,191 ns/iter (+/- 864)
test ckms::u16::bench_insert_1000   ... bench:     273,136 ns/iter (+/- 38,620)
test ckms::u16::bench_insert_10000  ... bench:  23,441,375 ns/iter (+/- 2,057,169)
test ckms::u16::bench_insert_65535  ... bench: 772,036,814 ns/iter (+/- 24,403,538)
test ckms::u16::bench_primed_100    ... bench:     214,393 ns/iter (+/- 299,768)
test ckms::u16::bench_primed_1000   ... bench:     173,827 ns/iter (+/- 259,229)
test ckms::u16::bench_primed_10000  ... bench:     182,952 ns/iter (+/- 292,019)
test ckms::u16::bench_primed_65535  ... bench:     180,036 ns/iter (+/- 299,064)
test ckms::u32::bench_insert_100    ... bench:       6,820 ns/iter (+/- 836)
test ckms::u32::bench_insert_1000   ... bench:     272,598 ns/iter (+/- 36,300)
test ckms::u32::bench_insert_10000  ... bench:  23,119,329 ns/iter (+/- 1,618,310)
test ckms::u32::bench_insert_100000 ... bench: 1,705,918,483 ns/iter (+/- 58,434,067)
test ckms::u32::bench_primed_100    ... bench:     176,282 ns/iter (+/- 292,272)
test ckms::u32::bench_primed_1000   ... bench:     183,390 ns/iter (+/- 313,832)
test ckms::u32::bench_primed_10000  ... bench:     182,736 ns/iter (+/- 303,547)
test ckms::u32::bench_primed_65535  ... bench:     182,674 ns/iter (+/- 321,044)

Brian L. Troutwine added 10 commits December 6, 2017 19:51

Introduce criterion benchmarks for CKMS

200743c

This commit introduces the [criterion](https://github.com/japaric/criterion.rs) library into quantiles. The intention is to tackle CKMS performance again. Signed-off-by: Brian L. Troutwine <[email protected]>

Fiddle some with the performance characteristics

6153de3

This commit cuts the size of Entry a by 64 bits, shaving milliseconds off the 10k benchmark run. We're still waaaay too slow for 65k to really notice a good difference there. Signed-off-by: Brian L. Troutwine <[email protected]>

Finish adding documentation for the night

504e378

Time to wrap it up. I'll have to shop this around to folks that have a better handle on wacky tricks and/or know of a DS of greater applicability than Vec. I am stuck for sure. Signed-off-by: Brian L. Troutwine <[email protected]>

Introduce 'Store'

1ff7fc6

This commit is inspired by the discussion [here](https://users.rust-lang.org/t/how-can-i-optimize-this-data-structure/14273/7). I'm not sure yet how well it does, benchmark wise. Signed-off-by: Brian L. Troutwine <[email protected]>

WIP commit

7d82205

This commit goes quite a ways toward making 'stores' a proper backing store for CKMS. We just need AddAssign going again. Signed-off-by: Brian L. Troutwine <[email protected]>

Re-introduce AddAssign to CKMS

07a1f53

Signed-off-by: Brian L. Troutwine <[email protected]>

Switch benchmark back to 65k.

bc906d9

Signed-off-by: Brian L. Troutwine <[email protected]>

blt requested review from tsantero and pulltab December 11, 2017 18:24

Brian L. Troutwine added 2 commits December 11, 2017 11:36

Correct benchmarking

2193949

I've backed off the use of criterion for now since I couldn't figure out how to slot it into our existing macro-based benchmark generation. This _should_ now pass CI. Signed-off-by: Brian L. Troutwine <[email protected]>

pulltab approved these changes Dec 11, 2017

View reviewed changes

tsantero approved these changes Dec 11, 2017

View reviewed changes

blt merged commit 08b4e46 into master Dec 11, 2017

blt deleted the benchmarking_ckms branch December 11, 2017 22:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve CKMS performance #21

Improve CKMS performance #21

blt commented Dec 11, 2017

blt commented Dec 11, 2017

blt commented Dec 11, 2017

Improve CKMS performance #21

Improve CKMS performance #21

Conversation

blt commented Dec 11, 2017

blt commented Dec 11, 2017

blt commented Dec 11, 2017

Benchmark e027030

Benchmark 0c6563c (0.6.3)

Benchmark `e027030`

Benchmark `0c6563c` (0.6.3)