gcs: Implement version 2 filters. #1856
Conversation
Force-pushed from 562c2f4 to 518172a.
This has been updated to support deduplication of filter data for the version 2 filters. The PR description and all benchmarks have also been updated accordingly. For a concrete example of the difference the deduplication can make, consider the following values for block 1096:
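The deduplication pass described above can be sketched as follows. This is a minimal illustration, not the actual gcs implementation, and the values are hypothetical stand-ins rather than the real block 1096 data:

```go
package main

import (
	"fmt"
	"sort"
)

// dedup sorts the reduced hash values and drops duplicates so that
// hash collisions are only encoded once in the version 2 filter.
func dedup(values []uint64) []uint64 {
	sort.Slice(values, func(i, j int) bool { return values[i] < values[j] })
	out := values[:0]
	var prev uint64
	for i, v := range values {
		if i == 0 || v != prev {
			out = append(out, v)
		}
		prev = v
	}
	return out
}

func main() {
	// Hypothetical reduced hash values with a collision at 17.
	fmt.Println(dedup([]uint64{42, 17, 5, 17})) // [5 17 42]
}
```

Since the values are sorted before delta encoding anyway, removing the duplicates costs only a single extra pass over the sorted slice.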
Looks good.
Force-pushed from 5285405 to ec73c7f.
Force-pushed from ec73c7f to f07e16a.
This implements new version 2 filters, which have 4 changes as compared to version 1 filters:

- Support for independently specifying the false positive rate and Golomb coding bin size, which allows minimizing the filter size
- A faster (incompatible with version 1) reduction function
- A more compact serialization for the number of members in the set
- Deduplication of all hash collisions prior to reducing and serializing the deltas

In addition, it adds a full set of tests and updates the benchmarks to use the new version 2 filters.

The primary motivating factor for these changes is the ability to minimize the size of the filters; however, the following is a before-and-after comparison of version 1 and 2 filters in terms of performance and allocations.

It is interesting to note that the results for matching a single item are not very representative, since the hash calculation itself dominates to the point that the very low ns timings can vary significantly. Those differences average out when matching multiple items, which is the much more realistic scenario, and the performance increase is in line with the expected values. It is also worth noting that filter construction now takes a bit longer due to the additional deduplication step. While the filter construction numbers are about 25% larger in relative terms, the difference is only a few ms in practice and is therefore an acceptable trade-off for the size savings provided.
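Two of the listed changes can be sketched in Go: the faster reduction maps a 64-bit hash into [0, n) with a 128-bit multiply instead of a modulo, and the deltas between the sorted, deduplicated values are Golomb-Rice coded with an independently chosen bin size b. This is a minimal sketch under those assumptions; fastReduce, bitWriter, and golombEncode are illustrative names, not the actual gcs API:

```go
package main

import (
	"fmt"
	"math/bits"
)

// fastReduce maps a 64-bit hash x uniformly onto [0, n) by taking the
// high 64 bits of x*n. It is faster than x % n but produces different
// values, which is one reason version 2 filters are incompatible with
// version 1.
func fastReduce(x, n uint64) uint64 {
	hi, _ := bits.Mul64(x, n)
	return hi
}

// bitWriter accumulates bits most-significant first.
type bitWriter struct {
	bytes []byte
	nbits uint
}

func (w *bitWriter) writeBit(bit uint64) {
	if w.nbits%8 == 0 {
		w.bytes = append(w.bytes, 0)
	}
	if bit != 0 {
		w.bytes[len(w.bytes)-1] |= 1 << (7 - w.nbits%8)
	}
	w.nbits++
}

// golombEncode writes delta with Golomb-Rice parameter b: the quotient
// delta>>b in unary (q ones then a zero), then the low b remainder bits.
func golombEncode(w *bitWriter, delta uint64, b uint) {
	for q := delta >> b; q > 0; q-- {
		w.writeBit(1)
	}
	w.writeBit(0)
	for i := int(b) - 1; i >= 0; i-- {
		w.writeBit((delta >> uint(i)) & 1)
	}
}

func main() {
	var w bitWriter
	golombEncode(&w, 5, 2) // quotient 1, remainder 01 -> bits 1 0 0 1
	fmt.Printf("%08b (%d bits)\n", w.bytes[0], w.nbits)
	fmt.Println(fastReduce(1<<63, 100)) // 50: the midpoint of the hash range maps to n/2
}
```

Decoupling b from the false positive rate is what enables the size minimization: the unary quotient is cheap only when b is matched to the actual density of the reduced values, so being able to tune it independently lets the encoder pick the smallest total encoding.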
benchmark                     old ns/op     new ns/op     delta
-----------------------------------------------------------------
BenchmarkFilterBuild50000     16194920      20279043      +25.22%
BenchmarkFilterBuild100000    32609930      41629998      +27.66%
BenchmarkFilterMatch          620           593           -4.35%
BenchmarkFilterMatchAny       2687          2302          -14.33%

benchmark                     old allocs    new allocs    delta
-----------------------------------------------------------------
BenchmarkFilterBuild50000     6             17            +183.33%
BenchmarkFilterBuild100000    6             18            +200.00%
BenchmarkFilterMatch          0             0             +0.00%
BenchmarkFilterMatchAny       0             0             +0.00%

benchmark                     old bytes     new bytes     delta
-----------------------------------------------------------------
BenchmarkFilterBuild50000     688366        2074653       +201.39%
BenchmarkFilterBuild100000    1360064       4132627       +203.86%
BenchmarkFilterMatch          0             0             +0.00%
BenchmarkFilterMatchAny       0             0             +0.00%
Force-pushed from f07e16a to 2c3a4e3.
This requires #1851 and #1854.