gcs: Implement version 2 filters. #1856

Merged
merged 1 commit into from
Sep 3, 2019
@davecgh davecgh commented Aug 21, 2019

This requires #1851 and #1854.

This implements new version 2 filters which have 4 changes as compared to version 1 filters:

  • Support for independently specifying the false positive rate and Golomb coding bin size which allows minimizing the filter size
  • A faster (incompatible with version 1) reduction function
  • A more compact serialization for the number of members in the set
  • Deduplication of all hash collisions prior to reducing and serializing the deltas

In addition, it adds a full set of tests and updates the benchmarks to use the new version 2 filters.
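
As a rough illustration of the reduction and deduplication changes listed above, the following is a minimal sketch and not the code from this PR. It assumes the faster reduction is the common multiply-and-shift technique (keeping the high 64 bits of a 128-bit product) and that duplicate values are dropped after sorting so no zero delta is serialized. The names `fastReduce` and `reduceAndDedup` and the sample modulus are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// fastReduce maps a uniformly distributed 64-bit value v into [0, n) by
// keeping the high 64 bits of the 128-bit product v*n. It avoids a division,
// but produces different results than v % n, so filters built with the two
// approaches are not interchangeable.
func fastReduce(v, n uint64) uint64 {
	hi, _ := bits.Mul64(v, n)
	return hi
}

// reduceAndDedup reduces every hash into [0, modulus), sorts the results,
// and drops repeated values so that only distinct values remain for delta
// encoding. (Assumed ordering of steps, for illustration only.)
func reduceAndDedup(hashes []uint64, modulus uint64) []uint64 {
	values := make([]uint64, 0, len(hashes))
	for _, h := range hashes {
		values = append(values, fastReduce(h, modulus))
	}
	sort.Slice(values, func(i, j int) bool { return values[i] < values[j] })

	out := make([]uint64, 0, len(values))
	var prev uint64
	for i, v := range values {
		if i > 0 && v == prev {
			continue // collision: skip instead of emitting a zero delta
		}
		out = append(out, v)
		prev = v
	}
	return out
}

func main() {
	// Hypothetical hash values containing one exact duplicate.
	hashes := []uint64{1 << 63, 1 << 63, 0, ^uint64(0)}
	fmt.Println(reduceAndDedup(hashes, 100))
}
```

Because the reduction keeps the high bits of the product, it preserves the ordering of the input hashes, which is one reason this style of reduction pairs well with sorted delta encoding.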

The primary motivating factor for these changes is the ability to minimize the size of the filters. However, the following is a before and after comparison of version 1 and 2 filters in terms of performance and allocations.

It is interesting to note that the results for attempting to match a single item are not very representative, because the actual hash value itself dominates to the point that the timings can vary significantly at such low ns levels. Those differences average out when matching multiple items, which is the much more realistic scenario, and the performance increase there is in line with the expected values. It is also worth noting that filter construction now takes a bit longer due to the additional deduplication step. While the timings for filter construction are roughly 25% to 28% larger in relative terms, the difference is only a few ms in practice and is therefore an acceptable trade-off for the size savings provided.

benchmark                      old ns/op    new ns/op    delta
-----------------------------------------------------------------
BenchmarkFilterBuild50000      16194920     20279043     +25.22%
BenchmarkFilterBuild100000     32609930     41629998     +27.66%
BenchmarkFilterMatch           620          593          -4.35%
BenchmarkFilterMatchAny        2687         2302         -14.33%

benchmark                      old allocs   new allocs   delta
-----------------------------------------------------------------
BenchmarkFilterBuild50000      6            17           +183.33%
BenchmarkFilterBuild100000     6            18           +200.00%
BenchmarkFilterMatch           0            0            +0.00%
BenchmarkFilterMatchAny        0            0            +0.00%

benchmark                      old bytes    new bytes    delta
-----------------------------------------------------------------
BenchmarkFilterBuild50000      688366       2074653      +201.39%
BenchmarkFilterBuild100000     1360064      4132627      +203.86%
BenchmarkFilterMatch           0            0            +0.00%
BenchmarkFilterMatchAny        0            0            +0.00%

@davecgh davecgh added this to the 1.5.0 milestone Aug 21, 2019
@davecgh davecgh force-pushed the gcs_filters_v2 branch 3 times, most recently from 562c2f4 to 518172a Compare August 27, 2019 11:16

davecgh commented Aug 27, 2019

This has been updated to support deduplication of filter data for the version 2 filters. The PR description and all benchmarks have also been updated accordingly.

For a concrete example of the difference the deduplication can make, consider the following values for block 1096:

v1 filter bytes: 2940
v2 filter bytes before update: 2819
v2 filter bytes after update: 69
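
To give a feel for why dropping duplicates can shrink a filter this much, here is a hedged sketch of the bit cost of Golomb-Rice coded deltas. It assumes the usual encoding (a unary quotient followed by a fixed number of remainder bits), under which even a zero delta still costs one terminator bit plus all of the remainder bits. The helpers `riceBits` and `encodedBits` and the sample data are hypothetical and are not taken from block 1096.

```go
package main

import "fmt"

// riceBits returns the number of bits a Golomb-Rice code with b remainder
// bits needs for delta: the quotient in unary (delta>>b one bits plus a
// terminating bit) followed by b remainder bits.
func riceBits(delta uint64, b uint8) uint64 {
	return (delta >> b) + 1 + uint64(b)
}

// encodedBits sums the cost of the deltas between consecutive sorted values.
// When dedup is true, repeated values are skipped instead of being encoded
// as zero deltas.
func encodedBits(sorted []uint64, b uint8, dedup bool) uint64 {
	var total, prev uint64
	for i, v := range sorted {
		if dedup && i > 0 && v == prev {
			continue
		}
		total += riceBits(v-prev, b)
		prev = v
	}
	return total
}

func main() {
	// Hypothetical data set dominated by a single repeated value.
	values := make([]uint64, 1000)
	for i := range values {
		values[i] = 123456
	}
	const b = 19 // assumed bin size, for illustration only
	fmt.Println("bits without dedup:", encodedBits(values, b, false))
	fmt.Println("bits with dedup:   ", encodedBits(values, b, true))
}
```

In this simplified model every duplicate costs `b+1` bits without adding any information, so a data set with many repeated entries benefits disproportionately from deduplication.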

@dnldd dnldd left a comment

Looks good.

@davecgh davecgh force-pushed the gcs_filters_v2 branch 2 times, most recently from 5285405 to ec73c7f Compare September 3, 2019 14:29
@davecgh davecgh merged commit 2c3a4e3 into decred:master Sep 3, 2019
@davecgh davecgh deleted the gcs_filters_v2 branch September 4, 2019 21:20