Structure for finding values that belongs to result set #44

ak0rz · 2019-04-02T19:55:05Z

This issue is a part of #42

As we already have BitSet indexes it seems to be pretty straightforward to implement algorithms that check that certain value belongs to given result BitSet.

Unfortunately, it's performance would be dramatically lower on databases with a large number of documents, and even worse when the number of values in the index is equal or almost large as the number of documents, up to O(n * n / 64) operations to find all of the distinct values for this result set.

The simplest idea is to store something like a bloom filter consists of all longs in a bitset bitwise-or'ed to a single (or a few) long values. That way we may calculate the same hash on the result set and then use indexed hash to exclude values where is definitely no documents matching.

It would work smoothly on a large diverged indexes where it's needed and will cause almost no interfering on indexes with few values where in most cases we may have a document for each value and thus must check for each value by hand (which is a few)

ak0rz added the enhancement label Apr 2, 2019

ak0rz self-assigned this Apr 2, 2019

ak0rz changed the title ~~Bloom filter for finding values that belongs to result set~~ Structure for finding values that belongs to result set Apr 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structure for finding values that belongs to result set #44

Structure for finding values that belongs to result set #44

ak0rz commented Apr 2, 2019

Structure for finding values that belongs to result set #44

Structure for finding values that belongs to result set #44

Comments

ak0rz commented Apr 2, 2019