Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Endless recursion in getDistinctIndices with certain parameters. #34

Closed
kblauhut opened this issue Jun 21, 2021 · 3 comments
Closed
Assignees
Labels

Comments

@kblauhut
Copy link

Explanation
The getDistinctIndicesBis function within getDistinctIndices will fail with a Maximum call stack size exceeded error when there are too few distinct indizes being returned while n < size.

How to reproduce

const filter = new BloomFilter(39, 28);
filter.add("da5e21f8a67c4163f1a53ef43515bd027967da305ecfc741b2c3f40f832b7f82");

This code will trigger the issue.

Why does this happen?

// This returns the same hashes after n is larger than size
const hashes = hashTwice(elem, true, seed! + (size % n));
// This return from the doubleHashing function will also return the same values when called with the same hashes and different n values
return Math.abs((hashA + n * hashB) % size);

This combined with the filtering of double indizes in getDistinctIndicesBis will lead to the issue.
Its important to note that this will only happen in certain cases with high hash count and a low bit count.

@danielaugustin
Copy link

For every n >= size the function hashTwice does produce the same hashes (hashA and hashB) for every n.

If n >= size and hashB is dividable by size the function doubleHashing returns the same index for every n. This will always produce the same parameters for the recursive calls (getDistinctIndicesBis), resulting in a stack overflow.

Example (hashA=23, hashB=21, size=7 => ind always will become 23 % 7 = 2):

(23 + 7 * 21) mod 7 = 2
(23 + 8 * 21) mod 7 = 2
(23 + 9 * 21) mod 7 = 2
(23 + 10 * 21) mod 7 = 2
...

Possible fixes:

  1. (dirty) Within getDistinctIndicesBis put a bracket around seed! + size when calling hashTwice. This delays the point where every recursive call produces the same hashes. This fix does only delay
  2. Within doubleHashing replace n * hashB with n + hashB. The term (hashA + n + hashB) % size will always produce different indizes.
  3. Before calling doubleHashing check if hashB is dividable by size. In this case add a number (!= size) to hashB.

I checked fix (2) and it seems to work. Because the link in the comment (http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=4060353E67A356EF9528D2C57C064F5A?doi=10.1.1.152.579&rep=rep1&type=pdf) does not work I was not able to check if doubleHashing was implemented correctly .

Hope this helps.

@folkvir
Copy link
Collaborator

folkvir commented Jun 23, 2021

Thank you for sharing this, I will investigate this and post news as soon as possible.

@folkvir folkvir self-assigned this Jun 23, 2021
@folkvir folkvir added the bug label Jun 23, 2021
@folkvir
Copy link
Collaborator

folkvir commented Jun 23, 2021

Back, Fix pushed in branch fix-34 https://github.com/Callidon/bloom-filters/tree/fix-34
Is it possible to test it with a production test?

The difference is the hashTwice(elem, true, seed! + (size % n)) which is now hashTwice(elem, true, seed! + size + n) and is now a classic while loop instead of a recursive function.

You have to know that generating distinct indexes using this technique is very time consuming.
In fact if you want to generate N distinct indexes in a set of N elements, the probability to find the latest will be infinitely small in a very large set. But for a BloomFilter, we get a number of indexes equal to the number of hash functions used, so practically this is very fast.

@folkvir folkvir mentioned this issue Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants