negative distance returned in IndexFlatL2 search query #297
Comments
Hi,
No activity. Closing.
Hi, I'm quite new to nearest neighbor techniques and just tried faiss on my data (80-dimensional log-magnitude mel spectrogram frames). I was surprised to see negative distances. @mdouze, what exactly are you recommending when you say "it is not advised to use vectors with large differences in magnitude"? A lot of datasets will have big differences in magnitude between different vectors, and you can't necessarily change the dataset (although in my case maybe some transformation, like undoing the log, would improve things, but I don't know any methodology for finding appropriate transformations). Reducing the batch size to below 20 is always a possibility, but I guess it will hurt performance, and it sounds like you are saying it won't actually help accuracy?
The problem is that if you have a query vector x and two database vectors y_1 and y_2, where ||x|| >> ||y_1|| and ||x|| >> ||y_2||, then there will be accuracy losses because computations are performed with 32-bit float precision. For example, in 1D, float32 => 24-bit mantissa => epsilon = 1/16M, so if there is a factor 16M between the magnitudes of x and y_i then ||x - y_1|| = ||x - y_2|| = ||x||, so y_1 and y_2 will be indistinguishable. Of course this is an extreme case, but any relative difference M in magnitude incurs a loss of precision of about log2(M) bits. In the current version of faiss, you can switch between the two implementations by adjusting distance_compute_blas_threshold, which is set to 20 by default.
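To make the 1D example concrete, here is a minimal sketch in plain Python. It does not use faiss; the helper `f32` is a hypothetical name that emulates float32 rounding by round-tripping through `struct`, and the values of x, y_1 and y_2 are chosen only for illustration (a magnitude ratio somewhat above 2^24).

```python
import struct

def f32(v: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

# Illustrative values: x is about 16M, y_1 and y_2 are well below 1.
x = f32(2.0 ** 24)
y1 = f32(0.25)
y2 = f32(0.4)

# 1D distance |x - y_i|, with the subtraction rounded to float32.
d1 = abs(f32(x - y1))
d2 = abs(f32(x - y2))

# Both differences round back to x, so y_1 and y_2 cannot be told apart.
print(d1, d2, d1 == d2 == x)
```

In float64 the two distances would differ; the loss comes purely from the 24-bit mantissa.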
Is it possible to set
I met the same problem when I tried to generate simple data for testing. I used [1, 1], [2, 2], .. [N, N] where N = 10^5 as the database, and compared the result between
setting the
It is only for CPU Faiss. GPU always uses cblas I believe. @wickedfoo ?
GPU always does the -2xy via cublas. However, L2 distance computation should be prevented from going negative as of the last release I believe.
Yes indeed, for both CPU and GPU.
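For readers wondering how a squared L2 distance can come out negative at all, here is a rough sketch, assuming the BLAS-style path uses the usual expansion ||x||^2 + ||y||^2 - 2<x, y> (the thread only says the -2xy term goes through cublas). It emulates float32 arithmetic in plain Python and is not faiss code; `f32`, `l2_direct`, `dot` and `l2_expanded` are hypothetical helper names, and clamping with `max(0.0, ...)` is just one assumed way of keeping the result non-negative.

```python
import struct

def f32(v: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

def l2_direct(x, y):
    """Squared L2 distance as sum((x_i - y_i)^2), rounded to float32 at each step."""
    acc = 0.0
    for a, b in zip(x, y):
        d = f32(a - b)
        acc = f32(acc + f32(d * d))
    return acc

def dot(x, y):
    """Inner product with float32 rounding after every multiply and add."""
    acc = 0.0
    for a, b in zip(x, y):
        acc = f32(acc + f32(a * b))
    return acc

def l2_expanded(x, y):
    """Squared L2 distance via ||x||^2 + ||y||^2 - 2<x, y> in emulated float32."""
    return f32(f32(dot(x, x) + dot(y, y)) - f32(2.0 * dot(x, y)))

# Two close vectors with large norms (1D for simplicity, values are illustrative).
x = [32775.0]
y = [32780.0]

direct = l2_direct(x, y)       # true squared distance is 25
expanded = l2_expanded(x, y)   # cancellation of large terms, can go below zero
clamped = max(0.0, expanded)   # assumed fix: never report a negative distance

print(direct, expanded, clamped)
```

The expanded form subtracts two numbers of size about 2^31, each carrying a rounding error far larger than the true distance of 25, so the sign of the result is not reliable. Clamping hides the negative sign but does not recover the lost accuracy.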
Perhaps this was fixed for
Is this numerical overflow?
Dear all, does anyone know why the following code could return negative entries for D? I am calculating the L2-nearest neighbors of CIFAR images, for which I assume IndexFlatL2 should return non-negative distances (and 0 for exact match).
Some notes:
1-Flat.py
Thanks!