Querying with a 20-item array is much more than 20 times slower than querying with a 1-item array. #53
Comments
Hi, that seems weird. Could you post a script to reproduce?
@mdouze Anyway, I'm trying to create a …
Please just post the source that you are using.
What I am interested in is exact KNN, so IndexFlatL2 is what I use:

In [11]: import numpy as np
In [12]: X = np.random.random((1000000, 160)).astype('float32')
In [13]: index = faiss.IndexFlatL2(160)
In [14]: index.add(X)
In [15]: %timeit index.search(X[:20], 20)
1 loop, best of 3: 3.62 s per loop
In [16]: %timeit index.search(X[:1], 20)
10 loops, best of 3: 64.4 ms per loop
In [17]: 64.4 * 20
Out[17]: 1288.0
In [18]: %timeit [index.search(X[i:i+1], 20) for i in xrange(20)]
1 loop, best of 3: 1.27 s per loop
This may indeed be an issue due to OpenBLAS, because 20 is the threshold where the code switches from the internal distance computation to the OpenBLAS one. To verify this, could you set the threshold to something higher (1024) in utils.cpp line 855 and test again? I will check on my side as well.
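For anyone retracing this today: in the Faiss version discussed here the switch point was a hard-coded constant in utils.cpp, but recent Faiss releases expose it to Python (as faiss.cvar.distance_compute_blas_threshold, to the best of my knowledge), which makes the check possible without recompiling. A rough sketch, under that assumption:

import time
import numpy as np
import faiss

X = np.random.random((1000000, 160)).astype('float32')
index = faiss.IndexFlatL2(160)
index.add(X)

def bench(nq, k=20):
    t0 = time.time()
    index.search(X[:nq], k)
    return time.time() - t0

print('nq=19 (internal path):', bench(19))
print('nq=20 (BLAS path):', bench(20))

# Raise the switch point so that nq=20 stays on the internal code path
# (assumes faiss.cvar.distance_compute_blas_threshold is exposed in this build).
faiss.cvar.distance_compute_blas_threshold = 1024
print('nq=20, threshold=1024:', bench(20))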
@mdouze Sure, I will tell you the result after trying. BTW, when I ran similar tests with sklearn:

In [19]: from sklearn.neighbors import NearestNeighbors
In [20]: nbrs = NearestNeighbors(n_neighbors=20, algorithm='brute', metric='l2').fit(X)
In [21]: %timeit nbrs.kneighbors(X[:1], n_neighbors=20)
1 loop, best of 3: 179 ms per loop
In [22]: %timeit nbrs.kneighbors(X[:20], n_neighbors=20)
1 loop, best of 3: 603 ms per loop

It ran almost 3 times slower than faiss for the 1-item query, but much faster for the 20-item batch.
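For context on why the BLAS library matters here at all: as far as I know, both sklearn's brute-force mode and the Faiss BLAS code path reduce exhaustive L2 search to one large matrix multiplication via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y. The numpy-only sketch below illustrates the idea; it is a simplification, not the actual Faiss or sklearn code, and the helper name is made up.

import numpy as np

def knn_l2_blas(Q, X, k):
    # squared norms of the query and database vectors
    q_norms = (Q ** 2).sum(axis=1)[:, None]
    x_norms = (X ** 2).sum(axis=1)[None, :]
    # the O(nq * nb * d) work is a single GEMM call inside numpy
    dists = q_norms + x_norms - 2.0 * (Q @ X.T)
    # keep the k smallest distances per query, then sort those k candidates
    idx = np.argpartition(dists, k, axis=1)[:, :k]
    order = np.argsort(np.take_along_axis(dists, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)

Q = np.random.random((20, 160)).astype('float32')
X = np.random.random((10000, 160)).astype('float32')
print(knn_l2_blas(Q, X, 20).shape)   # (20, 20)

With OpenBLAS, the Q @ X.T product is exactly the call whose threading behaviour is discussed later in this thread.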
OK, I can repro the issue with OpenBLAS:

In [8]: %timeit index.search(X[:19], 20)
10 loops, best of 3: 115 ms per loop
In [9]: %timeit index.search(X[:20], 20)
1 loops, best of 3: 5.09 s per loop

Let's look for a fix now...
BTW, I used …
Does not happen with MKL, tested on MKL BLAS:

In [1]: import numpy as np
In [2]: import faiss
Failed to load GPU Faiss: No module named swigfaiss_gpu
Faiss falling back to CPU-only.
In [3]: import time
In [4]: X = np.random.random((1000000, 160)).astype('float32')
In [5]: index = faiss.IndexFlatL2(160)
In [6]: index.add(X)
In [7]: t0 = time.time(); index.search(X[:19], 20); print time.time() - t0
0.113594055176
In [8]: t0 = time.time(); index.search(X[:20], 20); print time.time() - t0
0.107301950455
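A side note on telling the two setups apart: numpy reports the BLAS it was built against, while the BLAS used by Faiss is fixed when Faiss itself is compiled (via makefile.inc in this era), so the compiled extension has to be inspected, e.g. with ldd on Linux. A sketch; the shared-object path is hypothetical and must be replaced with the one from your own installation:

import subprocess
import numpy as np

# BLAS/LAPACK that this numpy build links against (MKL, OpenBLAS, ...).
np.show_config()

# Faiss's BLAS is chosen at build time; inspect its compiled extension.
so_path = '/path/to/faiss/_swigfaiss.so'   # hypothetical path, adjust for your install
print(subprocess.run(['ldd', so_path], capture_output=True, text=True).stdout)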
So, does this suggest that I should switch to MKL BLAS?
If possible, I would suggest switching to MKL BLAS. The internal Faiss implementation of exhaustive search is very inefficient in terms of memory accesses for a large number of queries nq. It seems that the break-even point in speed between the two code paths is somewhere between nq=512 and nq=2048 queries, but at nq=512 the BLAS path takes 10.0 s, whereas extrapolating from the nq=19 timing of the internal implementation gives about 2.6 s. Therefore, neither implementation is satisfactory.
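The extrapolation is just linear scaling of the internal path's nq=19 timing up to nq=512 (assuming the internal path's cost grows proportionally with nq); using the ~115 ms figure from the session above lands in the same ballpark as the 2.6 s quoted:

# Linear extrapolation of the internal (non-BLAS) code path.
t_19 = 0.115                      # seconds for nq=19, from the session above
per_query = t_19 / 19
print(per_query * 512)            # roughly 3 s for nq=512, same order as the quoted 2.6 s
print(10.0 / (per_query * 512))   # the BLAS path at nq=512 is still ~3x slower than that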
OpenBLAS and OpenMP are known not to play well together, see https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded. A few more comments, see session …

Another test, setting OMP_WAIT_POLICY, see https://gist.github.com/mdouze/9a392de0de81614ea2f39b8c724597a3. Setting the policy to PASSIVE seems to solve the problem. My guess is: …
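OMP_WAIT_POLICY is a standard OpenMP environment variable and must be set before the OpenMP runtime starts, so the most reliable place is the shell that launches the process (e.g. OMP_WAIT_POLICY=PASSIVE python bench.py). Setting it from Python before the first import of numpy/faiss usually works too, but that relies on import order, so treat the sketch below as an assumption rather than a guarantee:

import os

# Must run before the OpenMP runtime is initialized in this process,
# i.e. before numpy/faiss are imported for the first time.
os.environ['OMP_WAIT_POLICY'] = 'PASSIVE'

import numpy as np
import faiss

X = np.random.random((1000000, 160)).astype('float32')
index = faiss.IndexFlatL2(160)
index.add(X)
D, I = index.search(X[:20], 20)   # should no longer show the multi-second stall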
When comparing index.search of IndexFlatL2 with sklearn's kneighbors implementation, I found that:
- faiss was several times faster when searching with 1-item arrays
- faiss was much slower when searching with many items at a time

I'm using faiss with OpenBLAS.