faiss::gpu::runMatrixMult failure #34
Possibility that you ran out of GPU memory? |
What were you trying to run? |
train data shape: (2000000, 1000) my code:
My GPU has 8 GB of memory. I just tried the benchmark bench_gpu_sift1m.py and got the same error. |
instead of giving all of the (20000000, 1000) at once, try giving it in chunks of (10000, 1000) or so. |
Only GpuIndexFlat* handles passing large amounts of data all at once for add or search at present. |
I see. Actually I used numpy.memmap to load the data. Sorry, could you give me some guidance on how to chunk the input data that can be loaded with index.add? |
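One way the chunking suggested above could look is sketched below. This is a minimal sketch, not Faiss's own API: `chunk_bounds` and `add_in_chunks` are hypothetical helper names, and the only thing assumed about the index is that it exposes an `add(chunk)` method, as Faiss indexes do. The file name and variable names in the trailing comment are placeholders.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row ranges that cover n_rows in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def add_in_chunks(index, data, chunk_size=10000, convert=None):
    """Call index.add once per chunk instead of once for the whole array.

    index   -- any object with an add(chunk) method (e.g. a Faiss index)
    data    -- row-sliceable array (e.g. a numpy.memmap of shape (n, d))
    convert -- optional callable applied to each chunk before it is added
    Returns the number of rows passed to index.add.
    """
    total = 0
    for start, stop in chunk_bounds(len(data), chunk_size):
        chunk = data[start:stop]  # for a memmap, only this slice is read
        if convert is not None:
            chunk = convert(chunk)
        index.add(chunk)
        total += stop - start
    return total


# Hypothetical usage with Faiss and a memory-mapped file (placeholder names):
#   import numpy as np
#   xb = np.memmap("base.f32", dtype="float32", mode="r").reshape(-1, 1000)
#   add_in_chunks(gpu_index, xb, 10000,
#                 convert=lambda c: np.ascontiguousarray(c, dtype="float32"))
```

Converting each slice to a contiguous float32 array just before `add` keeps only one chunk in host memory at a time, which is the point of slicing a memmap rather than loading it whole.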
Also, I notice that my GPU memory occupation in training is always about 20%. That's strange. |
I just made some changes to the benchmark code bench_gpu_sift1m.py and still get the same error. Populating only the top 10000 does not work either, so it does not seem to be a memory issue. Maybe there is something wrong with CUBLAS. By the way, do you plan to publish an official Docker image to avoid problems caused by installation?
Hi |
Hi, mdouze. The code above is from bench_gpu_sift1m.py. I used the data from http://corpus-texmex.irisa.fr/, following the instructions in https://github.com/facebookresearch/faiss/tree/master/benchs. I just wanted to check whether the benchmark code works. It turns out to fail with the same error as my own code. |
Ok, so this is the exact script bench_gpu_sift1m.py applied to the SIFT1M dataset and not your 20M*1000-dim dataset, correct? |
Yes for your first question. |
It could be the same bug as issue #8. Unfortunately we do not have the hardware to reproduce it, so we would be grateful if you could narrow down the error for us:
|
You can also try running
Another thing is to try resetting the GPU via nvidia-smi and trying again. Also, you could try and investigate which CUDA shared libraries it is trying to load, to see if there is a mismatch if you have multiple CUDA SDK versions installed. |
"Also, I notice that my GPU memory occupation in training is always about 20%. That's strange." Faiss GPU reserves about 18% of available GPU memory up front for scratch space. This amount is controllable via |
For your questions:
Some other info:
Are you compiling with clang or gcc? |
gcc |
I believe this is related to the GPU, which is similar to issue #8 |
I hit the same problem. My GPU is a TITAN X. I want to index 1000000 512-dimensional vectors using faiss.GpuIndexFlatL2, and that triggers this issue. If I cut the number from 1000000 to 500000, it works. It seems the maximum number of vectors is 500000, because 600000 vectors also cause this problem. The following is my code:
@yhpku, thanks. I tried a GTX 1080 and a Titan X, and both failed. Yours seems to be caused by OOM. IndexFlatL2 loads all the data at once for add or search, so 500000 may be the upper limit for the Titan X. You can try IndexIVFPQ, which stores the vectors with lossy compression. |
Hi @yhpku, in the code above you use 512 vectors in 1M dimensions. Is this what you want? |
@mdouze, no, that's not what I want. I mean 1M vectors in 512 dimensions. |
@hellolovetiger, Titan X should work. Does bench_gpu_sift1m.py crash on Titan X? What error? |
@yhpku, please fix your code then. |
On Titan X,
For bench_gpu_sift1m.py,
The error goes away if I set co.usePrecomputed = False. For my own code:
The error is:
When I cut the base data from 20M to 3M, the error becomes:
Seems it becomes a memory issue. |
You are running out of GPU memory. Do not try and add so many vectors at once. 3M * 1000 * sizeof(float) is 12 GB. Try adding the vectors in chunks of 10000 to 50000 instead. |
After adding to the index, the vectors will then be compressed via PQ, and then you can add more. But, before compression, each vector takes 4000 bytes of memory ( = 1000 * sizeof(float)), not 16 bytes (PQ16). |
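The arithmetic in the two comments above can be checked with a few lines. This is only a sketch of the sizes quoted in the thread (float32 vectors of dimension 1000, 16-byte PQ16 codes); the function and variable names are illustrative, and real GPU usage also includes the scratch space and index overhead that are not counted here.

```python
FLOAT32_BYTES = 4  # sizeof(float)


def raw_bytes(n_vectors, dim):
    """Size in bytes of n_vectors uncompressed float32 vectors of dimension dim."""
    return n_vectors * dim * FLOAT32_BYTES


full = raw_bytes(3_000_000, 1000)    # everything passed to add() at once
chunk = raw_bytes(10_000, 1000)      # one chunk of 10000 vectors
per_vector_raw = raw_bytes(1, 1000)  # before PQ compression
per_vector_pq16 = 16                 # after compression, as stated in the thread

print(f"all at once : {full / 1e9:.1f} GB")
print(f"one chunk   : {chunk / 1e6:.1f} MB")
print(f"per vector  : {per_vector_raw} bytes raw vs {per_vector_pq16} bytes as PQ16")
```

The 3M-vector batch needs 12 GB uncompressed, which cannot fit on an 8 GB card, while a 10000-vector chunk needs only 40 MB before it is compressed.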
Problems with attempting to add large CPU resident vectors all at once will be fixed internally at some point. But in the meantime you will have to incrementally add them. |
Got it. Thanks, @wickedfoo. It would be good to add this information to the wiki. 😃 |
@mdouze, I am sorry, that was a typo. The actual code is as follows. The error output is "Faiss assertion err == cudaSuccess failed in faiss::gpu::StackDeviceMemory::Stack::~Stack() at utils/StackDeviceMemory.cpp:54", followed by "Aborted (core dumped)".
Closing this issue now, because the discussion has drifted. Please open a new one if it is blocking. |
Recently, I started to use faiss and hit the same problem. I read many issues and tried almost all the solutions mentioned above, but none of them worked. Eventually I noticed that nvcc and nvidia-smi reported different CUDA versions, so I changed the nvcc version to match nvidia-smi, and it finally worked. So note that the nvcc version must be consistent with the nvidia-smi version. If you hit the same problem when compiling faiss, this may help you. |
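The comparison described above could be scripted roughly as follows. This is a sketch under assumptions: it relies on `nvcc --version` printing a "release X.Y" string and on `nvidia-smi` printing "CUDA Version: X.Y" in its header, which recent toolkits and drivers do; the helper names are hypothetical. Note that nvidia-smi reports the highest CUDA version the installed driver supports rather than the installed toolkit, so the check flags a difference for you to look into and does not prove a fault.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release\s+(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_smi_cuda_version(text):
    """Extract (major, minor) from the `nvidia-smi` header, or None."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def run(cmd):
    """Return the command's stdout, or an empty string if it cannot be run."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except OSError:
        return ""


def report():
    toolkit = parse_nvcc_release(run(["nvcc", "--version"]))
    driver = parse_smi_cuda_version(run(["nvidia-smi"]))
    print("nvcc toolkit version  :", toolkit)
    print("nvidia-smi CUDA version:", driver)
    if toolkit is None or driver is None:
        print("Could not determine one of the versions.")
    elif toolkit != driver:
        print("Versions differ; this is the situation described in the comment above.")
    else:
        print("Versions match.")


if __name__ == "__main__":
    report()
```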
You are lucky. Unfortunately, it did not work when I tried to use faiss-gpu on CUDA 11.1. |
Hi, |
The full log:
Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141
Aborted (core dumped)
I have successfully run demo_ivfpq_indexing_gpu, so I think faiss was installed correctly.