[BUG] Segfault when extending CAGRA index with less than 128 nodes #486

Open
ajit283 opened this issue Nov 21, 2024 · 2 comments

ajit283 commented Nov 21, 2024

Describe the bug
Extending a CAGRA index that has fewer than 128 nodes causes a segfault.

Steps/Code to reproduce bug
Set main_data_size to 127 in the BuildExtendSearch test in cpp/test/neighbors/ann_cagra_c.cu, then run that test

or (using CAGRA with C++ directly):
Change n_rows to 127 in one of the test inputs in ann_cagra.cuh, then run the test.

Expected behavior
The index is extended successfully, without a crash.

Environment details (please complete the following information):
Bare-metal, compiled from source. I've seen this bug on CUDA toolkit 12.6, 12.5 and 12.4.

ajit283 added the bug label on Nov 21, 2024
tfeher self-assigned this on Dec 3, 2024

tfeher commented Dec 3, 2024

Thank you @ajit283 for reporting the bug. I could reproduce the problem (by using dim 127 at this line).

I believe that CAGRA-Q requires dim to be a multiple of pq_dim, so the modified input config is invalid. This is still a bug: the code should not crash. Instead, it should exit after printing an informative error message. We will fix this.
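As an illustration of the kind of up-front check described above, here is a minimal sketch. The helper name `check_pq_dim` is hypothetical and is not part of cuVS, and the divisibility rule is the constraint assumed in this comment rather than a confirmed requirement. The idea is to reject the configuration on the host, before any kernel runs, so the user sees a readable message instead of an illegal memory access.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a cuVS function): validates the assumed
// constraint that the dataset dimension is a multiple of pq_dim.
// Throws std::invalid_argument with a descriptive message on failure.
inline void check_pq_dim(std::int64_t dim, std::int64_t pq_dim)
{
  if (dim <= 0 || pq_dim <= 0) {
    throw std::invalid_argument("dim and pq_dim must both be positive");
  }
  if (dim % pq_dim != 0) {
    throw std::invalid_argument("dim (" + std::to_string(dim) +
                                ") must be a multiple of pq_dim (" +
                                std::to_string(pq_dim) + ")");
  }
}
```

With the values from the failing run, `check_pq_dim(127, 32)` would throw, while a configuration such as `check_pq_dim(128, 32)` would pass.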

For reference, here is the test output:

[ RUN      ] AnnCagraTest/AnnCagraTestF_U32.AnnCagra/1
using ivf_pq::index_params nrows 10000, dim 127, n_lits 100, pq_dim 32
[I] [13:41:49.824636] optimizing graph
[I] [13:41:49.851619] Graph optimized, creating index
unknown file: Failure
C++ exception with description "std::bad_alloc: CUDA error at: /workspace1/cuvs/cpp/build_90/_deps/rmm-src/include/rmm/mr/device/cuda_memory_resource.hpp:62: cudaErrorIllegalAddress an illegal memory access was encountered" thrown in the test body.
unknown file: Failure
C++ exception with description "CUDA error encountered at: file=/workspace1/cuvs/cpp/build_90/_deps/raft-src/cpp/include/raft/core/interruptible.hpp line=303: call='query_result', Reason=cudaErrorIllegalAddress:an illegal memory access was encountered
Obtained 12 stack frames
#1 in gtests/NEIGHBORS_ANN_CAGRA_TEST: raft::cuda_error::cuda_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x6a [0x560a8bcd630a]
#2 in gtests/NEIGHBORS_ANN_CAGRA_TEST: void raft::interruptible::synchronize_impl<cudaError (*)(CUstream_st*), rmm::cuda_stream_view>(cudaError (*)(CUstream_st*), rmm::cuda_stream_view) +0x213 [0x560a8bcd7373]
#3 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0xb5401) [0x560a8bd26401]
#4 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x1d3551) [0x560a8be44551]
#5 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x1bfd95) [0x560a8be30d95]
#6 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x1c0375) [0x560a8be31375]
#7 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x1c7d0f) [0x560a8be38d0f]
#8 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x1bfe5a) [0x560a8be30e5a]
#9 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x5d594) [0x560a8bcce594]
#10 in /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x153a325d4d90]
#11 in /usr/lib/x86_64-linux-gnu/libc.so.6: __libc_start_main +0x80 [0x153a325d4e40]
#12 in gtests/NEIGHBORS_ANN_CAGRA_TEST(+0x5d5f5) [0x560a8bcce5f5]
" thrown in TearDown().
[  FAILED  ] AnnCagraTest/AnnCagraTestF_U32.AnnCagra/1, where GetParam() = {n_queries=100, dataset shape=10000x127, k=16, auto, max_queries=10, itopk_size=64, search_width=1, metric=L2, device, build_algo=IVF_PQ, pq_bits=8, pq_dim=63, vq_n_centers=100}


ajit283 commented Dec 5, 2024

@tfeher Interesting. I actually meant changing n_rows in ann_cagra.cuh, not dim. Maybe the behavior you describe is a separate bug?
