Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable compute_sdc_table if HNSWPQ index is read only #3246

Closed
2 of 4 tasks
jmazanec15 opened this issue Feb 8, 2024 · 7 comments
Closed
2 of 4 tasks

Disable compute_sdc_table if HNSWPQ index is read only #3246

jmazanec15 opened this issue Feb 8, 2024 · 7 comments

Comments

@jmazanec15
Copy link
Contributor

Summary

In index_read for HNSWPQ, the sdc_table is always computed. This SDC table is used to optimize distance computations during graph construction in HNSW through symmetric_distance operation:

  1. https://github.com/facebookresearch/faiss/blob/main/faiss/IndexPQ.cpp#L95-L106
  2. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/HNSW.cpp#L316
  3. https://github.com/facebookresearch/faiss/blob/main/faiss/impl/HNSW.cpp#L240

This table comes with the cost of M*2^nbits*2^nbits*4 bytes of memory. For systems that will use this index type as a template multiple times (this is how we integrated with OpenSearch), this can accumulate a large memory overhead: 500*(64*2^8*2^8*4)/1024^3 = 7.8125 GB.

This SDC table is not used during search. That being said, I am wondering if we can change the read logic to avoid computing this table if the index is read only.

So, my proposal is to switch index_read to:

if (h == fourcc("IHNp") && !(io_flags & IO_FLAG_READ_ONLY)) {
   dynamic_cast<IndexPQ*>(idxhnsw->storage)->pq.compute_sdc_table();
}

From my understanding, this should not break any of the typical functionality of the HNSWPQ index type when it is read only. The only case the could be potentially problematic is when a user is reading the index from disk and then extracting PQ from the index and using it for other things (like maybe building another HNSWPQ index). That being said, I am not sure if this is a typical/valid use case.

Alternative options I can think of:

  1. Add different io flag to specifically skip this pre-computed table - in general, this table is implementation detail so I think itd be better to avoid exposing this in the interface if possible
  2. Implement logic on our side to read index and manually skip this - this may work but would be more difficult to maintain

I can raise a PR to make change, but wanted to first get feedback on the approach.

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python
@mdouze
Copy link
Contributor

mdouze commented Feb 13, 2024

Thanks for this analysis. This sounds good, IMO the flag could be IO_FLAG_SKIP_PRECOMPUTE_TABLE

https://github.com/facebookresearch/faiss/blob/main/faiss/index_io.h#L54

If it breaks things, the error messages would be clear enough for people to realize what they opted in for.

Do you want to make a PR for that?

@jmazanec15
Copy link
Contributor Author

@mdouze sure, let me raise a PR. I think that makes sense. But will dig in a little bit more on implementation.

@jmazanec15
Copy link
Contributor Author

Thinking about it more, though, there is already a precomputed table associated with the IndexPQ index that will still be computed: https://github.com/facebookresearch/faiss/blob/main/faiss/IndexPQ.cpp#L114 (albeit a bit smaller) so this flag may be confusing. Further, if a user passes IO_FLAG_SKIP_PRECOMPUTE_TABLE, then any indexing will fail to the HNSWPQ index because exceptions will be thrown in add_link and shrink_neighbor_list because of https://github.com/facebookresearch/faiss/blob/main/faiss/IndexPQ.cpp#L95, which may be misleading. Is there any concern over using READ_ONLY_FLAG instead?

@mdouze
Copy link
Contributor

mdouze commented Feb 14, 2024

I would rather introduce a new flag then, IO_FLAG_PQ_SKIP_SDC_TABLE for example.
The fact that readonly means that SDC is not needed in HSNW + PQ is very non obvious.

@jmazanec15
Copy link
Contributor Author

@mdouze that makes the most sense - safest way to preserve BWC. Ill update the PR

@mdouze
Copy link
Contributor

mdouze commented Feb 27, 2024

This should be sloved now

@mdouze mdouze closed this as completed Feb 27, 2024
@jmazanec15
Copy link
Contributor Author

Hey @mdouze I saw it got imported but not merged yet in #3250

facebook-github-bot pushed a commit that referenced this issue Mar 1, 2024
Summary:
## Description

Related issue: #3246

When reading HNSWPQ from disk, if index ~read only~ new `IO_FLAG_PQ_SKIP_SDC_TABLE` flag is set, skip initializing the sdc_table.

In addition, adds cpp test case verifying functionality and build test util header file to share creation of temporary files amongst tests.

Pull Request resolved: #3250

Test Plan: buck test //faiss/tests/:test_disable_pq_sdc_tables

Reviewed By: junjieqi

Differential Revision: D53844075

Pulled By: mdouze

fbshipit-source-id: e9a83c0e5243867edbca8f80e3b1242b38ef6a42
abhinavdangeti pushed a commit to blevesearch/faiss that referenced this issue Jul 12, 2024
Summary:
## Description

Related issue: facebookresearch#3246

When reading HNSWPQ from disk, if index ~read only~ new `IO_FLAG_PQ_SKIP_SDC_TABLE` flag is set, skip initializing the sdc_table.

In addition, adds cpp test case verifying functionality and build test util header file to share creation of temporary files amongst tests.

Pull Request resolved: facebookresearch#3250

Test Plan: buck test //faiss/tests/:test_disable_pq_sdc_tables

Reviewed By: junjieqi

Differential Revision: D53844075

Pulled By: mdouze

fbshipit-source-id: e9a83c0e5243867edbca8f80e3b1242b38ef6a42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants