-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable compute_sdc_table if HNSWPQ index is read only #3246
Comments
Thanks for this analysis. This sounds good, IMO the flag could be https://github.com/facebookresearch/faiss/blob/main/faiss/index_io.h#L54 If it breaks things, the error messages would be clear enough for people to realize what they opted in for. Do you want to make a PR for that? |
@mdouze sure, let me raise a PR. I think that makes sense. But will dig in a little bit more on implementation. |
Thinking about it more, though, there is already a precomputed table associated with the IndexPQ index that will still be computed: https://github.com/facebookresearch/faiss/blob/main/faiss/IndexPQ.cpp#L114 (albeit a bit smaller) so this flag may be confusing. Further, if a user passes IO_FLAG_SKIP_PRECOMPUTE_TABLE, then any indexing will fail to the HNSWPQ index because exceptions will be thrown in add_link and shrink_neighbor_list because of https://github.com/facebookresearch/faiss/blob/main/faiss/IndexPQ.cpp#L95, which may be misleading. Is there any concern over using READ_ONLY_FLAG instead? |
I would rather introduce a new flag then, |
@mdouze that makes the most sense - safest way to preserve BWC. Ill update the PR |
This should be sloved now |
Summary: ## Description Related issue: #3246 When reading HNSWPQ from disk, if index ~read only~ new `IO_FLAG_PQ_SKIP_SDC_TABLE` flag is set, skip initializing the sdc_table. In addition, adds cpp test case verifying functionality and build test util header file to share creation of temporary files amongst tests. Pull Request resolved: #3250 Test Plan: buck test //faiss/tests/:test_disable_pq_sdc_tables Reviewed By: junjieqi Differential Revision: D53844075 Pulled By: mdouze fbshipit-source-id: e9a83c0e5243867edbca8f80e3b1242b38ef6a42
Summary: ## Description Related issue: facebookresearch#3246 When reading HNSWPQ from disk, if index ~read only~ new `IO_FLAG_PQ_SKIP_SDC_TABLE` flag is set, skip initializing the sdc_table. In addition, adds cpp test case verifying functionality and build test util header file to share creation of temporary files amongst tests. Pull Request resolved: facebookresearch#3250 Test Plan: buck test //faiss/tests/:test_disable_pq_sdc_tables Reviewed By: junjieqi Differential Revision: D53844075 Pulled By: mdouze fbshipit-source-id: e9a83c0e5243867edbca8f80e3b1242b38ef6a42
Summary
In index_read for HNSWPQ, the
sdc_table
is always computed. This SDC table is used to optimize distance computations during graph construction in HNSW through symmetric_distance operation:This table comes with the cost of
M*2^nbits*2^nbits*4
bytes of memory. For systems that will use this index type as a template multiple times (this is how we integrated with OpenSearch), this can accumulate a large memory overhead:500*(64*2^8*2^8*4)/1024^3 = 7.8125 GB
.This SDC table is not used during
search
. That being said, I am wondering if we can change the read logic to avoid computing this table if the index is read only.So, my proposal is to switch index_read to:
From my understanding, this should not break any of the typical functionality of the HNSWPQ index type when it is read only. The only case the could be potentially problematic is when a user is reading the index from disk and then extracting PQ from the index and using it for other things (like maybe building another HNSWPQ index). That being said, I am not sure if this is a typical/valid use case.
Alternative options I can think of:
I can raise a PR to make change, but wanted to first get feedback on the approach.
Running on:
Interface:
The text was updated successfully, but these errors were encountered: