-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Additive quantizers
[This page is Work In Progress and depends on a PR that has not been merged yet]
An additive quantizer approximates a vector x in dimension d as
x ~= x' = T_1(i_1) + ... + T_M(i_M)
where T_m is a table of d-dimensional vectors indexed by i_m. The size of table m is K_m. The code that is stored to represent x is
[i_1,..., i_M].
It is of size (in bits)
ceil(log_2(K_1)) + ... + ceil(log_2(K_M)).
As usual with data-adaptive quantizers, the tables T_1, ..., T_M are learned on a training set.
The product quantization can be considered as a special case of additive quantization where only a sub-vector of size d/M is non-zero in each table T_m.
A good introduction to additive quantization is Additive quantization for extreme vector compression, Babenko and Lempitsky, CVPR'14.
The Faiss implementation is based on the AdditiveQuantizer
object.
The Faiss implementation imposes that K_m is a power of two. It does not in general impose that the K_m's are all identical.
The decoding of an additive quantizer is unambiguous. However there is no simple method to (1) train an additive quantizer and (2) perform the encoding. Therefore, Faiss contains two implementations of additive quantizers: the Residual Quantizer and the Local Search Quantizer.
The Faiss residual quantizer is in the [ResidualQuantizer
] (https://github.com/facebookresearch/faiss/blob/master/faiss/impl/ResidualQuantizer.h) object.
In a residual quantizer, the encoding is sequential. At stage m of the encoding of x, the encoder picks the entry that best reconstructs the residual of vector x w.r.t. the previous encoding steps:
i_m = argmin_j || T_m(j) - (x - T_1(i_1) + ... + T_{m-1}(i_{m-1})) ||^2
This is a greedy approach, which tends to get trapped in local minima.
To avoid the local minima, the encoder maintains a beam of possible codes and picks the best code at stage M.
The beam size is given by the max_beam_size
field of ResidualQuantizer
: the larger the more accurate, but also the slower.
At training time, the tables are trained sequentially by k-means at each step. The max_beam_size
is also used for that.
The ResidualQuantizer
supports a version of k-means that starts training in a lower dimension, as described in "Improved Residual Vector Quantization for High-dimensional Approximate Nearest Neighbor Search", Shicong et al, AAAI'15.
We found that this k-means implementation was more useful for residual quantizer training than for k-means used in non-recursive settings (like IVF training).
The alternative k-means is implemented in the ProgressiveDimClustering
object.
The LSQ quantizer is from "LSQ++: Lower running time and higher recall in multi-codebook quantization", Julieta Martinez, et al. ECCV 2018.
It is implemented in the LocalSearchQuantizer
object.
At encoding time, LSQ starts from a random encoding of the vector and proceeds to a simulated annealing optimization to optimize the codes.
The parameter that determines the speed vs. accuracy tradeoff is the number of optimization steps encode_ils_iters
.
The Faiss LSQ implementation is limited to constant codebook sizes K_1 = ... = K_M.
The simplest usage of the additive quantizers is as codecs.
Like other Faiss codecs, they have a compute_codes
function and a decode
function.
Since encoding is approximate and iterative, the tradeoff between them is mainly on encoding time vs. accuracy.
In the following, we compare different quantizers in terms of reconstruction error and recall for symmetric comparison.
We compare the additive quantizer options on the SIFT1M dataset. The results are compareable with the Codec benchmarks. The regime is in 8x8 bits (64 bits total)
Data from this gist, generated by benchs/bench_quantizer.py.
Additive quantizers can be used to do compressed-domain distance comparisons to do efficient comparisons between one query and many encoded vectors.
The flat indexes are implemented in IndexAdditiveQuantizer
, parent class of IndexResidualQuantizer
and IndexLocalSearchQuantizer
.
The search function depends only on the additive quantization decoder, therefore it is in common between the two child classes.
For a given query q, it is possible to precompute look-up tables
LUT_m(i) = <T_m(i), q> for all m=1..M and i=1..K_m
Then the dot product with x' encoded as [i_1,..., i_M] is:
<q, x> ~= <q, x'> = LUT_1(i_1) + ... + LUT_m(i_M)
LUTs cannot be used directly to compute L2 distances. However,
||q - x'||^2 = ||q||^2 + ||x'||^2 - <q, x'>
Therefore if the norm of ||x'|| is stored, the distances can be computed.
This is determined by the search_type
field in the AdditiveQuantizer
object.
It can take the following values:
-
ST_decompress
: no norm stored, decompress database vector at runtime -
ST_LUT_nonorm
: just treat ||x'|| = 0 (may be OK for normalized vectors) -
ST_norm_float
,ST_norm_qint8
,ST_norm_qint4
: use various levels of compression
TODO evaluate the accuracy of ST_decompress vs. ST_norm_float and friends on SIFT1M with IndexResidualQuantizer and IndexLocalSearchQuantizer.
The additive quantizers can be used as a drop-in replacement for product or scalar quantization: the database vectors x are then stored in different buckets, as determined by a first-level coarse quantizer.
This is implemented in the objects IndexIVFAdditiveQuantizer
, IndexIVFResidualQuantizer
and IndexIVFLocalSearchQuantizer
.
In addition, each bucket is associated to a centroid c given by the coarse quantizer.
Is usually more accurate to encode the residual vector x-c since it usually has a lower norm.
The flag by_residual
determines whether the vector is stored directly or by residual.
TODO evaluate IndexIVFResidualQuantizer and IndexIVFLocalSearchQuantizer vs. IndexIVFPQ in terms of accuracy vs. distance computations because IVFAdditiveQuantizer is not well optimized for speed
Every quantizer has a set of reproduction values, ie. the span of elements that it can decode. If the number of values is not too large, they can be considered as a set of centroids. By performing nearest-neighbor search in the set of centroids, we obtain a coarse quantizer for an IVF index.
Visiting nprobe > 1 buckets is supported naturally by searching for the nprobe nearest centroids.
Compared to a "flat" coarse quantizer, the accuracy is a bit lower. However, the assignment is more efficient and the coarse quantizer codebook is much more compact.
The coarse quantizer versions are AdditiveCoarseQuantizer
and its children ResidualCoarseQuantizer
and LSQCoarseQuantizer
.
TODO comparison on 1M vectors / 262k centroids with IMI, IVF_HSNW
There is a special property of residual quantizers: a flat IndexResidual can be converted without re-encoding into an IndexIVFResidual. This works as follows:
-
codes are split into a prefix and a suffix, for example [i_1,..,i_3] and [i_4,...,i_M]
-
the residual codec is split into a
ResidualCoarseQuantizer
for the first 3 codebooks and an IndexIVFResidual for the remaining M-3 ones. -
the codes [i_4, ... , i_M] are stored in the inverted lists
At search time, the distances computed are equivalent to the flat IndexResidual, except that the search can be performed in a non-exhaustive way.
Faiss building blocks: clustering, PCA, quantization
Index IO, cloning and hyper parameter tuning
Threads and asynchronous calls
Inverted list objects and scanners
Indexes that do not fit in RAM
Brute force search without an index
Fast accumulation of PQ and AQ codes (FastScan)
Setting search parameters for one query
Binary hashing index benchmark