Add cosine similarity support for faiss engine #2376
Adding additional unit and integration tests for radial search. Will mark it as ready once I add those tests.
The FAISS engine doesn't support cosine similarity natively. However, we can use inner product to achieve the same result, because the inner product of normalized vectors equals their cosine similarity. Hence, before ingestion and before search, normalize the input vector and add it to the faiss index with inner product as the space type. Since normalized vectors are stored in segments, the original vectors can be retrieved from source. By storing normalized vectors, we don't have to normalize again whenever segments are merged. This keeps force merge time and search latency competitive, at the cost of additional latency during indexing (a one-time cost, when we normalize). We also support radial search for cosine similarity. Signed-off-by: Vijayan Balasubramanian <[email protected]>
Thanks @VijayanB - completed a first pass review
@@ -77,6 +77,11 @@ public float scoreTranslation(float rawScore) {
        return Math.max((2.0F - rawScore) / 2.0F, 0.0F);
    }

    @Override
    public float scoreToDistanceTranslation(float score) {
        return score;
Confused - Why is this correct?
/**
 * Gets the compatible space type for the given space type parameter.
 * The subclass can override this method and returns the appropriate space type that
 * is compatible with the library.
Can you add context on why this is needed via an example?
Ack
 * space type if it's already compatible
 * @see SpaceType
 */

nit: extra line
Ack
 * @see SpaceType
 */

protected SpaceType getCompatibleSpaceType(SpaceType spaceType) {
Can we call it something like convertUserToMethodSpaceType? The current name doesn't seem descriptive enough.
Ack
@@ -47,4 +48,6 @@ public interface KNNLibraryIndexingContext {
     * @return Get the per dimension processor
     */
    PerDimensionProcessor getPerDimensionProcessor();

    VectorTransformer getVectorTransformer();
javadoc?
Ack
KNNMethodConfigContext knnMethodConfigContext = getKNNMethodConfigContextFromModelMetadata(modelMetadata);
// Need to handle BWC case
if (knnMethodContext == null || knnMethodConfigContext == null) {
    vectorTransformer = VectorTransformerFactory.getVectorTransformer(modelMetadata.getKnnEngine(), modelMetadata.getSpaceType());
Can we add a log here?
public interface VectorTransformer {

    /**
     * Transforms a float vector into a new vector of the same type.
Will this be in place or always a copy?
IMO, Transformer should generally return a new value rather than modifying the original value it receives, to preserve immutability.
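To make the copy-versus-in-place question concrete, here is a minimal sketch of a transformer that always returns a new array. Only the interface name and the `float[] transform(float[])` shape come from the diff; `NormalizingTransformer`, `CopyingTransformerDemo`, and the zero-vector handling are hypothetical and may not match what the PR actually does.

```java
class CopyingTransformerDemo {

    // Assumed shape of the interface under review (name and method taken from the diff).
    interface VectorTransformer {
        float[] transform(float[] vector);
    }

    // Hypothetical implementation: L2-normalizes into a fresh array and never
    // writes to the caller's array.
    static final class NormalizingTransformer implements VectorTransformer {
        @Override
        public float[] transform(float[] vector) {
            if (vector == null || vector.length == 0) {
                throw new IllegalArgumentException("vector must be non-empty");
            }
            double sumOfSquares = 0.0;
            for (float value : vector) {
                sumOfSquares += (double) value * value;
            }
            if (sumOfSquares == 0.0) {
                // Assumption: a zero vector has no direction, so reject it.
                throw new IllegalArgumentException("zero vector cannot be normalized");
            }
            double norm = Math.sqrt(sumOfSquares);
            float[] result = new float[vector.length];
            for (int i = 0; i < vector.length; i++) {
                result[i] = (float) (vector[i] / norm);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        float[] original = {3.0f, 4.0f};
        float[] normalized = new NormalizingTransformer().transform(original);
        // The caller's array is unchanged; the result is a separate array.
        System.out.println("same instance: " + (original == normalized));
        System.out.println("original[0] still 3.0: " + (original[0] == 3.0f));
    }
}
```

The trade-off is one extra allocation per document or query; in exchange, callers such as the validator or source handling can keep using the original array without worrying about it having been rewritten.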
private float[] getVectorForCreatingQueryRequest(VectorDataType vectorDataType, KNNEngine knnEngine) {
private float[] getVectorForCreatingQueryRequest(VectorDataType vectorDataType, KNNEngine knnEngine, SpaceType spaceType) {

    // Cosine similarity is supported as Inner product by FAISS by normalizing input vector, hence, we have to normalize
Can we move this check out of this class? This class is already very crowded and I want to avoid adding more checks around engines. Instead, could we investigate either adding it to https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/engine/KNNLibrarySearchContext.java and/or adding a method in KNNVectorFieldType (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldType.java#L85) that says "should normalize query" or, better yet, transformQuery?
@@ -681,6 +681,8 @@ protected void validatePreparse() {
     */
    protected abstract PerDimensionProcessor getPerDimensionProcessor();

    protected abstract VectorTransformer getVectorTransformer();
Can you add javadoc for this one? Also, can you update the javadoc for getVectorValidator() to say that the vector is validated before any transform calls?
Ack
@@ -700,7 +703,8 @@ protected void parseCreateField(ParseContext context, int dimension, VectorDataT
        }
        final float[] array = floatsArrayOptional.get();
        getVectorValidator().validateVector(array);
        context.doc().addAll(getFieldsForFloatVector(array));
        final float[] transformedArray = getVectorTransformer().transform(array);
Will this be called before the per-dimension processor too? What should the contract between these two be? I'm wondering if we even need the per-dimension processor, or if we can wrap it in this new full-vector transform.
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Description
The FAISS engine doesn't support cosine similarity natively. However, we can use inner product to achieve the same result, because the inner product of normalized vectors equals their cosine similarity. Hence, before ingestion, normalize the input vector and add it to the faiss index with inner product as the space type; before search, normalize the query vector if the space type is cosine and the engine is faiss.
Since normalized vectors are stored in segments, we don't have to normalize again whenever segments are merged. This keeps force merge time and search latency competitive, at the cost of additional latency during indexing (a one-time cost, when we normalize). To avoid this additional latency, customers can normalize their data set themselves and create the index with the inner product space type.
This also adds support for radial search, for both max distance and min score.
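The identity the description relies on can be checked with a small standalone sketch. It does not use the k-NN plugin or FAISS; the class and method names (`CosineViaInnerProduct`, `normalize`, `innerProduct`, `cosine`) are illustrative only, and rejecting zero vectors is an assumption of this sketch.

```java
class CosineViaInnerProduct {

    // Returns a new L2-normalized copy of v; the input is left untouched.
    static float[] normalize(float[] v) {
        double sumOfSquares = 0.0;
        for (float x : v) {
            sumOfSquares += (double) x * x;
        }
        if (sumOfSquares == 0.0) {
            // Assumption: cosine similarity is undefined for a zero vector.
            throw new IllegalArgumentException("zero vector cannot be normalized");
        }
        double norm = Math.sqrt(sumOfSquares);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    // Plain dot product, accumulated in double precision.
    static double innerProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
        }
        return dot;
    }

    // Textbook cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(float[] a, float[] b) {
        return innerProduct(a, b) / (Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b)));
    }

    public static void main(String[] args) {
        float[] doc = {3.0f, 4.0f};
        float[] query = {4.0f, 3.0f};
        double direct = cosine(doc, query);
        double viaInnerProduct = innerProduct(normalize(doc), normalize(query));
        // Both values agree up to float rounding (about 0.96 for these inputs).
        System.out.println("cosine: " + direct + ", inner product of normalized: " + viaInnerProduct);
    }
}
```

Because the stored vectors are already unit length, only the query needs normalizing at search time, which is why the extra cost is paid once at indexing.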
Related Issues
#2242
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.