Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for search using the "fields" parameter with knn_vector field #2314

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

e-emoto
Copy link

@e-emoto e-emoto commented Dec 7, 2024

Description

This change implements valueFetcher in KNNVectorFieldType so search can support using the "fields" parameter when used with the knn_vector field type. ITs for this change were added to OpenSearchIT after the change was tested manually.

Manual Testing Output

curl -X PUT "localhost:9200/test_index" -H 'Content-Type: application/json' -d' 
{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0,
    "index.knn": true
  },
  "mappings": {
       "properties": {
       "test_field1": {
           "type": "knn_vector",
           "dimension": 2,
           "method": {
               "name":"hnsw",
               "engine":"nmslib",
               "space_type": "l2",
               "parameters":{
                 "m":20,
                 "ef_construction": 23
               }
           }
      },"test_field2": {
           "type": "knn_vector",
           "dimension": 3,
           "method": {
               "name":"hnsw",
               "engine":"nmslib",
               "space_type": "l2",
               "parameters":{
                 "m":20,
                 "ef_construction": 23
               }
           }
      },"test_field3": {
           "type": "text"
      }
   }
  }
}
'

curl -X PUT "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
{ "index": { "_index": "test_index" } }
{ "test_field1": [1.5, 5.5], "test_field2": [1.5, 5.5, 3.5], "test_field3": "text 1"}
{ "index": { "_index": "test_index" } }
{ "test_field1": [2.5, 3.5], "test_field2": [2.5, 3.5, 7.5], "test_field3": "text 2"}
{ "index": { "_index": "test_index" } }
{ "test_field1": [4.5, 5.5], "test_field2": [4.5, 5.5, 1.5]}
{ "index": { "_index": "test_index" } }
{ "test_field1": [1.5, 5.5], "test_field2": [1.5, 5.5, 2.5]}
{ "index": { "_index": "test_index" } }
{ "test_field3": "text 3"}
{ "index": { "_index": "test_index" } }
{ "test_field3": "text 4"}
'

curl -X GET "localhost:9200/test_index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "fields": ["test_field1"],
  "query": {
    "knn": {
      "test_field1": {
        "vector": [2.0, 4.0],
        "k": 10
      }
    }
  }
}
'

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.6666667,
    "hits" : [
      {
        "_index" : "test_index",
        "_id" : "6lOiwpMB0RzOL9-Fc3K5",
        "_score" : 0.6666667,
        "_source" : {
          "test_field1" : [
            2.5,
            3.5
          ],
          "test_field2" : [
            2.5,
            3.5,
            7.5
          ],
          "test_field3" : "text 2"
        },
        "fields" : {
          "test_field1" : [
            2.5,
            3.5
          ]
        }
      },
      {
        "_index" : "test_index",
        "_id" : "6VOiwpMB0RzOL9-Fc3K5",
        "_score" : 0.2857143,
        "_source" : {
          "test_field1" : [
            1.5,
            5.5
          ],
          "test_field2" : [
            1.5,
            5.5,
            3.5
          ],
          "test_field3" : "text 1"
        },
        "fields" : {
          "test_field1" : [
            1.5,
            5.5
          ]
        }
      },
      {
        "_index" : "test_index",
        "_id" : "7FOiwpMB0RzOL9-Fc3K5",
        "_score" : 0.2857143,
        "_source" : {
          "test_field1" : [
            1.5,
            5.5
          ],
          "test_field2" : [
            1.5,
            5.5,
            2.5
          ]
        },
        "fields" : {
          "test_field1" : [
            1.5,
            5.5
          ]
        }
      },
      {
        "_index" : "test_index",
        "_id" : "61OiwpMB0RzOL9-Fc3K5",
        "_score" : 0.10526316,
        "_source" : {
          "test_field1" : [
            4.5,
            5.5
          ],
          "test_field2" : [
            4.5,
            5.5,
            1.5
          ]
        },
        "fields" : {
          "test_field1" : [
            4.5,
            5.5
          ]
        }
      }
    ]
  }
}

Related Issues

Resolves #1633

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Copy link
Collaborator

@e-emoto please fix the failing CIs

e-emoto and others added 2 commits December 9, 2024 10:07
Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
Signed-off-by: Ethan Emoto <[email protected]>
@@ -51,7 +52,12 @@ public KNNVectorFieldType(String name, Map<String, String> metadata, VectorDataT

@Override
public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) {
throw new UnsupportedOperationException("KNN Vector do not support fields search");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this format variable do? Do we have to add any check for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do you see problem with backward compatibility? Does this work for index created before 2.19?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this format variable do? Do we have to add any check for that?

I'm not sure exactly what it does, in my testing it was always null when valueFetcher was called for KNNVectorFieldType.

Also do you see problem with backward compatibility? Does this work for index created before 2.19?

I don't see any reason why it would have issues with backward compatibility, but how would I test this to confirm it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is change in query path, I don't see any reason why BWC will fail.

Comment on lines +920 to +924
XContentBuilder builder = XContentFactory.jsonBuilder()
.startObject()
.field("fields", new String[] { "*" })
.startObject("query")
.startObject("match_all")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we are using a match all query here? Since a query is a query and it will solve the purpose, I would recommend having both match_all and a k-NN query to ensure that all cases are covered.

@navneet1v
Copy link
Collaborator

ITs for this change were added to OpenSearchIT after the change was tested manually.

can you add the manual testing result on the PR?

@@ -51,7 +52,12 @@ public KNNVectorFieldType(String name, Map<String, String> metadata, VectorDataT

@Override
public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) {
throw new UnsupportedOperationException("KNN Vector do not support fields search");
return new ArraySourceValueFetcher(name(), context) {
@Override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a UT for this.

@@ -51,7 +52,12 @@ public KNNVectorFieldType(String name, Map<String, String> metadata, VectorDataT

@Override
public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchLookup, String format) {
throw new UnsupportedOperationException("KNN Vector do not support fields search");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is change in query path, I don't see any reason why BWC will fail.

@@ -900,6 +901,136 @@ public void testKNNIndex_whenBuildVectorGraphThresholdIsProvidedEndToEnd_thenBui
deleteKNNIndex(indexName);
}

public void testKNNIndexSearchFieldsParameter() throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also lets add an IT test where some the docs don't have vector field and some have vector field in it.

return new ArraySourceValueFetcher(name(), context) {
@Override
protected Object parseSourceValue(Object value) {
if (value instanceof ArrayList) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, I am not sure what is exactly expected from this method. Im thinking maybe we follow what the completion mapper did and if its not a list, just wrap it in a list: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/CompletionFieldMapper.java#L394-L398.

@shatejas @navneet1v what do you guys think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Indices that include knn_vector field fail when search includes "fields" parameter
5 participants