[FEATURE] Score mode support other than max with KNN nested field #1743

heemin32 · 2024-06-11T19:40:28Z

Currently, k-NN search on a nested field supports only the max score mode, which uses the maximum score among the child documents (nested field documents) as the parent document's score. I would like to use other score modes, such as avg or sum over all child document scores.

How score mode works with nested field

During a query, the matching child documents are returned with their scores and then joined back to their parent documents. The parent document's score is computed from the returned child documents' scores according to the score mode.
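A minimal Python sketch of the join step, assuming child hits arrive as `(child_id, score)` pairs and a hypothetical `parent_of` mapping from child ID to parent ID. It only mimics how a nested query's score_mode (max, min, avg, sum) combines child scores; it is not OpenSearch's implementation.

```python
from collections import defaultdict

# Aggregation function for each nested score_mode.
SCORE_MODES = {
    "max": max,
    "min": min,
    "sum": sum,
    "avg": lambda scores: sum(scores) / len(scores),
}


def join_to_parents(child_hits, parent_of, score_mode="max"):
    """Group child hits by parent and combine their scores with score_mode.

    child_hits: list of (child_id, score) pairs returned by the child-level query.
    parent_of: dict mapping child_id -> parent_id (hypothetical helper data).
    Returns a dict mapping parent_id -> aggregated score.
    """
    grouped = defaultdict(list)
    for child_id, score in child_hits:
        grouped[parent_of[child_id]].append(score)
    combine = SCORE_MODES[score_mode]
    return {parent: combine(scores) for parent, scores in grouped.items()}


hits = [("c1", 0.75), ("c2", 0.25), ("c3", 0.5)]
parents = {"c1": "p1", "c2": "p1", "c3": "p2"}
print(join_to_parents(hits, parents, "max"))  # {'p1': 0.75, 'p2': 0.5}
print(join_to_parents(hits, parents, "avg"))  # {'p1': 0.5, 'p2': 0.5}
```

The same child hits produce different parent scores, and therefore potentially a different ranking, depending on score_mode.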

Challenge

A k-NN query does not return all child documents of a parent document, only the one with the maximum score. Therefore, regardless of the score mode (min, max, avg, or sum), the parent score is currently always the max.
Even if we returned all child documents of the selected parent documents with their scores, the final result would not be guaranteed to be correct, because some other parent document might have a higher aggregated score even though none of its child documents was returned during the search phase.
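To illustrate the first point, here is a minimal sketch, assuming the current behavior can be simplified to "keep only the best child per parent". The helper names `dedup_best_child` and `aggregate` are hypothetical. Once each parent has a single child hit, every score mode collapses to the same value.

```python
from collections import defaultdict


def dedup_best_child(child_hits, parent_of):
    """Keep only the highest-scoring child per parent (a simplified model of
    the per-parent deduplication described above)."""
    best = {}
    for child_id, score in child_hits:
        parent = parent_of[child_id]
        if parent not in best or score > best[parent][1]:
            best[parent] = (child_id, score)
    return list(best.values())


def aggregate(child_hits, parent_of, score_mode):
    """Combine child scores per parent using the given score_mode."""
    grouped = defaultdict(list)
    for child_id, score in child_hits:
        grouped[parent_of[child_id]].append(score)
    ops = {"max": max, "min": min, "sum": sum,
           "avg": lambda s: sum(s) / len(s)}
    return {p: ops[score_mode](s) for p, s in grouped.items()}


all_hits = [("c1", 0.75), ("c2", 0.25), ("c3", 0.5)]
parent_of = {"c1": "p1", "c2": "p1", "c3": "p2"}
returned = dedup_best_child(all_hits, parent_of)
for mode in ("min", "max", "avg", "sum"):
    # With one child per parent, every mode yields {'p1': 0.75, 'p2': 0.5}.
    print(mode, aggregate(returned, parent_of, mode))
```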

Solution

Option 1

After querying the k-NN field, we retrieve all sibling documents of the returned child documents, calculate their scores, and add them to the search result. OpenSearch core handles the rest. Still, the final result might not exactly match what an exact search would return, because we compare the final scores of only a subset of parent documents rather than all of them.
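A minimal sketch of Option 1 over in-memory data, assuming child vectors live in a dict and that the l2 score is `1 / (1 + squared L2 distance)`. The function `option1` and its parameters are hypothetical; `candidate_children` stands in for the child hits returned by the deduplicated k-NN search.

```python
from collections import defaultdict


def l2_score(a, b):
    # Assumed l2 scoring: 1 / (1 + squared L2 distance).
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))


def option1(query, vectors, parent_of, candidate_children, score_mode="avg"):
    """Expand the children returned by the k-NN search to all of their
    siblings, score every sibling, and aggregate the scores per parent.

    vectors: dict child_id -> vector; parent_of: dict child_id -> parent_id.
    Only parents that already appeared in candidate_children are considered.
    """
    candidate_parents = {parent_of[c] for c in candidate_children}
    grouped = defaultdict(list)
    for child_id, vec in vectors.items():
        if parent_of[child_id] in candidate_parents:  # siblings only
            grouped[parent_of[child_id]].append(l2_score(query, vec))
    ops = {"max": max, "min": min, "sum": sum,
           "avg": lambda s: sum(s) / len(s)}
    scores = {p: ops[score_mode](s) for p, s in grouped.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


vectors = {"c1": [1, 1, 1], "c2": [2, 2, 2], "c3": [1, 1, 2], "c4": [1, 1, 2]}
parent_of = {"c1": "p1", "c2": "p1", "c3": "p2", "c4": "p2"}
# Suppose the k-NN search (k=2) returned c1 for p1 and c3 for p2.
print(option1([1, 1, 1], vectors, parent_of, ["c1", "c3"]))
# [('p1', 0.625), ('p2', 0.5)]
```

Because parents that never reached `candidate_children` are ignored, this matches the caveat above: the result can differ from an exact search.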

Option 2

When the score mode is not max, we run the k-NN search at the nested document level without any per-parent deduplication. The score mode is then applied only to the returned nested documents. Because several returned nested documents can belong to the same parent, this does not guarantee that the query returns k results.
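A minimal sketch of Option 2, assuming `child_hits` stands in for the child-level search results. The function name `option2` is hypothetical. It keeps the top-k children without deduplication, aggregates only those, and shows how fewer than k parents can come back.

```python
from collections import defaultdict


def option2(child_hits, parent_of, k, score_mode="avg"):
    """Take the top-k children with no per-parent deduplication, then apply
    score_mode to only those children.

    child_hits: list of (child_id, score) pairs for candidate children.
    parent_of: dict mapping child_id -> parent_id.
    """
    top_children = sorted(child_hits, key=lambda h: h[1], reverse=True)[:k]
    grouped = defaultdict(list)
    for child_id, score in top_children:
        grouped[parent_of[child_id]].append(score)
    ops = {"max": max, "min": min, "sum": sum,
           "avg": lambda s: sum(s) / len(s)}
    scores = {p: ops[score_mode](s) for p, s in grouped.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


hits = [("c1", 0.75), ("c2", 0.5), ("c3", 0.25)]
parent_of = {"c1": "p1", "c2": "p1", "c3": "p2"}
# Both top-2 children belong to p1, so only one parent is returned for k=2.
print(option2(hits, parent_of, k=2))  # [('p1', 0.625)]
```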

Option 3

Introduce an index setting that executes k-NN search at the nested field level (the old behavior, with no deduplication per parent document).

Alternative

1. Rescoring (Incorrect)

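Retrieve parent documents based on the max child document score, then rescore those documents by computing the score over all of their child documents. This is incorrect for the reason described under Challenge: only the parents selected by max score are rescored, so a parent with a higher avg score can be missed.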

```json
{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "knn": {
          "nested_field.my_vector": {
            "vector": [1, 1, 1],
            "k": 2
          }
        }
      }
    }
  },
  "rescore": {
    "window_size": 2,
    "query": {
      "query_weight": 0.0,
      "rescore_query_weight": 1.0,
      "rescore_query": {
        "nested": {
          "path": "nested_field",
          "score_mode": "avg",
          "query": {
            "function_score": {
              "script_score": {
                "script": {
                  "lang": "knn",
                  "source": "knn_score",
                  "params": {
                    "field": "nested_field.my_vector",
                    "query_value": [1, 1, 1],
                    "space_type": "l2"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
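To see why rescoring can return the wrong parents, here is a minimal sketch that uses precomputed child scores instead of real vectors (an assumption for brevity). The hypothetical `rescore_by_avg` selects a window of k parents by max child score and then averages only those; `exact_avg` averages every parent. With k=1, the two disagree.

```python
def rescore_by_avg(child_scores, k):
    """Mimic the rescore approach: pick the top-k parents by max child score,
    then rescore only those parents with their average child score."""
    window = sorted(child_scores, key=lambda p: max(child_scores[p]), reverse=True)[:k]
    rescored = {p: sum(child_scores[p]) / len(child_scores[p]) for p in window}
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)


def exact_avg(child_scores, k):
    """Reference: average every parent's child scores and take the top k."""
    avgs = {p: sum(s) / len(s) for p, s in child_scores.items()}
    return sorted(avgs.items(), key=lambda kv: kv[1], reverse=True)[:k]


child_scores = {
    "p1": [0.75, 0.25],    # highest max score, but low average
    "p2": [0.625, 0.625],  # highest average
    "p3": [0.5, 0.5],
}
print(rescore_by_avg(child_scores, k=1))  # [('p1', 0.5)]
print(exact_avg(child_scores, k=1))       # [('p2', 0.625)]
```

Here `k` plays the role of both `k` and `window_size` in the query above.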

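2. Exact search (Not efficient)

We can run an exact search that scores every child document and aggregates the scores with the desired score mode. The result is correct, but the search is inefficient because it computes a score for every child document instead of using the approximate k-NN index.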

```json
{
  "query": {
    "nested": {
      "path": "nested_field",
      "score_mode": "avg",
      "query": {
        "function_score": {
          "script_score": {
            "script": {
              "lang": "knn",
              "source": "knn_score",
              "params": {
                "field": "nested_field.my_vector",
                "query_value": [1, 1, 1],
                "space_type": "l2"
              }
            }
          }
        }
      }
    }
  }
}
```


yuye-aws · 2024-09-13T02:35:23Z

There is a use case from the text_chunking processor. Although we recommend the max score mode for searching the chunked embedding field (https://opensearch.org/docs/latest/search-plugins/text-chunking/#step-4-search-the-index-using-neural-search), searching with the avg score mode also makes sense.

Supporting this in the knn query is a good idea, since the neural and neural sparse queries would inherit the behavior.

yuye-aws · 2024-09-13T02:44:58Z

Option 1

After querying the k-NN field, we retrieve all sibling documents of the returned child documents, calculate their scores, and add them to the search result. OpenSearch core handles the rest. Still, the final result might not exactly match what an exact search would return, because we compare the final scores of only a subset of parent documents rather than all of them.

Option 2

When the score mode is not max, we run the k-NN search at the nested document level without any per-parent deduplication. The score mode is then applied only to the returned nested documents. Because several returned nested documents can belong to the same parent, this does not guarantee that the query returns k results.

Option 1 makes more sense to me. Personally, I do not like that Option 2 introduces special-case logic depending on whether the score_mode is max.

heemin32 · 2024-12-11T22:13:06Z

Closing in favor of #2283.
Feel free to reopen if there is a requirement that the mentioned PR cannot cover.

heemin32 added untriaged enhancement labels Jun 11, 2024

heemin32 added this to Vector Search RoadMap Jun 12, 2024

github-project-automation bot moved this to Backlog in Vector Search RoadMap Jun 12, 2024

vamshin added backlog and removed untriaged labels Jun 20, 2024

vamshin moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Jun 20, 2024

heemin32 mentioned this issue Jun 27, 2024

[FEATURE] add support for more than one kNN query on nested vectors with multiple inner hits and filter #1768 (Closed)

vamshin moved this from Backlog (Hot) to 2.17.0 in Vector Search RoadMap Jul 2, 2024

vamshin added v2.17.0 and removed backlog labels Jul 2, 2024

heemin32 mentioned this issue Jul 31, 2024

[BUG] Vector search with HNSW with Lucene engine does not return default (3) results in the nested knn_vector #1912 (Closed)

heemin32 removed the v2.17.0 label Aug 6, 2024

heemin32 moved this from 2.17.0 to Backlog (Hot) in Vector Search RoadMap Aug 6, 2024

heemin32 mentioned this issue Sep 13, 2024

[FEATURE] inner_hits in nested neural query should return all the chunks #2113 (Open)

vamshin moved this from Backlog (Hot) to 2.19.0 in Vector Search RoadMap Oct 4, 2024

vamshin added the Roadmap:Vector Database/GenAI label Oct 4, 2024

heemin32 moved this from 2.19.0 to Backlog (Hot) in Vector Search RoadMap Oct 30, 2024

heemin32 mentioned this issue Nov 5, 2024

[RFC] Multiple inner hits for nested field #2249 (Closed)

heemin32 closed this as completed Dec 11, 2024

github-project-automation bot moved this from Backlog (Hot) to ✅ Done in Vector Search RoadMap Dec 11, 2024
