[FEATURE] Score mode support other than max with KNN nested field #1743
Comments
There is a use case from the text_chunking processor. Although we recommend the max score mode for searching the chunked embedding field (https://opensearch.org/docs/latest/search-plugins/text-chunking/#step-4-search-the-index-using-neural-search), searching with the avg score mode also makes sense. Supporting this in the knn query is a good idea, since the neural and neural sparse queries will inherit the behavior.
Option 1 makes more sense to me. Personally, I do not like the special-case logic for the max score_mode in Option 2.
Closing in favor of #2283
The current KNN nested field support works only with the max score mode, which uses the maximum score among the child documents (nested field documents) as the parent document score. I would like to use other score modes, such as avg or sum over all child document scores.
How score mode works with nested field
During a query, matched child documents are returned with their scores and joined back to their parent documents. Based on the score mode, each parent document's score is calculated from its returned child documents and their scores.
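As a rough illustration of that aggregation step, here is a minimal Python sketch. The `parent_score` helper and the sample scores are hypothetical and only model the idea; this is not the OpenSearch implementation.

```python
from statistics import mean

# Hypothetical mapping from score mode name to an aggregation function.
SCORE_MODES = {"max": max, "min": min, "avg": mean, "sum": sum}


def parent_score(child_scores, score_mode):
    """Combine the scores of the returned child documents into one parent score."""
    return SCORE_MODES[score_mode](child_scores)


# Made-up scores for three child documents of a single parent.
child_scores = [0.9, 0.5, 0.1]
scores = {mode: parent_score(child_scores, mode) for mode in SCORE_MODES}
print(scores)
```

The point of the sketch is that every mode other than max needs the scores of all child documents of a parent, not just the best one.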
Challenge
A KNN query does not return all child documents of a parent document, only the one with the max score. Therefore, regardless of the score mode (min, max, avg, or sum), the parent score is currently always the max score.
Even if we could return all child documents of the selected parent documents with their scores, the final result is not guaranteed to be correct, because there could be a parent document that would score higher under the chosen score mode but none of whose child documents was returned during the search phase.
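A small Python sketch may make this concrete. It assumes a toy model in which child scores are already known; `knn_dedup` is a hypothetical stand-in for a KNN search that keeps only the best child per parent, and the data is made up.

```python
from collections import defaultdict
from statistics import mean

# Made-up (parent_id, child_score) pairs.
CHILDREN = [
    ("A", 0.9), ("A", 0.1), ("A", 0.1),
    ("B", 0.8), ("B", 0.8), ("B", 0.8),
]


def knn_dedup(children, k):
    """Toy KNN with per-parent deduplication: keep the best child per parent."""
    best = {}
    for pid, score in children:
        best[pid] = max(score, best.get(pid, score))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]


def exact_avg(children):
    """Ground truth: average over all children of every parent."""
    by_parent = defaultdict(list)
    for pid, score in children:
        by_parent[pid].append(score)
    return {pid: mean(scores) for pid, scores in by_parent.items()}


knn_hits = knn_dedup(CHILDREN, k=1)       # only parent A is returned
truth = exact_avg(CHILDREN)
best_by_avg = max(truth, key=truth.get)   # parent B has the higher average
print(knn_hits, best_by_avg)
```

With k=1, the deduplicated search returns only parent A, so no amount of post-processing on A's children can surface parent B, even though B has the higher average.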
Solution
Option 1
After querying the KNN field, we retrieve all sibling documents of the matched child documents, calculate their scores, and add them to the search result. The rest of the work is handled by OpenSearch core. Still, the end result might not exactly match what an exact search would return, because we compare only a subset of the final parent document scores, not all of them.
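A minimal sketch of Option 1, under the assumption that child scores can be looked up once a parent has been selected. The `option1` function and the data are hypothetical, and the final aggregation (done by OpenSearch core in the proposal) is modeled here by a plain `combine` function.

```python
from collections import defaultdict
from statistics import mean

# Made-up (parent_id, child_score) pairs.
CHILDREN = [
    ("A", 0.9), ("A", 0.1),
    ("B", 0.8), ("B", 0.8),
    ("C", 0.7), ("C", 0.7),
]


def option1(children, k, combine):
    by_parent = defaultdict(list)
    for pid, score in children:
        by_parent[pid].append(score)
    # Step 1: deduplicated KNN, i.e. top-k parents ranked by their best child.
    top_parents = sorted(by_parent, key=lambda p: max(by_parent[p]), reverse=True)[:k]
    # Step 2: pull in all siblings of the matched children and aggregate.
    rescored = {p: combine(by_parent[p]) for p in top_parents}
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)


result = option1(CHILDREN, k=2, combine=mean)
print(result)
```

In this toy data, parents A and B are selected by max score and then reordered by average, so B ranks above A. Parent C is never considered, although its average is higher than A's, which is the subset limitation described above.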
Option 2
When the score mode is not max, we run the KNN search at the nested document level without any deduplication. The score mode is then applied only to the returned nested field documents. This might not guarantee that the query returns k results.
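The following sketch models Option 2 with made-up scores. The `option2` function is hypothetical; it takes the top k child documents without deduplication and aggregates only those.

```python
from collections import defaultdict

# Made-up (parent_id, child_score) pairs.
CHILDREN = [
    ("A", 0.9), ("A", 0.85), ("A", 0.8),
    ("B", 0.7),
    ("C", 0.6),
]


def option2(children, k, combine):
    # KNN at the nested document level: top-k children, no per-parent dedup.
    top_children = sorted(children, key=lambda c: c[1], reverse=True)[:k]
    grouped = defaultdict(list)
    for pid, score in top_children:
        grouped[pid].append(score)
    # The score mode sees only the children that made it into the top k.
    parents = [(pid, combine(scores)) for pid, scores in grouped.items()]
    return sorted(parents, key=lambda kv: kv[1], reverse=True)


result = option2(CHILDREN, k=3, combine=sum)
print(result)
```

Here all three returned children belong to parent A, so a request for k=3 yields a single parent document, which illustrates why the k results are not guaranteed.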
Option 3
Introduce an index setting that lets you execute the KNN search at the nested field level (the old behavior, with no deduplication per parent document).
Alternative
1. Rescoring (Incorrect)
Retrieve parent documents based on the max child document score. Then rescore each parent document by calculating the scores of its child documents.
2. Exact search (Not efficient)
We can run an exact search to find all child documents and their scores.
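For comparison, a brute-force version can be sketched as follows. The `score` function (1 / (1 + squared L2 distance)) is an illustrative assumption, and `exact_search` and the vectors are hypothetical. Every child vector of every parent is scored, which is why this approach is correct but not efficient.

```python
from statistics import mean


def score(query, vector):
    """Illustrative similarity: 1 / (1 + squared L2 distance)."""
    d2 = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + d2)


def exact_search(parents, query, k, combine):
    # Score every child of every parent, then aggregate per parent.
    scored = {
        pid: combine([score(query, vec) for vec in vectors])
        for pid, vectors in parents.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Made-up parents, each with two child vectors.
PARENTS = {
    "A": [[0.0, 0.0], [3.0, 0.0]],
    "B": [[1.0, 0.0], [1.0, 0.0]],
}

top = exact_search(PARENTS, query=[0.0, 0.0], k=1, combine=mean)
print(top)
```

The work grows with the total number of child documents in the index, regardless of k.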