[FEATURE] Hybrid request does not return inner_hits for nested objects. #718
Comments
Can you please share more details for us to understand your request better: index mapping, query example, expected response? |
I removed the vector values; do you need them as well? Index mapping:
Document example:
request:
response:
expected response:
|
This issue is also biting me. We have a nested property which stores attachments on a document. We use the |
@dswitzer can we try 2 text queries with hybrid search and see whether inner hits come back? The reason I'm asking is that for vector search there are improvements being made in versions 2.12 and 2.13 related to nested fields with vectors. |
@navneet1v The issue persists even if the query contains non-vector fields only. |
@heemin32 thanks for confirming. Can you please share on this issue an example of what you tested and how? |
Create Index
Add doc
Query
Response
Expected the inner_hits field to be included in the result, but no inner_hits appear in the result.
|
@Kovsonq @dswitzer what is the main use case for those inner hits returned in the result? How critical is the score information for that use case? I spent some time checking what can be done for inner hits and our limitations. We can include an inner hits section in the response, similar to what's done for other queries in OpenSearch. The only limitation I'm seeing is with the scores. Inner hits have their own logic for retrieving scores; at a high level, they run a light version of the search again during the Fetch phase. At this point, the score normalization process for the hybrid query has been completed, and scores are updated in the query result section of the response. Scores added for inner hits will not be normalized but will be in raw form and scale. This means that, depending on the query, scores can be unbounded and will not correlate with the main hits in the query results (as those are normalized). |
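To make the score mismatch described above concrete, here is a minimal Python sketch. It assumes min-max normalization (one of the techniques a normalization step can use); all score values are made up for illustration and do not come from this issue.

```python
def min_max_normalize(scores):
    """Scale scores into [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw scores of three parent documents before normalization.
raw_parent_scores = [12.4, 7.1, 3.3]
normalized_parent_scores = min_max_normalize(raw_parent_scores)

# Hypothetical raw inner-hit scores, as the fetch phase would report them.
# They never pass through normalization, so they stay on the raw scale.
raw_inner_hit_scores = [9.8, 4.2]

print("main hits (normalized):", normalized_parent_scores)
print("inner hits (raw):      ", raw_inner_hit_scores)
```

The main hits end up bounded by 1.0 while the inner-hit scores keep their original, unbounded scale, so the two sets of numbers cannot be compared directly.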
My primary use case is to just be able to highlight the matching terms. The score of the inner hits does not matter much to me, because I'm just using it to highlight keyword matches. |
The primary use case for inner_hits in OpenSearch is to retrieve detailed matching information from nested objects within documents. This is particularly useful in scenarios where documents have complex structures with nested fields, and there is a need to understand which specific parts of these documents match the query criteria. In the context of nested objects, score information for inner hits is important because it allows users to identify the most relevant chunks or sub-documents within a larger document. When a hybrid search is performed, having access to the scores of inner hits enables users to rank and prioritize these nested sections effectively. Scenario: we need to return the top 20 most relevant nested documents (not parent documents) for the query. |
@Kovsonq |
After a deep dive into this request, I can conclude that we need more time and some additional mechanisms (most likely including changes in core OpenSearch) to implement this feature correctly.
As a result, a user may get the false impression that the high final position of a document is due to hits in
I've created an issue in core OpenSearch for possible extension mechanisms: opensearch-project/OpenSearch#14546 |
I'm also trying to do the same. It also seems that normalization isn't being applied correctly for hybrid search on nested fields. I've checked the results against normalizing over all values of the nested field, over the highest value of the nested field for each doc, and over the sum of the values of the nested field; in none of these cases does the normalization come out correctly. For context, my use case is to run hybrid search on chunks of documents, and ideally I wouldn't need to create a new document in OpenSearch for every chunk that I want to index. I believe this is a common use case, and it would be super AMAZING if we could get this support! |
Is there any blocking issue to support this feature? cc: @martin-gaievski @vibrantvarun |
@yuye-aws yes, there are fundamental blockers for inner hits. The process is split into two parts: the first runs at the shard level and doesn't have access to the normalized scores or the combined order of documents; the second runs in the fetch phase, which is also at the shard level. The second part has the additional problem that the query and fetch phases don't communicate with each other directly. |
@martin-gaievski Thanks for your prompt reply. Although I do not have much context on inner hits and the hybrid query, it really seems to be a tricky problem to resolve. Is there any existing material I can read to learn more? (Like PR #776) |
Still really excited to have this support! We're waiting on this to switch over to OpenSearch. It has everything else we need, but hacking around this by creating our own implementation using just the top-level docs is too messy. |
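The split described above can be sketched as a toy model in Python. This is not OpenSearch code: the function names, the document layout, and the use of min-max normalization are all assumptions made purely to illustrate why the fetch phase only ever sees raw scores.

```python
def query_phase(shard):
    """Shard-level query phase: returns (doc_id, raw_score) pairs, best first."""
    return sorted(((d["id"], d["raw_score"]) for d in shard),
                  key=lambda pair: pair[1], reverse=True)


def normalize_on_coordinator(per_shard_hits):
    """Step between the phases: min-max normalize across shards, then re-rank."""
    hits = [hit for shard_hits in per_shard_hits for hit in shard_hits]
    scores = [score for _, score in hits]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    ranked = [(doc_id, (score - lo) / span) for doc_id, score in hits]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def fetch_phase(shard, wanted_ids):
    """Shard-level fetch phase: reports inner-hit scores from raw data only."""
    return {d["id"]: d["inner_hit_raw_scores"]
            for d in shard if d["id"] in wanted_ids}


# Hypothetical shards, each holding one parent document with nested matches.
shard_a = [{"id": "a1", "raw_score": 14.0, "inner_hit_raw_scores": [9.5, 2.0]}]
shard_b = [{"id": "b1", "raw_score": 4.0, "inner_hit_raw_scores": [3.5]}]

final_ranking = normalize_on_coordinator(
    [query_phase(shard_a), query_phase(shard_b)])
wanted = {doc_id for doc_id, _ in final_ranking}

inner_hits = {}
for shard in (shard_a, shard_b):
    # fetch_phase receives no normalized scores: nothing passes them down.
    inner_hits.update(fetch_phase(shard, wanted))

print(final_ranking)
print(inner_hits)
```

In this model the normalized ranking exists only in `final_ranking`; neither `query_phase` nor `fetch_phase` has an argument through which it could reach the shard.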
Is your feature request related to a problem?
Yes, I'm experiencing a problem when I use the hybrid search plugin in OpenSearch v2.11.0. Specifically, when I include the "inner_hits" parameter in my query for nested objects, I do not receive any inner hits in the response. This is causing frustration as my system requires this level of detail for optimal operation.
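For readers unfamiliar with the request shape, the following Python sketch builds a hybrid query body containing a nested sub-query with `inner_hits`. The field names (`chunks`, `chunks.text`, `title`) and the search text are hypothetical and are not taken from the reporter's index; the request would also typically be sent together with a search pipeline that performs score normalization.

```python
import json

# Hypothetical field names and search text, for illustration only.
hybrid_query_with_inner_hits = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    # Nested sub-query that asks for inner hits.
                    "nested": {
                        "path": "chunks",
                        "query": {"match": {"chunks.text": "search terms"}},
                        "inner_hits": {},
                    }
                },
                # Plain text sub-query on a top-level field.
                {"match": {"title": "search terms"}},
            ]
        }
    }
}

print(json.dumps(hybrid_query_with_inner_hits, indent=2))
```

With a standalone `nested` query, the `inner_hits` section of each hit lists the matching nested objects; the problem reported here is that this section is missing when the same sub-query is wrapped in `hybrid`.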
What solution would you like?
I would like the hybrid search plugin to be updated to include the functionality to correctly return inner hits from nested queries. Ideally, this would function seamlessly as it does in standard OpenSearch queries. This improvement would allow me and other users to fully utilize the power of the hybrid search plugin.