-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Cannot iterate over nested list of objects to apply text embeddings processor #469
Comments
@zane-neo can you take a look into this. |
So, on looking at the neural-search plugin code, this case (list of maps inside a nested map) wasn't supported. I could add this case & now the text embeddings processor works with a a nested field which has a list of objects. Processor configuration
Request with simulate endpoint
The corresponding output after my fix:
I would like to contribute this "fix" (or feature depending on how you would like to classify it). I can fork the project & make the commit there. Can somehow look at my changes? @zane-neo @navneet1v (I have made the changes & if its okay to commit, I will add tests etc.) |
@zane-neo I looked into it & it seems like a issue with indexing mutli-valued fields. I think it makes sense since as per my last understanding, Lucene still did not support mutli valued vector fields (apache/lucene#12313) The issue here is with a nested field which can be a list of objects. Since the nested objects are independant documents in Lucene, having a vector field in each nested object should not be an issue (already verfied). Therefore, in my opinion this issue is not very closely related to the issue with multi valued fields. |
@krishy91 Ok, do you have a PR regarding the fix? I haven't got time to reproduce this issue but would like to check your PR first. |
@zane-neo I have a rough sketch of the solution which works. I have created a draft pull request here: #477 If you guys feel the solution is reasonable (I think so as it is adding just another case which needs to be handled), I will go ahead & fix the validateEmbeddingFieldsValue method (which checks if supplied input is valid) & add some tests for it. (before the pull request can be deemed fit for merging) |
@krishy91 When we implemented this feature, we considered mostly the split documents case(which is a list of string), so not supporting the list of object case. But I do think your case is valid, I have no concern to provide this capability to user. cc: @navneet1v @model-collapse. |
@zane-neo @navneet1v @model-collapse PR is open and already in use. We would love to see it merged and available in 2.12. (or 2.11.2) |
@krishy91 Recently did a test on the nested map type configuration, found some issues:
And simulate the pipeline processor:
Result:
The embedding results override the original values in nested maps which seems a bug needs to be fixed. |
Created this issue to track: #686 |
@zane-neo can this be closed? |
Closing this as it's been fixed. |
What is the bug?
When a nested field contains a list of objects, and a foreach processor is used to generate the text embeddings for a text in each nested object, it does not work.
How can one reproduce the bug?
This is way of using the foreach processor (to access internal nested objects with _ingest._value). Since text embedding processor does not current support . notation for fields, I had to write in the below way.
The above does not work. But the thing is, neural search supports query nested fields (as nested query) and there is the possibility to only get the actual matches using inner_hits.
What is the expected behavior?
The expected behavior is that for the above example, text embedding should be generated (vector-field) for both the nested objects in the above example.
I understand that multi-value fields aren't currently supported for generating text embeddings, but since we are dealing here with nested objects & they are independant objects in lucene, this should work.
What is your host/environment?
I tested with the official docker image of Opensearch 2.9 running on Windows.
The text was updated successfully, but these errors were encountered: