[RFC] ML Inference Processors #2173
Comments
Different processors may have different pre-processing or post-processing logic. Can we customize this in the new processor? For example, when there are multiple inference fields in one document,
The processor should fetch those TOKEN_WEIGHT_MAP and assign them to the target field. Can you give an example of how to configure a |
BTW, in the neural-search plugin we have implemented an abstract class for an ML inference processor: https://github.com/opensearch-project/neural-search/blob/ea49d3c5006efff9dfa36e69791ae9a8e468d25a/src/main/java/org/opensearch/neuralsearch/processor/InferenceProcessor.java#L35. It can serve as a reference |
Currently, remote models and local models return different formats, which complicates post-processing. We are also planning to standardize the local model predict output to the same format as the remote model predict output, which will ease the pain of post-processing. As for multiple inference input fields, it depends on whether users would like to run one round of prediction calls or multiple rounds. For example, the inference processor can run one model with multiple inferences.
In this setting, it will run the model twice and map the outputs to two document fields accordingly. The sample response would be
|
I think for now it does not depend on user preference, but on whether the cluster is using a remote connector or a local deployment. For a local deployment we always run prediction for a single document (this may change in a future version because of the batch ingestion feature (RFC @chishui)). For a remote connector we send all input docs in one batch and receive the results in one batch, and the response format is currently different from that of a local deployment. If we want to implement this new processor, I think we should take these into consideration. The processor should be able to recognize the number and type of inference results for different deployment types and different ML use cases. |
Different processors have different input and output data structures. For example, the text_embedding processor can accept an object type and extract the specific fields the user configured for inference, but the generativeQA input might be a single string, and other processors might differ from both depending on their functionality. Their output data structures can also differ: text_embedding produces a dense vector, while QA might produce a string. So different processors might have totally different configurations, and we need to consider whether it makes sense to simply put all the different configurations together. Did we investigate the current processor configurations, or do we have an example that can match all of them? We also need to think about complex data structures like nested objects; it would be better if the example supported complex data structures. |
Remote models don't all have the same output schema, do they? Not sure how this can be accomplished. I'll also bring up the rerank processor since I haven't seen it mentioned anywhere. 1 particular reranker (ml-opensearch) uses an ml inference. Would the plan for that be to move it to an inference processor or to point it at the InferenceProcessorInterface (or whatev that'll be called)? |
If processors are cohesive and similar by nature, it totally makes sense to merge them into one. But if they differ a lot, coupling them together may bring users confusion rather than convenience. Although there are "tag" and "description" parameters that users can use to call out the purpose of a processor, they are optional, and it will be confusing when users use multiple such processors here and there. Additionally, processors will use "inference_parameters" to pass parameters; I'm not sure that is enough to support all potential use cases, or how we would enforce that a certain parameter is required for a certain processor. It's like having a single OpenAI API for all ML tasks: it's doable, but we need to evaluate the pros and cons to see which option is better. |
Why do we need to run the model twice? Currently, logic like text embedding gathers all input field texts and sends them to the ML model together. Is there any scenario where we need to call the model twice instead of once within one processor? |
It depends on the model input and also on the use case. Some models accept only one input field, so two input fields require two rounds of prediction. And yes, if a model accepts multiple input fields, we can call the model and feed multiple input fields in one prediction. That's a common case. Please keep in mind that all remote models are deployed with a connector, where the model input field name can be defined; the pre-processing and post-processing functions also help transform the data format for model input and output. For example, in this blueprint, "request_body": "{ "input": ${parameters.input}}" --> expects a model input field. Similarly, "pre_process_function": "connector.pre_process.openai.embedding" and "post_process_function": "connector.post_process.openai.embedding" are responsible for pre-processing data into model input and post-processing model output into the desired ingest document. |
I want to emphasize that the design of using ML connectors, ml_inference processors and other processors gives users the flexibility to handle various models. Regarding your first concern about different input formats, for example object types, ML connectors help handle them.
Handling object type model input:
For example, I want to use a language classification model whose predict function expects an object in the format {"input": ["text"]}. This is a good example of a complicated case: the object is a map containing a list.
1. Using the one field as ml input
If the use case uses one field from the document, we can define the connector with this request body:
This request body formats the parameters.input field into the desired model input format. When using this model to predict in ml-commons, we don't need to worry about the format of a map containing a list; instead, we use parameters.input as defined in the connector.
Let's use the inference processors during ingestions for
Now the ingested documents already have the model output field named
Since I turned on dynamic mapping, to use the language label field, we can search using the doc path
2. Using the multiple fields as ml input
As @zhichao-aws also mentioned, many models now accept multiple model input fields, including text_embedding models and classification models. We just need to configure the connector properly to meet this multiple-input-field requirement. Similarly, in the request body, we configure two input fields.
In this case, the connector looks for two input fields from the document and formats them properly; the ml_inference processor handles the mappings reviews -> input1 and products -> input2.
3. Formatting with other processors.
Connectors support writing a post_process_function, and we can also use other processors before the ml_inference processor to handle the model input format, or after it to handle the model output format. Continuing from the second step, I would like to parse the model output field. Let's modify the ingest pipeline for the same index
Then, when ingesting the same document, it returns
|
Does it support multiple output fields? For example, I want to map the field A result to field Aout and the field B result to field Bout. |
@xinyual yes, it supports mapping multiple output fields, because input_map and output_map are both lists of maps in the parameters. For example, if a model returns
you can define the output_map as
As long as the response is a Map, the dot path notation is also supported to find subfields in the model output. |
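To illustrate the multiple-output case asked about above (field A result to Aout, field B result to Bout), here is a minimal Python sketch of resolving dot paths in a model output. It is not the ml-commons implementation: the helper names `resolve_path` and `apply_output_map`, the sample model response, and the convention that each output_map entry maps a new document field to a path in the model output are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def resolve_path(data: Dict[str, Any], path: str) -> Any:
    """Follow a dot-separated path through nested dicts (hypothetical helper)."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path '{path}' not found at '{key}'")
        current = current[key]
    return current


def apply_output_map(
    document: Dict[str, Any],
    model_output: Dict[str, Any],
    output_map: List[Dict[str, str]],
) -> Dict[str, Any]:
    """Copy each mapped model output value into a new document field.

    Assumed convention: {new_document_field: path_in_model_output}.
    """
    updated = dict(document)
    for mapping in output_map:
        for doc_field, output_path in mapping.items():
            updated[doc_field] = resolve_path(model_output, output_path)
    return updated


# Made-up model response with two results nested under "response".
model_output = {"response": {"A": [0.1, 0.2], "B": [0.3, 0.4]}}
output_map = [{"A_out": "response.A"}, {"B_out": "response.B"}]
doc = apply_output_map({"title": "t"}, model_output, output_map)
```

Because output_map is a list of maps, each entry can target a different subfield of the same response, which is what makes the A -> Aout, B -> Bout case possible in this sketch.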
Does this mean that if multiple fields need embedding, we will run predict multiple times?
It looks like this is specifically for remote models. Will we implement the same for local models? |
Not sure I see this in the PR, but will the user be able to set up ML ingest nodes separately from other ML (inference) nodes? |
So if multiple fields are used for embedding and you would like to run a mini batch in one prediction,
If two document fields are mapped to the same model input field, the processor concatenates them into a list ["this is review1","product 1"] and sends it in one prediction call. I will include this example in the integration tests. |
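The concatenation behaviour described in this comment can be sketched as follows. This is only an illustration under assumptions: the helper `build_model_inputs` and the use of (model input field, document field) pairs are hypothetical and are not taken from the PR.

```python
from typing import Any, Dict, List, Tuple


def build_model_inputs(
    document: Dict[str, Any],
    input_pairs: List[Tuple[str, str]],
) -> Dict[str, List[Any]]:
    """Group document values by the model input field they are mapped to.

    input_pairs holds (model_input_field, document_field) tuples. Values that
    share a model input field end up in one list, i.e. one mini batch.
    """
    inputs: Dict[str, List[Any]] = {}
    for model_field, doc_field in input_pairs:
        inputs.setdefault(model_field, []).append(document[doc_field])
    return inputs


document = {"reviews": "this is review1", "products": "product 1"}
# Both document fields are mapped to the same (assumed) model input "input".
pairs = [("input", "reviews"), ("input", "products")]
batched = build_model_inputs(document, pairs)
```

Here `batched["input"]` is the two-element list from the comment, so a single prediction call would cover both fields.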
Hi @austintlee, we didn't consider this a requirement when we came up with the design. Can you describe your use case and the reason why you want to separate the ML ingest nodes from the ML inference nodes? |
We will support local models incrementally. The reason is that local models currently use a different inputDataset for predictions. We would like to unify the inputDataset for local models and then enable the ML inference processors for them. |
We can implement incrementally, but I would prefer to have the design at the beginning, so that we know whether the current implementation for the remote case is the best one. Otherwise we may need to change the current implementation when implementing support for local models. |
Please also consider the use-case of asymmetric embedding models (e.g. https://huggingface.co/intfloat/multilingual-e5-small). These models require the content to be embedded to be prefixed by "signal strings" that give the model the information whether it is embedding passages or queries. I haven't seen this use-case reflected in the discussion, but I might just have missed it in the comments. The support for asymmetric embedding models has been newly introduced to ml-commons (cf. #1799). |
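As a small illustration of the "signal strings" mentioned above: the E5 family of models expects inputs to start with "query: " or "passage: ". The Python sketch below only shows the prefixing step; the helper `add_signal_prefix` is hypothetical, and how ml-commons actually configures these prefixes (cf. #1799) is not shown here.

```python
from typing import Dict, List

# Prefixes used by E5-style asymmetric embedding models.
E5_PREFIXES: Dict[str, str] = {"query": "query: ", "passage": "passage: "}


def add_signal_prefix(texts: List[str], content_type: str) -> List[str]:
    """Prepend the signal string for the given content type (hypothetical helper)."""
    prefix = E5_PREFIXES[content_type]
    return [prefix + text for text in texts]


ingest_inputs = add_signal_prefix(["OpenSearch is a search engine."], "passage")
search_inputs = add_signal_prefix(["what is opensearch"], "query")
```

For an inference processor, this would mean the ingest side embeds with the passage prefix while the search side embeds with the query prefix, so the same model needs different pre-processing depending on where the processor runs.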
Thanks @br3no , we will test this case. |
Updated timeline: the search response processor will be released in OpenSearch 2.16. |
ML inference response processors work on the list of documents in the search response, under each hit's "_source" value. There are two inference scenarios:
2. One-to-one: take one field from one document as model input and send one prediction call. N documents make N prediction calls, and every prediction output is added back to its document.
Many-to-one can be the default setting of ML inference response processors. How do we support one-to-one inference? Here are two proposed solutions, using this sample response for discussion.
Sample response in hits:
Option 1: port foreach processor to search pipeline, for each processor required to take a
Pros:
Cons:
Option 2: add flag in ML inference processor to let user define the mode (many to 1, 1:1).
Pros:
Cons:
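To make the difference between the two modes concrete, here is a minimal Python sketch of what a mode flag (Option 2) could look like. It assumes that many-to-one means a single prediction call covering all hits and that the model returns one output per input text; the function `run_inference`, the `one_to_one` flag and the `fake_predict` stand-in are all hypothetical.

```python
from typing import Any, Callable, Dict, List

calls: List[List[str]] = []


def fake_predict(texts: List[str]) -> List[int]:
    """Stand-in for a model call; records each call and returns text lengths."""
    calls.append(list(texts))
    return [len(text) for text in texts]


def run_inference(
    hits: List[Dict[str, Any]],
    field: str,
    predict: Callable[[List[str]], List[Any]],
    one_to_one: bool = False,
) -> List[Dict[str, Any]]:
    """Attach a prediction to every hit, using one call or one call per hit."""
    texts = [hit["_source"][field] for hit in hits]
    if one_to_one:
        # N documents -> N prediction calls.
        outputs = [predict([text])[0] for text in texts]
    else:
        # N documents -> 1 prediction call.
        outputs = predict(texts)
    return [
        {**hit, "_source": {**hit["_source"], f"{field}_prediction": output}}
        for hit, output in zip(hits, outputs)
    ]


hits = [{"_source": {"text": "ab"}}, {"_source": {"text": "cde"}}]
many = run_inference(hits, "text", fake_predict)
many_calls = len(calls)
calls.clear()
one = run_inference(hits, "text", fake_predict, one_to_one=True)
one_calls = len(calls)
```

In this sketch both modes produce the same documents; only the number of prediction calls differs.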
|
Problem statement
Currently, there are different implementations of search processors and ingest processors that use a machine learning model, e.g., TextEmbeddingProcessor for text-embedding models, GenerativeQAResponseProcessor for large language models, and PersonalizeRankingResponseProcessor for reranking models hosted in the Amazon Personalize service. Looking forward, if each type of machine learning model has a separate type of processor, the number of processors will grow to be enormous, and it will be inconvenient for users to configure so many different processors. However, the ML Commons plugin supports connecting to a foundation model hosted on an external platform and uploading your own pre-trained model to the OpenSearch cluster. Users can use the model_id from the ml-commons plugin in the search/ingest process, so we can simplify the multiple implementations of search processors and ingest processors that use a machine learning model.
Motivation:
To make it easier to use machine learning models to process ingest and search requests, we are introducing a Machine Learning Inference Processor in OpenSearch ml-commons. It uses a machine learning model to read from the data and either adds the prediction outcome to the data being ingested through the ingest pipeline, or returns the prediction outcome along with the search response returned through the search pipeline.
Scope:
Out of Scope:
- The ML Inference Processor focuses on model inference and does not handle data transformations. Developers should consider data formatting methods before using ML Inference Processors if the documents or search hits do not match the format of the model input. For example, they can add preprocess methods to model connectors (example) or use data-transforming processors (e.g., Split Processor, JsonProcessor).
Proposed Design:
Create ML Inference processors (ingest side), ML Inference search request processors, and ML Inference search response processors that share the same parameters and extend the same interface, which handles getModelInferenceResult.
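One way to picture this shared-interface idea is the Python sketch below. It is an illustration under assumptions rather than the proposed Java design: the class names, the `process` method, the `echo_predict` stand-in and the placeholder model ID are hypothetical, and only the ingest and search response variants are shown. Only the name getModelInferenceResult (written here as `get_model_inference_result`) comes from the RFC.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a call to the ml-commons predict API.
PredictFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class ModelInferenceProcessor(ABC):
    """Shared base: holds the common parameters and the inference call."""

    def __init__(self, model_id: str, predict: PredictFn) -> None:
        self.model_id = model_id
        self._predict = predict

    def get_model_inference_result(self, parameters: Dict[str, Any]) -> Dict[str, Any]:
        return self._predict(self.model_id, parameters)

    @abstractmethod
    def process(self, payload: Any) -> Any:
        """Apply inference to an ingest document or to search hits."""


class IngestInferenceProcessor(ModelInferenceProcessor):
    def process(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        result = self.get_model_inference_result({"input": payload["text"]})
        return {**payload, "inference": result}


class SearchResponseInferenceProcessor(ModelInferenceProcessor):
    def process(self, payload: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [
            {**hit, "inference": self.get_model_inference_result({"input": hit["text"]})}
            for hit in payload
        ]


def echo_predict(model_id: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Fake model call that echoes its input, for demonstration only."""
    return {"model_id": model_id, "echo": parameters["input"]}


ingest = IngestInferenceProcessor("example-model-id", echo_predict)
ingested = ingest.process({"text": "hello"})
search = SearchResponseInferenceProcessor("example-model-id", echo_predict)
searched = search.process([{"text": "a"}, {"text": "b"}])
```

The point of the sketch is that the inference call lives in one place, while each processor type only decides where the model input comes from and where the result goes.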
ML Inference Processors parameters:
true for local models and false for externally hosted models
Sample Process:
Using the following example for a remote text embedding model wupL7Y0Bm1mYgYg_PasK that is connected in ml-commons,
Added after gathering feedback on different use cases:
0. using multiple rounds of predictions
Sometimes a model accepts only one model input field. If we would like to predict on multiple fields, we need to run the model multiple times. The inference processor can run one model with multiple inferences.
In this setting, it will run the model twice and map the outputs to two document fields accordingly.
The sample response would be
Handling object type model input:
For example, I want to use a language classification model whose predict function expects an object in the format {"input": ["text"]}. This is a good example of a complicated case: the object is a map containing a list.
1. Using the one field as ml input
If the use case uses one field from the document, we can define the connector with this request body:
This request body formats the parameters.input field into the desired model input format. When using this model to predict in ml-commons, we don't need to worry about the format of a map containing a list; instead, we use parameters.input as defined in the connector.
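As a rough illustration of how a request body template can turn parameters.input into the {"input": ["text"]} shape, here is a Python sketch of placeholder substitution. It assumes placeholders are replaced with the JSON encoding of the parameter value; the helper `render_request_body` is hypothetical, and the real ml-commons substitution logic may differ.

```python
import json
import re
from typing import Any, Dict


def render_request_body(template: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Replace every ${parameters.<name>} placeholder and parse the result as JSON."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Assumption: values are inserted as JSON, so a list stays a list.
        return json.dumps(parameters[name])

    rendered = re.sub(r"\$\{parameters\.(\w+)\}", substitute, template)
    return json.loads(rendered)


template = '{ "input": ${parameters.input} }'
body = render_request_body(template, {"input": ["this is review1"]})
```

With this kind of template, the caller only supplies parameters.input and the connector is responsible for wrapping it into the object the model expects.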
Let's use the inference processor during ingestion for the reviews field:
Now the ingested documents already have the model output field named reviews_language_classification.
Since I turned on dynamic mapping, to use the language label field, we can search using the doc path reviews_language_classification.label in search queries.
2. Using the multiple fields as ml input
As @zhichao-aws also mentioned, many models now accept multiple model input fields, including text_embedding models and classification models. We just need to configure the connector properly to meet this multiple-input-field requirement. Similarly, in the request body, we configure two input fields.
In this case, the connector looks for two input fields from the document and formats them properly; the ml_inference processor handles the mappings reviews -> input1 and products -> input2.
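The reviews -> input1, products -> input2 mapping can be sketched in Python as follows. This is a hedged illustration: the helper `apply_input_map` and the assumed convention that each input_map entry maps a model input field to a document field are not confirmed by the RFC text shown here.

```python
from typing import Any, Dict, List


def apply_input_map(
    document: Dict[str, Any],
    input_map: List[Dict[str, str]],
) -> Dict[str, Any]:
    """Build model parameters from document fields.

    Assumed convention: {model_input_field: document_field}.
    """
    parameters: Dict[str, Any] = {}
    for mapping in input_map:
        for model_field, doc_field in mapping.items():
            parameters[model_field] = document[doc_field]
    return parameters


document = {"reviews": "this is review1", "products": "product 1"}
input_map = [{"input1": "reviews", "input2": "products"}]
parameters = apply_input_map(document, input_map)
```

The resulting parameters would then be available to the connector's request body as parameters.input1 and parameters.input2, so one prediction call can carry both fields.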
3. Formatting with other processors.
Connectors support writing a post_process_function, and we can also use other processors before the ml_inference processor to handle the model input format, or after it to handle the model output format.
Continuing from the second step, I would like to parse the model output field reviews_products_language_classification into two fields. To separate an array and append its elements to new fields, we can use a script processor bundled with a remove processor. In the future, we could also add a new type of processor, maybe called a "seperate_append" processor, to make this easier.
Let's modify the ingest pipeline for the same index
Then, when ingesting the same document, it returns