Inference task type endpoints #3545
@@ -1,5 +1,5 @@
 {
-  "inference.stream_inference": {
+  "inference.stream_completion": {
In the future we might have a streaming endpoint for text embeddings for example.
 */
export class SparseEmbeddingInferenceResult {
  // TODO should we make this optional if we ever support multiple encoding types? So we can make it a variant
  sparse_embedding: Array<SparseEmbeddingResult>
I could see us having a variant here for a different type of response (like byte encoding for text embedding). That would be returned using the same URL, so it wouldn't be a new response. Should we make this a variant and make sparse_embedding optional?
I suppose changing something from required to optional in the future would be a breaking change, right?
For text embeddings, the pattern used in the InferenceResult class (also in this file) is to have a different variant for each type:
text_embedding_bytes?: Array<TextEmbeddingByteResult>
text_embedding_bits?: Array<TextEmbeddingByteResult>
text_embedding?: Array<TextEmbeddingResult>
Sparse would be the same:
sparse_embedding?: Array<SparseEmbeddingResult>
sparse_embedding_byte?: Array<SparseEmbeddingByteResult>
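As a rough illustration of that pattern, here is a minimal sketch of a container with one optional field per variant, plus a check that exactly one is populated. The result interfaces are simplified stand-ins and `populatedVariant` is a hypothetical helper; neither is part of the specification.

```typescript
// Hypothetical, simplified stand-ins for the spec's result types.
interface SparseEmbeddingResult { embedding: { [token: string]: number } }
interface SparseEmbeddingByteResult { embedding: { [token: string]: number } }

// Container with one optional field per variant; exactly one should be set.
interface SparseEmbeddingInferenceResult {
  sparse_embedding?: SparseEmbeddingResult[]
  sparse_embedding_byte?: SparseEmbeddingByteResult[]
}

// Returns the name of the single populated variant.
// Throws if no variant or more than one variant is set.
function populatedVariant(result: SparseEmbeddingInferenceResult): string {
  const present: string[] = []
  if (result.sparse_embedding !== undefined) present.push("sparse_embedding")
  if (result.sparse_embedding_byte !== undefined) present.push("sparse_embedding_byte")
  if (present.length !== 1) {
    throw new Error(`expected exactly one variant, found ${present.length}`)
  }
  return present[0]
}

const response: SparseEmbeddingInferenceResult = {
  sparse_embedding: [{ embedding: { token: 0.5 } }]
}
console.log(populatedVariant(response)) // sparse_embedding
```

Because every field is optional, adding another variant later would not change the shape of existing responses.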
Actually when I make the sparse embedding type a variant, I get an error that indicates there must be multiple fields in the type to be able to leverage the variant type. So I think we can make this change when we need to.
 * TextEmbeddingInferenceResult is an aggregation of mutually exclusive text_embedding variants
 * @variants container
 */
export class TextEmbeddingInferenceResult {
Same thing here: one URL, multiple response formats, so I'm keeping this as it was.
/**
 * Defines the completion result.
 */
export class CompletionInferenceResult {
I'm open to other ideas for naming the classes. The *Result names were already taken by the nested field types, which is why I went with *InferenceResult.
/**
 * Query input.
 */
query: string
query is required for the rerank task type.
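To make the required/optional split concrete, here is a small sketch of a rerank request body. Only `query` and `task_settings` come from this PR's diff; the `input` field and the `missingRerankFields` helper are assumptions added for illustration.

```typescript
// Hypothetical, simplified request body. `query` and `task_settings` come from
// the diff in this PR; `input` is an assumed field for the documents to rank.
interface RerankRequestBody {
  query: string
  input: string[]
  task_settings?: { [key: string]: unknown }
}

// Lists the required fields that are absent or have the wrong type.
function missingRerankFields(body: Partial<RerankRequestBody>): string[] {
  const missing: string[] = []
  if (typeof body.query !== "string") missing.push("query")
  if (!Array.isArray(body.input)) missing.push("input")
  return missing
}

const complete: RerankRequestBody = { query: "star wars", input: ["luke", "leia"] }
console.log(missingRerankFields(complete))            // []
console.log(missingRerankFields({ input: ["luke"] })) // [ 'query' ]
```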
/**
 * Optional task settings
 */
task_settings?: TaskSettings
Adding this because I think it was missing before.
Following you can find the validation results for the APIs you have changed.
You can validate these APIs yourself by using the |
"url": {
  "paths": [
    {
      "path": "/_inference/chat_completion/{inference_id}/_unified",
Shouldn't this be _stream?
- "path": "/_inference/chat_completion/{inference_id}/_unified",
+ "path": "/_inference/chat_completion/{inference_id}/_stream",
No, it does have _unified at the end. I think technically we could remove it since the client code doesn't need the results to be in a specific format for SSE.
Just before merging this, please add the |
… into ml-inference-task-type-separation
Sorry for the delay, getting back to this now. I created the PR here: elastic/elasticsearch#121078. Waiting to see if I need to fix the formatting 🤔 I just copied the files directly 🤷‍♂️
Please add chat_completion to the TaskType enum: https://github.com/elastic/elasticsearch-specification/blob/main/specification/inference/_types/TaskType.ts#L27
Created a docs request here to address the docs migration failure: elastic/docs-content#339
This PR makes breaking changes to the client for the inference API. Prior to this PR, we had a single endpoint for most task types supported in the inference API: _inference/<optional_task_type>/<inference id>. After discussion with @swallez, we decided to make the task type required in the URL so that each task type can have its own request and response.
This PR does not include the separate work item of defining a well-defined task_settings for each route. Correct me if I'm wrong, but I don't believe that would be a breaking change? If it is not, I think we can defer that work until later.
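For illustration, a minimal sketch of what a required task type in the URL looks like from the caller's side. The `TaskType` union lists only the task types mentioned in this conversation and may not be exhaustive; `inferencePath` is a hypothetical helper, not generated client code.

```typescript
// Task types mentioned in this PR's discussion; not necessarily exhaustive.
type TaskType =
  | "sparse_embedding"
  | "text_embedding"
  | "rerank"
  | "completion"
  | "chat_completion"

// Builds the request path with the task type as a required segment.
// chat_completion keeps the trailing _unified segment, as discussed in review.
function inferencePath(taskType: TaskType, inferenceId: string): string {
  const base = `/_inference/${taskType}/${encodeURIComponent(inferenceId)}`
  return taskType === "chat_completion" ? `${base}/_unified` : base
}

console.log(inferencePath("rerank", "my-model"))
// /_inference/rerank/my-model
console.log(inferencePath("chat_completion", "my-model"))
// /_inference/chat_completion/my-model/_unified
```

With the task type fixed per route, each route can be paired with its own request and response types instead of one shared shape.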