Inference task type endpoints #3545

jonathan-buttner · 2025-01-16T20:54:25Z

This PR makes breaking changes to the client for the inference API. Prior to this PR we had a single endpoint for most task types supported in the inference API: _inference/<optional_task_type>/<inference id>. After discussion with @swallez we decided to make the task type required in the URL. This way we could have separate requests and responses for each task type.

This PR does not include a separate item of work: defining well-defined task_settings for each route. Correct me if I'm wrong, but I don't believe that would be a breaking change. If it is not, I think we can defer that work until later.
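As a rough illustration of the required task type in the URL, the sketch below builds a path in which the task type can no longer be omitted. The helper name `inferencePath` and the `TaskType` union are assumptions made for this example and are not part of the specification or of any client; the task type names are the ones that appear in the APIs touched by this PR.

```typescript
// Hypothetical helper for illustration only; not part of any Elastic client.
// Task types that get a dedicated endpoint in this PR.
type TaskType =
  | "completion"
  | "rerank"
  | "sparse_embedding"
  | "text_embedding"

// Builds `/_inference/<task_type>/<inference_id>`.
// The task type is a required argument, mirroring the new URL shape.
function inferencePath(taskType: TaskType, inferenceId: string): string {
  if (inferenceId.length === 0) {
    throw new Error("inference id must not be empty")
  }
  return "/_inference/" + taskType + "/" + encodeURIComponent(inferenceId)
}
```

Because the task type is part of the function signature, each task type could in principle be paired with its own request and response types, which is the motivation given in the description.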

jonathan-buttner · 2025-01-16T20:55:31Z

specification/_json_spec/inference.stream_completion.json

@@ -1,5 +1,5 @@
 {
-  "inference.stream_inference": {
+  "inference.stream_completion": {


In the future we might have a streaming endpoint for text embeddings for example.

jonathan-buttner · 2025-01-16T20:58:31Z

specification/inference/_types/Results.ts

+ */
+export class SparseEmbeddingInferenceResult {
+  // TODO should we make this optional if we ever support multiple encoding types? So we can make it a variant
+  sparse_embedding: Array<SparseEmbeddingResult>


I could see us having a variant here for a different type of response (like byte encoding for text embedding). That would be returned using the same URL so it wouldn't be a new response. Should we make this a variant and make sparse_embedding optional?

I suppose changing something from required to optional in the future would be a breaking change, right?

For text embeddings, the pattern used in the InferenceResult class (also in this file) is to have a different variant for each type.

text_embedding_bytes?: Array<TextEmbeddingByteResult>
text_embedding_bits?: Array<TextEmbeddingByteResult>
text_embedding?: Array<TextEmbeddingResult>

Sparse would be the same:

sparse_embedding?: Array<SparseEmbeddingResult>
sparse_embedding_byte?: Array<SparseEmbeddingByteResult>

Actually when I make the sparse embedding type a variant, I get an error that indicates there must be multiple fields in the type to be able to leverage the variant type. So I think we can make this change when we need to.
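To make the "one field per variant" pattern from this thread concrete, here is a minimal sketch of how a consumer might find the single populated variant of a text embedding response. The interface and function names are hypothetical, and the element types are simplified to `number[][]` as an assumption; only the three field names come from the discussion above.

```typescript
// Hypothetical, simplified shape: the field names come from the thread,
// the element types are an assumption made for this sketch.
interface TextEmbeddingVariants {
  text_embedding_bytes?: number[][]
  text_embedding_bits?: number[][]
  text_embedding?: number[][]
}

type VariantKey = keyof TextEmbeddingVariants

const VARIANT_KEYS: VariantKey[] = [
  "text_embedding_bytes",
  "text_embedding_bits",
  "text_embedding"
]

// Returns the name of the one variant that is set.
// Throws when no variant or more than one variant is present,
// since the variants are meant to be mutually exclusive.
function activeVariant(result: TextEmbeddingVariants): VariantKey {
  const present = VARIANT_KEYS.filter((key) => result[key] !== undefined)
  const first = present[0]
  if (present.length !== 1 || first === undefined) {
    throw new Error("expected exactly one variant, found " + present.length)
  }
  return first
}
```

With only a single field, as in the sparse embedding result today, there is nothing to discriminate between, which is consistent with the error described above.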

jonathan-buttner · 2025-01-16T20:59:44Z

specification/inference/_types/Results.ts

+ * TextEmbeddingInferenceResult is an aggregation of mutually exclusive text_embedding variants
+ * @variants container
+ */
+export class TextEmbeddingInferenceResult {


Same thing here: one URL, multiple response formats, so I'm keeping this as it was.

jonathan-buttner · 2025-01-16T21:03:46Z

specification/inference/_types/Results.ts

+/**
+ * Defines the completion result.
+ */
+export class CompletionInferenceResult {


I'm open to other ideas for naming the classes. The *Result names were already taken by the nested field types, which is why I went with *InferenceResult.

jonathan-buttner · 2025-01-16T21:05:08Z

specification/inference/rerank/RerankRequest.ts

+    /**
+     * Query input.
+     */
+    query: string


query is required for the rerank task type.
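A small sketch of what a required `query` means for a caller: the body cannot be built without it. The `RerankBody` interface and `buildRerankBody` helper are hypothetical names for this example, and the `input` field is an assumption; only `query: string` is shown in the diff above.

```typescript
// Hypothetical request body shape for illustration.
// `query` is required per the diff above; `input` is an assumption.
interface RerankBody {
  query: string
  input: string[]
}

// Builds a rerank body and rejects a blank query early,
// instead of letting the server reject the request.
function buildRerankBody(query: string, input: string[]): RerankBody {
  if (query.trim().length === 0) {
    throw new Error("query is required for the rerank task type")
  }
  return { query: query, input: input }
}
```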

jonathan-buttner · 2025-01-16T21:05:57Z

specification/inference/stream_completion/StreamInferenceRequest.ts

+    /**
+     * Optional task settings
+     */
+    task_settings?: TaskSettings


Adding this because I think it was missing before.

github-actions · 2025-01-16T21:16:13Z

Below are the validation results for the APIs you have changed.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| `inference.chat_completion_unified` | ⚪ | Missing test | Missing test |
| `inference.completion` | ⚪ | Missing test | Missing test |
| `inference.delete` | ⚪ | Missing test | Missing test |
| `inference.get` | 🟢 | 1/1 | 1/1 |
| `inference.put` | ⚪ | Missing test | Missing test |
| `inference.rerank` | ⚪ | Missing test | Missing test |
| `inference.sparse_embedding` | ⚪ | Missing test | Missing test |
| `inference.stream_completion` | ⚪ | Missing test | Missing test |
| `inference.text_embedding` | ⚪ | Missing test | Missing test |
| `inference.update` | ⚪ | Missing test | Missing test |

You can validate these APIs yourself by using the `make validate` target.

davidkyle · 2025-01-20T12:02:38Z

specification/_json_spec/inference.chat_completion_inference.json

+    "url": {
+      "paths": [
+        {
+          "path": "/_inference/chat_completion/{inference_id}/_unified",


Shouldn't this be _stream?

Suggested change

-          "path": "/_inference/chat_completion/{inference_id}/_unified",
+          "path": "/_inference/chat_completion/{inference_id}/_stream",

No, it does have _unified at the end. I think technically we could remove it since the client code doesn't need the results to be in a specific format for SSE.
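For context on the SSE remark, here is a minimal sketch of extracting the `data` payloads from a server-sent events stream without interpreting their contents. The function name `parseSseData` is hypothetical, and the sketch assumes the whole stream text is already available as one string; it is not taken from any client.

```typescript
// Minimal SSE sketch: returns the `data` payload of each event as a raw string.
// Assumes the full stream text is available; a real client would parse incrementally.
function parseSseData(text: string): string[] {
  const events: string[] = []
  // Events are separated by a blank line.
  for (const block of text.split(/\r?\n\r?\n/)) {
    const dataLines: string[] = []
    for (const line of block.split(/\r?\n/)) {
      if (line.slice(0, 5) === "data:") {
        // One optional space after the colon is not part of the value.
        const value = line.slice(5)
        dataLines.push(value.charAt(0) === " " ? value.slice(1) : value)
      }
    }
    // Multiple data lines in one event are joined with newlines.
    if (dataLines.length > 0) {
      events.push(dataLines.join("\n"))
    }
  }
  return events
}
```

The payloads are returned untouched, which reflects the point made above: the transport layer does not need to know the result format.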

pquentin · 2025-01-21T12:30:14Z

Just before merging this, please add the _json_spec contents to the rest-api-spec folder in the Elasticsearch repository. We're working on making this repository the source of truth, but we're not there yet and we currently have tooling that syncs rest-api-spec to _json_spec and breaks if they don't match.

jonathan-buttner · 2025-01-28T18:09:56Z

_json_spec

Sorry for the delay, getting back to this now. I created the PR here: elastic/elasticsearch#121078

Waiting to see if I need to fix the formatting 🤔 I just copied the files directly 🤷‍♂️

davidkyle

Please add chat_completion to the TaskType enum https://github.com/elastic/elasticsearch-specification/blob/main/specification/inference/_types/TaskType.ts#L27
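A sketch of what the requested change could look like from a consumer's point of view, with `chat_completion` listed alongside the existing task types. The constant and function names are hypothetical, and the member list is an assumption based on the task types named in this PR rather than a copy of `TaskType.ts`.

```typescript
// Assumed list of task types after the requested change; not copied from TaskType.ts.
const TASK_TYPES = [
  "sparse_embedding",
  "text_embedding",
  "rerank",
  "completion",
  "chat_completion"
] as const

type TaskType = (typeof TASK_TYPES)[number]

// Type guard: narrows an arbitrary string to a known task type.
function isTaskType(value: string): value is TaskType {
  return (TASK_TYPES as readonly string[]).indexOf(value) !== -1
}
```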

davidkyle · 2025-02-05T10:06:19Z

specification/_json_spec/inference.chat_completion_unified.json

+    "url": {
+      "paths": [
+        {
+          "path": "/_inference/chat_completion/{inference_id}/_unified",


Suggested change

-          "path": "/_inference/chat_completion/{inference_id}/_unified",
+          "path": "/_inference/chat_completion/{inference_id}/_stream",

davidkyle · 2025-02-05T10:32:54Z

specification/inference/_types/Results.ts

+ */
+export class SparseEmbeddingInferenceResult {
+  // TODO should we make this optional if we ever support multiple encoding types? So we can make it a variant
+  sparse_embedding: Array<SparseEmbeddingResult>


For text embeddings, the pattern used in the InferenceResult class (also in this file) is to have a different variant for each type.

text_embedding_bytes?: Array<TextEmbeddingByteResult>
text_embedding_bits?: Array<TextEmbeddingByteResult>
text_embedding?: Array<TextEmbeddingResult>

Sparse would be the same:

sparse_embedding?: Array<SparseEmbeddingResult>
sparse_embedding_byte?: Array<SparseEmbeddingByteResult>

jonathan-buttner · 2025-02-05T17:44:23Z

Created a docs request here to address the docs migration failure: elastic/docs-content#339

Refactoring inference endpoints

687063f

jonathan-buttner added specification backport 8.x labels Jan 16, 2025

jonathan-buttner requested a review from a team as a code owner January 16, 2025 20:54

jonathan-buttner commented Jan 16, 2025

jonathan-buttner added 2 commits January 16, 2025 16:06

Fixing stream completion url and removing the old url and class

0442e31

generating spec

05864d4

jonathan-buttner requested review from prwhelan, davidkyle and dan-rubinstein January 16, 2025 21:17

davidkyle reviewed Jan 20, 2025

davidkyle mentioned this pull request Jan 20, 2025

Add rest-api-spec for unified inference API elastic/elasticsearch#120447

Merged

jonathan-buttner added 2 commits January 28, 2025 12:00

Merge branch 'main' of github.com:elastic/elasticsearch-specification…

78ce8a1

… into ml-inference-task-type-separation

Adding doc id

749c78c

jonathan-buttner mentioned this pull request Jan 28, 2025

[ML] [Inference API] Adding client API spec files for task type in URL elastic/elasticsearch#121078

Open

Renaming to match filename

02219ba

davidkyle added backport 8.18 backport 9.0 labels Feb 5, 2025

davidkyle reviewed Feb 5, 2025

jonathan-buttner added 2 commits February 5, 2025 08:28

Resolving merge conflicts

bf22312

Switching to stream and regenerating files

bd16539

Using variant and adding _stream

797d6b5

Removing variant

bc1a277

Adding chat_completion and fixing update api

14edb60

jonathan-buttner mentioned this pull request Feb 5, 2025

[REQUEST]: Inference API removing references to the _unified URL suffix elastic/docs-content#339

Open
