[DOCS] Adds adaptive_allocations to inference and trained model API docs #111476

Merged
10 commits merged on Aug 1, 2024
49 changes: 48 additions & 1 deletion docs/reference/inference/service-elasticsearch.asciidoc
include::inference-shared.asciidoc[tag=service-settings]
These settings are specific to the `elasticsearch` service.
--

`adaptive_allocations`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `num_allocations`.

`enabled`::::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.
It can be the ID of either a built-in model (for example, `.multilingual-e5-small`) or a text embedding model already uploaded through Eland.

`num_allocations`:::
(Required, integer)
The total number of allocations this model is assigned across machine learning nodes.
Increasing this value generally increases the throughput.
If `adaptive_allocations` is enabled, do not set this value.

`num_threads`:::
(Required, integer)
PUT _inference/text_embedding/my-msmarco-minilm-model <1>
<1> Provide a unique identifier for the inference endpoint. The `inference_id` must be unique and must not match the `model_id`.
<2> The `model_id` must be the ID of a text embedding model which has already been
{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].

[discrete]
[[inference-example-adaptive-allocation]]
==== Setting adaptive allocations for E5 via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called
`my-e5-model` to perform a `text_embedding` task and configure adaptive
allocations.

The API request below will automatically download the E5 model if it isn't
already downloaded and then deploy the model.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "max_number_of_allocations": 10,
      "min_number_of_allocations": 3
    },
    "model_id": ".multilingual-e5-small"
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
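After creating the endpoint, a quick way to confirm the adaptive allocations settings took effect is to retrieve the endpoint configuration (the endpoint name `my-e5-model` follows the example above):

[source,console]
------------------------------------------------------------
GET _inference/text_embedding/my-e5-model
------------------------------------------------------------
// TEST[skip:TBD]

The response includes the `service_settings` object as stored, so the configured minimum and maximum allocation counts can be verified.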
48 changes: 47 additions & 1 deletion docs/reference/inference/service-elser.asciidoc
include::inference-shared.asciidoc[tag=service-settings]
These settings are specific to the `elser` service.
--

`adaptive_allocations`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `num_allocations`.

`enabled`::::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`num_allocations`:::
(Required, integer)
The total number of allocations this model is assigned across machine learning nodes.
Increasing this value generally increases the throughput.
If `adaptive_allocations` is enabled, do not set this value.

`num_threads`:::
(Required, integer)
This error usually just reflects a timeout, while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====

[discrete]
[[inference-example-elser-adaptive-allocation]]
==== Setting adaptive allocations for the ELSER service

The following example shows how to create an {infer} endpoint called
`my-elser-model` to perform a `sparse_embedding` task and configure adaptive
allocations.

The request below will automatically download the ELSER model if it isn't
already downloaded and then deploy the model.

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "max_number_of_allocations": 10,
      "min_number_of_allocations": 3
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
23 changes: 23 additions & 0 deletions docs/reference/ml/ml-shared.asciidoc
tag::adaptive-allocation[]
Adaptive allocations configuration object.
If enabled, the number of allocations of the trained model is scaled automatically based on the current load of the process.
When the load exceeds 90% of the current capacity, a new model allocation is automatically created (respecting the value of `max_number_of_allocations` if it's set).
When the load is less than 85% of the current capacity, a model allocation is automatically removed (respecting the value of `min_number_of_allocations` if it's set).
The number of model allocations cannot be scaled down to less than `1` this way.
end::adaptive-allocation[]
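The scale-up and scale-down rule described above can be sketched in a few lines. This is an illustrative sketch of the documented thresholds only, not the actual {es} implementation; the function name and the `load_fraction` parameter are invented for the example:

```python
def next_allocation_count(current, load_fraction,
                          min_number_of_allocations=1,
                          max_number_of_allocations=None):
    """Sketch of the documented adaptive allocations rule.

    `load_fraction` is the current load as a fraction of current capacity.
    Scale up above 90% load, scale down below 85%, never below 1 allocation.
    """
    if load_fraction > 0.90:
        # Scale up, respecting max_number_of_allocations if it is set.
        target = current + 1
        if max_number_of_allocations is not None:
            target = min(target, max_number_of_allocations)
        return target
    if load_fraction < 0.85:
        # Scale down, respecting min_number_of_allocations, never below 1.
        target = current - 1
        return max(target, max(min_number_of_allocations, 1))
    return current
```

Between 85% and 90% load, the allocation count is left unchanged, which keeps the deployment from oscillating around a single threshold.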

tag::adaptive-allocation-enabled[]
If `true`, `adaptive_allocations` is enabled.
Defaults to `false`.
end::adaptive-allocation-enabled[]

tag::adaptive-allocation-max-number[]
Specifies the maximum number of allocations to scale to.
If set, it must be greater than or equal to `min_number_of_allocations`.
end::adaptive-allocation-max-number[]

tag::adaptive-allocation-min-number[]
Specifies the minimum number of allocations to scale to.
If set, it must be greater than or equal to `1`.
end::adaptive-allocation-min-number[]

tag::aggregations[]
If set, the {dfeed} performs aggregation searches. Support for aggregations is
limited and should be used only with low cardinality data. For more information,
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=model-id]
[[start-trained-model-deployment-query-params]]
== {api-query-parms-title}

`adaptive_allocations`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `number_of_allocations`.

`enabled`:::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`cache_size`::
(Optional, <<byte-units,byte value>>)
The inference cache size (in memory outside the JVM heap) per node for the
Defaults to `model_id`.
`number_of_allocations`::
(Optional, integer)
The total number of allocations this model is assigned across {ml} nodes.
Increasing this value generally increases the throughput. Defaults to `1`.
If `adaptive_allocations` is enabled, do not set this value.

`priority`::
(Optional, string)
The `my_model` trained model can be deployed again with a different ID:
POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
--------------------------------------------------
// TEST[skip:TBD]


[[start-trained-model-deployment-adaptive-allocation-example]]
=== Setting adaptive allocations

The following example starts a new deployment of the `my_model` trained model
with the ID `my_model_for_search` and enables adaptive allocations with a
minimum of 3 and a maximum of 10 allocations.

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 3,
    "max_number_of_allocations": 10
  }
}
--------------------------------------------------
// TEST[skip:TBD]
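Once the deployment is running, the allocation count currently chosen by adaptive allocations can be inspected through the trained model statistics (a suggested check; the model ID follows the example above):

[source,console]
--------------------------------------------------
GET _ml/trained_models/my_model/_stats
--------------------------------------------------
// TEST[skip:TBD]

The `deployment_stats` section of the response reports the number of allocations currently assigned, which changes over time as the load on the deployment changes.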
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
[[update-trained-model-deployment-request-body]]
== {api-request-body-title}

`adaptive_allocations`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `number_of_allocations`.

`enabled`:::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`number_of_allocations`::
(Optional, integer)
The total number of allocations this model is assigned across {ml} nodes.
Increasing this value generally increases the throughput.
If `adaptive_allocations` is enabled, do not set this value.


[[update-trained-model-deployment-example]]
== {api-examples-title}

The following example updates the deployment of the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to have 4 allocations:

[source,console]
--------------------------------------------------
The API returns the following results:
}
}
----

The following example updates the deployment of the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to
enable adaptive allocations:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 3,
    "max_number_of_allocations": 10
  }
}
--------------------------------------------------
// TEST[skip:TBD]