[DOCS] Adds adaptive_allocations to inference and trained model API docs #111476

Merged
10 commits merged on Aug 1, 2024
49 changes: 48 additions & 1 deletion docs/reference/inference/service-elasticsearch.asciidoc
include::inference-shared.asciidoc[tag=service-settings]
These settings are specific to the `elasticsearch` service.
--

`adaptive_allocations`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `num_allocations`.

`enabled`::::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.
It can be the ID of either a built-in model (for example, `.multilingual-e5-small`) or a text embedding model already uploaded through Eland.

`num_allocations`:::
(Required, integer)
The total number of allocations this model is assigned across machine learning nodes.
Increasing this value generally increases the throughput.
If `adaptive_allocations` is enabled, do not set this value.

`num_threads`:::
(Required, integer)
PUT _inference/text_embedding/my-msmarco-minilm-model <1>
<1> Provide a unique identifier for the inference endpoint. The `inference_id` must be unique and must not match the `model_id`.
<2> The `model_id` must be the ID of a text embedding model which has already been
{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].

[discrete]
[[inference-example-adaptive-allocation]]
==== Setting adaptive allocations for E5 via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called
`my-e5-model` to perform a `text_embedding` task and configure adaptive
allocations.

The API request below will automatically download the E5 model if it isn't
already downloaded and then deploy the model.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "max_number_of_allocations": 10,
      "min_number_of_allocations": 3
    },
    "model_id": ".multilingual-e5-small"
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
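After creating the endpoint, a quick way to confirm the adaptive allocations settings took effect is to retrieve the endpoint configuration (the endpoint name `my-e5-model` follows the example above):

[source,console]
------------------------------------------------------------
GET _inference/text_embedding/my-e5-model
------------------------------------------------------------
// TEST[skip:TBD]

The response includes the `service_settings` object as stored, so the configured minimum and maximum allocation counts can be verified.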
48 changes: 47 additions & 1 deletion docs/reference/inference/service-elser.asciidoc
include::inference-shared.asciidoc[tag=service-settings]
These settings are specific to the `elser` service.
--

`adaptive_allocations`:::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `num_allocations`.

`enabled`::::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`::::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`num_allocations`:::
(Required, integer)
The total number of allocations this model is assigned across machine learning nodes.
Increasing this value generally increases the throughput.
If `adaptive_allocations` is enabled, do not set this value.

`num_threads`:::
(Required, integer)
This error usually just reflects a timeout, while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====

[discrete]
[[inference-example-elser-adaptive-allocation]]
==== Setting adaptive allocations for the ELSER service

The following example shows how to create an {infer} endpoint called
`my-elser-model` to perform a `sparse_embedding` task and configure adaptive
allocations.

The request below will automatically download the ELSER model if it isn't
already downloaded and then deploy the model.

[source,console]
------------------------------------------------------------
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "max_number_of_allocations": 10,
      "min_number_of_allocations": 3
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
23 changes: 23 additions & 0 deletions docs/reference/ml/ml-shared.asciidoc
tag::adaptive-allocation[]
Adaptive allocations configuration object.
If enabled, the number of allocations of the trained model is scaled automatically based on the current load of the process.
When the load exceeds 90% of the current capacity, a new model allocation is automatically created (respecting the value of `max_number_of_allocations` if it's set).
When the load is less than 85% of the current capacity, a model allocation is automatically removed (respecting the value of `min_number_of_allocations` if it's set).
The number of model allocations cannot be scaled down to less than `1` this way.
end::adaptive-allocation[]
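The scale-up and scale-down rule described above can be sketched in a few lines. This is an illustrative sketch of the documented thresholds only, not the actual {es} implementation; the function name and the `load_fraction` parameter are invented for the example:

```python
def next_allocation_count(current, load_fraction,
                          min_number_of_allocations=1,
                          max_number_of_allocations=None):
    """Sketch of the documented adaptive allocations rule.

    `load_fraction` is the current load as a fraction of current capacity.
    Scale up above 90% load, scale down below 85%, never below 1 allocation.
    """
    if load_fraction > 0.90:
        # Scale up, respecting max_number_of_allocations if it is set.
        target = current + 1
        if max_number_of_allocations is not None:
            target = min(target, max_number_of_allocations)
        return target
    if load_fraction < 0.85:
        # Scale down, respecting min_number_of_allocations, never below 1.
        target = current - 1
        return max(target, max(min_number_of_allocations, 1))
    return current
```

Between 85% and 90% load, the allocation count is left unchanged, which keeps the deployment from oscillating around a single threshold.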

tag::adaptive-allocation-enabled[]
If `true`, `adaptive_allocations` is enabled.
Defaults to `false`.
end::adaptive-allocation-enabled[]

tag::adaptive-allocation-max-number[]
Specifies the maximum number of allocations to scale to.
If set, it must be greater than or equal to `min_number_of_allocations`.
end::adaptive-allocation-max-number[]

tag::adaptive-allocation-min-number[]
Specifies the minimum number of allocations to scale to.
If set, it must be greater than or equal to `1`.
end::adaptive-allocation-min-number[]

tag::aggregations[]
If set, the {dfeed} performs aggregation searches. Support for aggregations is
limited and should be used only with low cardinality data. For more information,
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=model-id]
[[start-trained-model-deployment-query-params]]
== {api-query-parms-title}

`adaptive_allocations`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `number_of_allocations`.

`enabled`:::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`cache_size`::
(Optional, <<byte-units,byte value>>)
The inference cache size (in memory outside the JVM heap) per node for the
Defaults to `model_id`.
`number_of_allocations`::
(Optional, integer)
The total number of allocations this model is assigned across {ml} nodes.
Increasing this value generally increases the throughput. Defaults to `1`.
If `adaptive_allocations` is enabled, do not set this value.

`priority`::
(Optional, string)
The `my_model` trained model can be deployed again with a different ID:
POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
--------------------------------------------------
// TEST[skip:TBD]


[[start-trained-model-deployment-adaptive-allocation-example]]
=== Setting adaptive allocations

The following example starts a new deployment of the `my_model` trained model
with the ID `my_model_for_search` and enables adaptive allocations with a
minimum of 3 and a maximum of 10 allocations.

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_search
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 3,
    "max_number_of_allocations": 10
  }
}
--------------------------------------------------
// TEST[skip:TBD]
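Once the deployment is running, the allocation count currently chosen by adaptive allocations can be inspected through the trained model statistics (a suggested check; the model ID follows the example above):

[source,console]
--------------------------------------------------
GET _ml/trained_models/my_model/_stats
--------------------------------------------------
// TEST[skip:TBD]

The `deployment_stats` section of the response reports the number of allocations currently assigned, which changes over time as the load on the deployment changes.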
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=deployment-id]
[[update-trained-model-deployment-request-body]]
== {api-request-body-title}

`adaptive_allocations`::
(Optional, object)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation]
If `adaptive_allocations` is enabled, do not set the value of `number_of_allocations`.

`enabled`:::
(Optional, Boolean)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-enabled]

`max_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-max-number]

`min_number_of_allocations`:::
(Optional, integer)
include::{es-ref-dir}/ml/ml-shared.asciidoc[tag=adaptive-allocation-min-number]

`number_of_allocations`::
(Optional, integer)
The total number of allocations this model is assigned across {ml} nodes.
Increasing this value generally increases the throughput.
If `adaptive_allocations` is enabled, do not set this value.


[[update-trained-model-deployment-example]]
== {api-examples-title}

The following example updates the deployment of the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to have 4 allocations:

[source,console]
--------------------------------------------------
The API returns the following results:
}
}
----

The following example updates the deployment of the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model to
enable adaptive allocations:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 3,
    "max_number_of_allocations": 10
  }
}
--------------------------------------------------
// TEST[skip:TBD]