From e6ae5fcd7c8ae2b17e3276dc5d868380868eafb9 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Mon, 8 Jan 2024 16:16:32 +0100 Subject: [PATCH 01/10] Add docs for Datadog scaler with Cluster Agent Signed-off-by: Ara Pulido --- content/docs/2.13/scalers/datadog.md | 168 +++++++++++++++++++++++++-- 1 file changed, 160 insertions(+), 8 deletions(-) diff --git a/content/docs/2.13/scalers/datadog.md b/content/docs/2.13/scalers/datadog.md index 49115a514..d0e6ed390 100644 --- a/content/docs/2.13/scalers/datadog.md +++ b/content/docs/2.13/scalers/datadog.md @@ -12,15 +12,166 @@ polling interval. For more detailed information about polling intervals check [the Polling intervals and Datadog rate limiting section](#polling-intervals-and-datadog-rate-limiting). +There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. It is recommended to use the Datadog Cluster Agent as proxy, as it will reduce the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. + +## Using the Datadog Cluster Agent + +With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API. + +### Deploy the Datadog Cluster Agent with enabled external metrics + +First, deploy the Datadog Cluster Agent enabling the external metrics provider, but without registering it as an `APIService` (to avoid clashing with KEDA). + +If you are using Helm to deploy the Cluster Agent, set: + +* `clusterAgent.metricsProvider.enabled` to `true` +* `clusterAgent.metricsProvider.registerAPIService` to `false` + +If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: + +``` +apiVersion: datadoghq.com/v2alpha1 +kind: DatadogAgent +metadata: + name: datadog +spec: + features: + externalMetricsServer: + enabled: true + useDatadogMetrics: true + registerAPIService: false +[...] +``` + +### Create a DatadogMetric object for each scaling query + +To use the Datadog Cluster Agent to retrieve the query values from Datadog, first, create a [`DatadogMetric`](https://docs.datadoghq.com/containers/guide/cluster_agent_autoscaling_metrics/?tab=helm#create-the-datadogmetric-object) object with the query to drive your scaling events. You will need to add the `external-metrics.datadoghq.com/always-active: "true"` annotation, to ensure the Cluster Agent retrieves the query value. Example: + +```yaml +apiVersion: datadoghq.com/v1alpha1 +kind: DatadogMetric +metadata: + annotations: + external-metrics.datadoghq.com/always-active: "true" + name: nginx-hits +spec: + query: sum:nginx.net.request_per_s{kube_deployment:nginx} +``` + +### Trigger Specification + +This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog Cluster Agent as proxy. + +```yaml +triggers: +- type: datadog + metricType: Value + metadata: + useClusterAgentProxy: "true" + datadogMetricName: "nginx-hits" + datadogMetricNamespace: "default" + targetValue: "7.75" + activationQueryValue: "1.1" + type: "global" # Deprecated in favor of trigger.metricType + metricUnavailableValue: "1.5" +``` + +**Parameter list:** + +- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. 
(Values: true, false, Default: false, Optional) +- `datadogMetricName` - The name of the `DatadogMetric` object to drive the scaling events. +- `datadogMetricNamespace` - The namespace of the `DatadogMetric` object to drive the scaling events. +- `targetValue` - Value to reach to start scaling (This value can be a float). +- `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) +- `type` - Whether to start scaling based on the value or the average between pods. (Values: `average`, `global`, Default:`average`, Optional) +- `age`: The time window (in seconds) to retrieve metrics from Datadog. (Default: `90`, Optional) +- `lastAvailablePointOffset`: The offset to retrieve the X to last data point. The value of last data point of some queries might be inaccurate [because of the implicit rollup function](https://docs.datadoghq.com/dashboards/functions/rollup/#rollup-interval-enforced-vs-custom), try to adjust to `1` if you encounter this issue. (Default: `0`, Optional) +- `metricUnavailableValue`: The value of the metric to return to the HPA if Datadog doesn't find a metric value for the specified time window. If not set, an error will be returned to the HPA, which will log a warning. (Optional, This value can be a float) + +> 💡 **NOTE:** The `type` parameter is deprecated in favor of the global `metricType` and will be removed in a future release. Users are advised to use `metricType` instead. + +### Authentication + +The Datadog scaler with Cluster Agent supports one type of authentication - Bearer authentication. + +You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters + along with secret credentials in `TriggerAuthentication` as mentioned below: + +**Bearer authentication:** +- `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) +- `token` - Token that should be placed in the `Authorization` header. The header will be `Authorization: Bearer {token}`. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. +- `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. +- `unsafeSsl` - Skip certificate validation when connecting over HTTPS. 
(Values: true, false, Default: false, Optional) + +### Example + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: datadog-config + namespace: my-project +type: Opaque +data: + datadogNamespace: # Required: base64 encoded value of the namespace where the Datadog Cluster Agent is deployed + unsafeSsl: # Optional: base64 encoded value of `true` or `false` + authMode: # Required: base64 encoded value of the authentication mode (in this case, bearer) +--- +apiVersion: keda.sh/v1alpha1 +kind: TriggerAuthentication +metadata: + name: datadog-cluster-agent-creds + namespace: my-project +spec: + secretTargetRef: + - parameter: token + name: dd-cluster-agent-token + key: token + - parameter: datadogNamespace + name: datadog-config + key: namespace + - parameter: unsafeSsl + name: datadog-config + key: unsafeSsl + - parameter: authMode + name: datadog-config + key: authMode +--- +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +metadata: + name: datadog-scaledobject + namespace: my-project +spec: + scaleTargetRef: + name: nginx + maxReplicaCount: 3 + minReplicaCount: 1 + pollingInterval: 60 + triggers: + - type: datadog + metadata: + useClusterAgentProxy: "true" + datadogMetricName: "nginx-hits" + datadogMetricNamespace: "default" + targetValue: "2" + type: "global" + authenticationRef: + name: datadog-cluster-agent-creds +``` + +## Using the Datadog REST API + ### Trigger Specification -This specification describes the `datadog` trigger that scales based on a Datadog metric. +This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog REST API. ```yaml triggers: - type: datadog metricType: Value metadata: + useClusterAgentProxy: "false" query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()" queryValue: "7.75" activationQueryValue: "1.1" @@ -34,6 +185,7 @@ triggers: **Parameter list:** +- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Default: false) - `query` - The Datadog query to run. - `queryValue` - Value to reach to start scaling (This value can be a float). - `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) @@ -123,7 +275,7 @@ spec: name: keda-trigger-auth-datadog-secret ``` -## Polling intervals and Datadog rate limiting +### Polling intervals and Datadog rate limiting [API Datadog endpoints are rate limited](https://docs.datadoghq.com/api/latest/rate-limits/). Depending on the @@ -145,14 +297,14 @@ was started with `--horizontal-pod-autoscaler-sync-period=30`, the HPA will poll Datadog for a metric value every 30 seconds while the number of replicas is between 1 and N. -## Multi-Query Support +### Multi-Query Support To reduce issues with API rate limiting from Datadog, it is possible to send a single query, which contains multiple queries, comma-seperated. When doing this, the results from each query are aggregated based on the `queryAggregator` value (eg: `max` or `average`). > 💡 **NOTE:** Because the average/max aggregation operation happens at the scaler level, there won't be any validation or errors if the queries don't make sense to aggregate. Be sure to read and understand the two patterns below before using Multi-Query. 
-### Example 1 - Aggregating Similar Metrics +#### Example 1 - Aggregating Similar Metrics Simple aggregation works well, when wanting to scale on more than one metric with similar return values/scale (ie. where multiple metrics can use a single `queryValue` and still make sense). @@ -187,7 +339,7 @@ The example above looks at the `http.requests` value for a service; taking two v This works particularly well when scaling against the same metric, but with slightly different parameters, or methods like ```week_before()``` for example. -### Example 2 - Driving scale directly +#### Example 2 - Driving scale directly When wanting to scale on non-similar metrics, whilst still benefiting from reduced API calls with multi-query support, the easiest way to do this is to make each query directly return the desired scale (eg: number of pods), and then `max` or `average` the results to get the desired target scale. @@ -223,9 +375,9 @@ spec: Using the example above, if we assume that `http.requests` is currently returning `360`, dividing that by `180` in the query, results in a value of `2`; if `http.backlog` returns `90`, dividing that by `30` in the query, results in a value of `3`. With the `max` Aggregator set, the scaler will set the target scale to `3` as that is the higher value from all returned queries. -## Cases of unexpected metrics value in DataDog API response +### Cases of unexpected metrics value in DataDog API response -### Latest data point is unavailable +#### Latest data point is unavailable By default, Datadog scaler retrieves the metrics with time window from `now - metadata.age (in seconds)` to `now`, however, some kinds of queries need a small delay (usually 30 secs - 2 mins) before data is available when querying from the API. In this case, adjust `timeWindowOffset` to ensure that the latest point of your query is always available. @@ -255,7 +407,7 @@ spec: ``` Check [here](https://github.com/kedacore/keda/pull/3954#discussion_r1042820206) for the details of this issue -### The value of last data point is inaccurate +#### The value of last data point is inaccurate Datadog implicitly rolls up data points automatically with the `avg` method, effectively displaying the average of all data points within a time interval for a given metric. Essentially, there is a rollup for each point. The values at the end attempt to have the rollup applied. When this occurs, it looks at a X second bucket according to your time window, and will default average those values together. Since this is the last point in the query, there are no other values to average with in that X second bucket. This leads to the value of last data point that was not rolled up in the same fashion as the others, and leads to an inaccurate number. In these cases, adjust `lastAvailablePointOffset` to 1 to use the second to last points of an API response would be the most accurate. 
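A minimal trigger sketch applying this workaround (the query and numeric values are illustrative, not part of this change):

```yaml
triggers:
- type: datadog
  metricType: Value
  metadata:
    query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()"
    queryValue: "7.75"
    age: "120"
    # Read the second-to-last point instead of the implicitly rolled-up final point
    lastAvailablePointOffset: "1"
```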
From 2c5f8a0e15fd9fc3a85ec79c94f7e65a81b863a4 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Tue, 9 Jan 2024 16:12:04 +0100 Subject: [PATCH 02/10] Add optional parameters for the Cluster Agent API server Signed-off-by: Ara Pulido --- content/docs/2.13/scalers/datadog.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/content/docs/2.13/scalers/datadog.md b/content/docs/2.13/scalers/datadog.md index d0e6ed390..31df1eb9f 100644 --- a/content/docs/2.13/scalers/datadog.md +++ b/content/docs/2.13/scalers/datadog.md @@ -97,12 +97,16 @@ The Datadog scaler with Cluster Agent supports one type of authentication - Bear You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters along with secret credentials in `TriggerAuthentication` as mentioned below: -**Bearer authentication:** +**Common to all authentication types** - `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) -- `token` - Token that should be placed in the `Authorization` header. The header will be `Authorization: Bearer {token}`. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. - `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. +- `datadogMetricsService` - The service name for the Cluster Agent Metrics API. (Default: datadog-cluster-agent-metrics-api, Optional) +- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8080, Optional) - `unsafeSsl` - Skip certificate validation when connecting over HTTPS. (Values: true, false, Default: false, Optional) +**Bearer authentication:** +- `token` - The ServiceAccount token to connect to the Datadog Cluster Agent. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. + ### Example ```yaml From cadb958523be2f06ff16381a7629a283da7d86bc Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Fri, 19 Jan 2024 14:47:24 +0100 Subject: [PATCH 03/10] Move new DD docs to 2.14 Signed-off-by: Ara Pulido --- content/docs/2.13/scalers/datadog.md | 172 ++------------------------- content/docs/2.14/scalers/datadog.md | 172 +++++++++++++++++++++++++-- 2 files changed, 172 insertions(+), 172 deletions(-) diff --git a/content/docs/2.13/scalers/datadog.md b/content/docs/2.13/scalers/datadog.md index 31df1eb9f..49115a514 100644 --- a/content/docs/2.13/scalers/datadog.md +++ b/content/docs/2.13/scalers/datadog.md @@ -12,170 +12,15 @@ polling interval. For more detailed information about polling intervals check [the Polling intervals and Datadog rate limiting section](#polling-intervals-and-datadog-rate-limiting). -There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. It is recommended to use the Datadog Cluster Agent as proxy, as it will reduce the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. - -## Using the Datadog Cluster Agent - -With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API. 
- -### Deploy the Datadog Cluster Agent with enabled external metrics - -First, deploy the Datadog Cluster Agent enabling the external metrics provider, but without registering it as an `APIService` (to avoid clashing with KEDA). - -If you are using Helm to deploy the Cluster Agent, set: - -* `clusterAgent.metricsProvider.enabled` to `true` -* `clusterAgent.metricsProvider.registerAPIService` to `false` - -If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: - -``` -apiVersion: datadoghq.com/v2alpha1 -kind: DatadogAgent -metadata: - name: datadog -spec: - features: - externalMetricsServer: - enabled: true - useDatadogMetrics: true - registerAPIService: false -[...] -``` - -### Create a DatadogMetric object for each scaling query - -To use the Datadog Cluster Agent to retrieve the query values from Datadog, first, create a [`DatadogMetric`](https://docs.datadoghq.com/containers/guide/cluster_agent_autoscaling_metrics/?tab=helm#create-the-datadogmetric-object) object with the query to drive your scaling events. You will need to add the `external-metrics.datadoghq.com/always-active: "true"` annotation, to ensure the Cluster Agent retrieves the query value. Example: - -```yaml -apiVersion: datadoghq.com/v1alpha1 -kind: DatadogMetric -metadata: - annotations: - external-metrics.datadoghq.com/always-active: "true" - name: nginx-hits -spec: - query: sum:nginx.net.request_per_s{kube_deployment:nginx} -``` - -### Trigger Specification - -This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog Cluster Agent as proxy. - -```yaml -triggers: -- type: datadog - metricType: Value - metadata: - useClusterAgentProxy: "true" - datadogMetricName: "nginx-hits" - datadogMetricNamespace: "default" - targetValue: "7.75" - activationQueryValue: "1.1" - type: "global" # Deprecated in favor of trigger.metricType - metricUnavailableValue: "1.5" -``` - -**Parameter list:** - -- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Values: true, false, Default: false, Optional) -- `datadogMetricName` - The name of the `DatadogMetric` object to drive the scaling events. -- `datadogMetricNamespace` - The namespace of the `DatadogMetric` object to drive the scaling events. -- `targetValue` - Value to reach to start scaling (This value can be a float). -- `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) -- `type` - Whether to start scaling based on the value or the average between pods. (Values: `average`, `global`, Default:`average`, Optional) -- `age`: The time window (in seconds) to retrieve metrics from Datadog. (Default: `90`, Optional) -- `lastAvailablePointOffset`: The offset to retrieve the X to last data point. The value of last data point of some queries might be inaccurate [because of the implicit rollup function](https://docs.datadoghq.com/dashboards/functions/rollup/#rollup-interval-enforced-vs-custom), try to adjust to `1` if you encounter this issue. (Default: `0`, Optional) -- `metricUnavailableValue`: The value of the metric to return to the HPA if Datadog doesn't find a metric value for the specified time window. If not set, an error will be returned to the HPA, which will log a warning. 
(Optional, This value can be a float) - -> 💡 **NOTE:** The `type` parameter is deprecated in favor of the global `metricType` and will be removed in a future release. Users are advised to use `metricType` instead. - -### Authentication - -The Datadog scaler with Cluster Agent supports one type of authentication - Bearer authentication. - -You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters - along with secret credentials in `TriggerAuthentication` as mentioned below: - -**Common to all authentication types** -- `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) -- `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. -- `datadogMetricsService` - The service name for the Cluster Agent Metrics API. (Default: datadog-cluster-agent-metrics-api, Optional) -- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8080, Optional) -- `unsafeSsl` - Skip certificate validation when connecting over HTTPS. (Values: true, false, Default: false, Optional) - -**Bearer authentication:** -- `token` - The ServiceAccount token to connect to the Datadog Cluster Agent. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. - -### Example - -```yaml -apiVersion: v1 -kind: Secret -metadata: - name: datadog-config - namespace: my-project -type: Opaque -data: - datadogNamespace: # Required: base64 encoded value of the namespace where the Datadog Cluster Agent is deployed - unsafeSsl: # Optional: base64 encoded value of `true` or `false` - authMode: # Required: base64 encoded value of the authentication mode (in this case, bearer) ---- -apiVersion: keda.sh/v1alpha1 -kind: TriggerAuthentication -metadata: - name: datadog-cluster-agent-creds - namespace: my-project -spec: - secretTargetRef: - - parameter: token - name: dd-cluster-agent-token - key: token - - parameter: datadogNamespace - name: datadog-config - key: namespace - - parameter: unsafeSsl - name: datadog-config - key: unsafeSsl - - parameter: authMode - name: datadog-config - key: authMode ---- -apiVersion: keda.sh/v1alpha1 -kind: ScaledObject -metadata: - name: datadog-scaledobject - namespace: my-project -spec: - scaleTargetRef: - name: nginx - maxReplicaCount: 3 - minReplicaCount: 1 - pollingInterval: 60 - triggers: - - type: datadog - metadata: - useClusterAgentProxy: "true" - datadogMetricName: "nginx-hits" - datadogMetricNamespace: "default" - targetValue: "2" - type: "global" - authenticationRef: - name: datadog-cluster-agent-creds -``` - -## Using the Datadog REST API - ### Trigger Specification -This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog REST API. +This specification describes the `datadog` trigger that scales based on a Datadog metric. ```yaml triggers: - type: datadog metricType: Value metadata: - useClusterAgentProxy: "false" query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()" queryValue: "7.75" activationQueryValue: "1.1" @@ -189,7 +34,6 @@ triggers: **Parameter list:** -- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Default: false) - `query` - The Datadog query to run. - `queryValue` - Value to reach to start scaling (This value can be a float). - `activationQueryValue` - Target value for activating the scaler. 
Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) @@ -279,7 +123,7 @@ spec: name: keda-trigger-auth-datadog-secret ``` -### Polling intervals and Datadog rate limiting +## Polling intervals and Datadog rate limiting [API Datadog endpoints are rate limited](https://docs.datadoghq.com/api/latest/rate-limits/). Depending on the @@ -301,14 +145,14 @@ was started with `--horizontal-pod-autoscaler-sync-period=30`, the HPA will poll Datadog for a metric value every 30 seconds while the number of replicas is between 1 and N. -### Multi-Query Support +## Multi-Query Support To reduce issues with API rate limiting from Datadog, it is possible to send a single query, which contains multiple queries, comma-seperated. When doing this, the results from each query are aggregated based on the `queryAggregator` value (eg: `max` or `average`). > 💡 **NOTE:** Because the average/max aggregation operation happens at the scaler level, there won't be any validation or errors if the queries don't make sense to aggregate. Be sure to read and understand the two patterns below before using Multi-Query. -#### Example 1 - Aggregating Similar Metrics +### Example 1 - Aggregating Similar Metrics Simple aggregation works well, when wanting to scale on more than one metric with similar return values/scale (ie. where multiple metrics can use a single `queryValue` and still make sense). @@ -343,7 +187,7 @@ The example above looks at the `http.requests` value for a service; taking two v This works particularly well when scaling against the same metric, but with slightly different parameters, or methods like ```week_before()``` for example. -#### Example 2 - Driving scale directly +### Example 2 - Driving scale directly When wanting to scale on non-similar metrics, whilst still benefiting from reduced API calls with multi-query support, the easiest way to do this is to make each query directly return the desired scale (eg: number of pods), and then `max` or `average` the results to get the desired target scale. @@ -379,9 +223,9 @@ spec: Using the example above, if we assume that `http.requests` is currently returning `360`, dividing that by `180` in the query, results in a value of `2`; if `http.backlog` returns `90`, dividing that by `30` in the query, results in a value of `3`. With the `max` Aggregator set, the scaler will set the target scale to `3` as that is the higher value from all returned queries. -### Cases of unexpected metrics value in DataDog API response +## Cases of unexpected metrics value in DataDog API response -#### Latest data point is unavailable +### Latest data point is unavailable By default, Datadog scaler retrieves the metrics with time window from `now - metadata.age (in seconds)` to `now`, however, some kinds of queries need a small delay (usually 30 secs - 2 mins) before data is available when querying from the API. In this case, adjust `timeWindowOffset` to ensure that the latest point of your query is always available. @@ -411,7 +255,7 @@ spec: ``` Check [here](https://github.com/kedacore/keda/pull/3954#discussion_r1042820206) for the details of this issue -#### The value of last data point is inaccurate +### The value of last data point is inaccurate Datadog implicitly rolls up data points automatically with the `avg` method, effectively displaying the average of all data points within a time interval for a given metric. Essentially, there is a rollup for each point. 
The values at the end attempt to have the rollup applied. When this occurs, it looks at a X second bucket according to your time window, and will default average those values together. Since this is the last point in the query, there are no other values to average with in that X second bucket. This leads to the value of last data point that was not rolled up in the same fashion as the others, and leads to an inaccurate number. In these cases, adjust `lastAvailablePointOffset` to 1 to use the second to last points of an API response would be the most accurate. diff --git a/content/docs/2.14/scalers/datadog.md b/content/docs/2.14/scalers/datadog.md index cf7e01d24..49c679c75 100644 --- a/content/docs/2.14/scalers/datadog.md +++ b/content/docs/2.14/scalers/datadog.md @@ -13,15 +13,170 @@ polling interval. For more detailed information about polling intervals check [the Polling intervals and Datadog rate limiting section](#polling-intervals-and-datadog-rate-limiting). +There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. It is recommended to use the Datadog Cluster Agent as proxy, as it will reduce the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. + +## Using the Datadog Cluster Agent + +With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API. + +### Deploy the Datadog Cluster Agent with enabled external metrics + +First, deploy the Datadog Cluster Agent enabling the external metrics provider, but without registering it as an `APIService` (to avoid clashing with KEDA). + +If you are using Helm to deploy the Cluster Agent, set: + +* `clusterAgent.metricsProvider.enabled` to `true` +* `clusterAgent.metricsProvider.registerAPIService` to `false` + +If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: + +``` +apiVersion: datadoghq.com/v2alpha1 +kind: DatadogAgent +metadata: + name: datadog +spec: + features: + externalMetricsServer: + enabled: true + useDatadogMetrics: true + registerAPIService: false +[...] +``` + +### Create a DatadogMetric object for each scaling query + +To use the Datadog Cluster Agent to retrieve the query values from Datadog, first, create a [`DatadogMetric`](https://docs.datadoghq.com/containers/guide/cluster_agent_autoscaling_metrics/?tab=helm#create-the-datadogmetric-object) object with the query to drive your scaling events. You will need to add the `external-metrics.datadoghq.com/always-active: "true"` annotation, to ensure the Cluster Agent retrieves the query value. Example: + +```yaml +apiVersion: datadoghq.com/v1alpha1 +kind: DatadogMetric +metadata: + annotations: + external-metrics.datadoghq.com/always-active: "true" + name: nginx-hits +spec: + query: sum:nginx.net.request_per_s{kube_deployment:nginx} +``` + +### Trigger Specification + +This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog Cluster Agent as proxy. 
+ +```yaml +triggers: +- type: datadog + metricType: Value + metadata: + useClusterAgentProxy: "true" + datadogMetricName: "nginx-hits" + datadogMetricNamespace: "default" + targetValue: "7.75" + activationQueryValue: "1.1" + type: "global" # Deprecated in favor of trigger.metricType + metricUnavailableValue: "1.5" +``` + +**Parameter list:** + +- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Values: true, false, Default: false, Optional) +- `datadogMetricName` - The name of the `DatadogMetric` object to drive the scaling events. +- `datadogMetricNamespace` - The namespace of the `DatadogMetric` object to drive the scaling events. +- `targetValue` - Value to reach to start scaling (This value can be a float). +- `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) +- `type` - Whether to start scaling based on the value or the average between pods. (Values: `average`, `global`, Default:`average`, Optional) +- `age`: The time window (in seconds) to retrieve metrics from Datadog. (Default: `90`, Optional) +- `lastAvailablePointOffset`: The offset to retrieve the X to last data point. The value of last data point of some queries might be inaccurate [because of the implicit rollup function](https://docs.datadoghq.com/dashboards/functions/rollup/#rollup-interval-enforced-vs-custom), try to adjust to `1` if you encounter this issue. (Default: `0`, Optional) +- `metricUnavailableValue`: The value of the metric to return to the HPA if Datadog doesn't find a metric value for the specified time window. If not set, an error will be returned to the HPA, which will log a warning. (Optional, This value can be a float) + +> 💡 **NOTE:** The `type` parameter is deprecated in favor of the global `metricType` and will be removed in a future release. Users are advised to use `metricType` instead. + +### Authentication + +The Datadog scaler with Cluster Agent supports one type of authentication - Bearer authentication. + +You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters + along with secret credentials in `TriggerAuthentication` as mentioned below: + +**Common to all authentication types** +- `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) +- `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. +- `datadogMetricsService` - The service name for the Cluster Agent Metrics API. (Default: datadog-cluster-agent-metrics-api, Optional) +- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8080, Optional) +- `unsafeSsl` - Skip certificate validation when connecting over HTTPS. (Values: true, false, Default: false, Optional) + +**Bearer authentication:** +- `token` - The ServiceAccount token to connect to the Datadog Cluster Agent. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. 
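One possible way to provision such a token is a dedicated ServiceAccount, a ClusterRole granting those verbs, and a long-lived token Secret that the `TriggerAuthentication` below references. This is only a sketch: the ServiceAccount and ClusterRole names are illustrative; only the Secret name (`dd-cluster-agent-token`) matches the example that follows.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-datadog-metrics-reader   # illustrative name
  namespace: my-project
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: keda-datadog-external-metrics-reader   # illustrative name
rules:
- apiGroups: ["external.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: keda-datadog-external-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: keda-datadog-external-metrics-reader
subjects:
- kind: ServiceAccount
  name: keda-datadog-metrics-reader
  namespace: my-project
---
# Long-lived token for the ServiceAccount, exposed under the `token` key
apiVersion: v1
kind: Secret
metadata:
  name: dd-cluster-agent-token
  namespace: my-project
  annotations:
    kubernetes.io/service-account.name: keda-datadog-metrics-reader
type: kubernetes.io/service-account-token
```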
+ +### Example + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: datadog-config + namespace: my-project +type: Opaque +data: + datadogNamespace: # Required: base64 encoded value of the namespace where the Datadog Cluster Agent is deployed + unsafeSsl: # Optional: base64 encoded value of `true` or `false` + authMode: # Required: base64 encoded value of the authentication mode (in this case, bearer) +--- +apiVersion: keda.sh/v1alpha1 +kind: TriggerAuthentication +metadata: + name: datadog-cluster-agent-creds + namespace: my-project +spec: + secretTargetRef: + - parameter: token + name: dd-cluster-agent-token + key: token + - parameter: datadogNamespace + name: datadog-config + key: namespace + - parameter: unsafeSsl + name: datadog-config + key: unsafeSsl + - parameter: authMode + name: datadog-config + key: authMode +--- +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +metadata: + name: datadog-scaledobject + namespace: my-project +spec: + scaleTargetRef: + name: nginx + maxReplicaCount: 3 + minReplicaCount: 1 + pollingInterval: 60 + triggers: + - type: datadog + metadata: + useClusterAgentProxy: "true" + datadogMetricName: "nginx-hits" + datadogMetricNamespace: "default" + targetValue: "2" + type: "global" + authenticationRef: + name: datadog-cluster-agent-creds +``` + +## Using the Datadog REST API + ### Trigger Specification -This specification describes the `datadog` trigger that scales based on a Datadog metric. +This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog REST API. ```yaml triggers: - type: datadog metricType: Value metadata: + useClusterAgentProxy: "false" query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()" queryValue: "7.75" activationQueryValue: "1.1" @@ -35,6 +190,7 @@ triggers: **Parameter list:** +- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Default: false) - `query` - The Datadog query to run. - `queryValue` - Value to reach to start scaling (This value can be a float). - `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) @@ -124,7 +280,7 @@ spec: name: keda-trigger-auth-datadog-secret ``` -## Polling intervals and Datadog rate limiting +### Polling intervals and Datadog rate limiting [API Datadog endpoints are rate limited](https://docs.datadoghq.com/api/latest/rate-limits/). Depending on the @@ -146,14 +302,14 @@ was started with `--horizontal-pod-autoscaler-sync-period=30`, the HPA will poll Datadog for a metric value every 30 seconds while the number of replicas is between 1 and N. -## Multi-Query Support +### Multi-Query Support To reduce issues with API rate limiting from Datadog, it is possible to send a single query, which contains multiple queries, comma-seperated. When doing this, the results from each query are aggregated based on the `queryAggregator` value (eg: `max` or `average`). > 💡 **NOTE:** Because the average/max aggregation operation happens at the scaler level, there won't be any validation or errors if the queries don't make sense to aggregate. Be sure to read and understand the two patterns below before using Multi-Query. -### Example 1 - Aggregating Similar Metrics +#### Example 1 - Aggregating Similar Metrics Simple aggregation works well, when wanting to scale on more than one metric with similar return values/scale (ie. 
where multiple metrics can use a single `queryValue` and still make sense). @@ -188,7 +344,7 @@ The example above looks at the `http.requests` value for a service; taking two v This works particularly well when scaling against the same metric, but with slightly different parameters, or methods like ```week_before()``` for example. -### Example 2 - Driving scale directly +#### Example 2 - Driving scale directly When wanting to scale on non-similar metrics, whilst still benefiting from reduced API calls with multi-query support, the easiest way to do this is to make each query directly return the desired scale (eg: number of pods), and then `max` or `average` the results to get the desired target scale. @@ -224,9 +380,9 @@ spec: Using the example above, if we assume that `http.requests` is currently returning `360`, dividing that by `180` in the query, results in a value of `2`; if `http.backlog` returns `90`, dividing that by `30` in the query, results in a value of `3`. With the `max` Aggregator set, the scaler will set the target scale to `3` as that is the higher value from all returned queries. -## Cases of unexpected metrics value in DataDog API response +### Cases of unexpected metrics value in DataDog API response -### Latest data point is unavailable +#### Latest data point is unavailable By default, Datadog scaler retrieves the metrics with time window from `now - metadata.age (in seconds)` to `now`, however, some kinds of queries need a small delay (usually 30 secs - 2 mins) before data is available when querying from the API. In this case, adjust `timeWindowOffset` to ensure that the latest point of your query is always available. @@ -256,7 +412,7 @@ spec: ``` Check [here](https://github.com/kedacore/keda/pull/3954#discussion_r1042820206) for the details of this issue -### The value of last data point is inaccurate +#### The value of last data point is inaccurate Datadog implicitly rolls up data points automatically with the `avg` method, effectively displaying the average of all data points within a time interval for a given metric. Essentially, there is a rollup for each point. The values at the end attempt to have the rollup applied. When this occurs, it looks at a X second bucket according to your time window, and will default average those values together. Since this is the last point in the query, there are no other values to average with in that X second bucket. This leads to the value of last data point that was not rolled up in the same fashion as the others, and leads to an inaccurate number. In these cases, adjust `lastAvailablePointOffset` to 1 to use the second to last points of an API response would be the most accurate. From 1a05170d1288a9db3e2335ef04625825e488481c Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Fri, 9 Feb 2024 15:53:43 +0100 Subject: [PATCH 04/10] Add missing cluster agent option Signed-off-by: Ara Pulido --- content/docs/2.14/scalers/datadog.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/docs/2.14/scalers/datadog.md b/content/docs/2.14/scalers/datadog.md index 49c679c75..350a65418 100644 --- a/content/docs/2.14/scalers/datadog.md +++ b/content/docs/2.14/scalers/datadog.md @@ -17,7 +17,7 @@ There are two ways to poll Datadog for a query value using the Datadog scaler: u ## Using the Datadog Cluster Agent -With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. 
This reduces the risk of reaching rate limits for the Datadog API. +With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API, as the Cluster Agent retrieves metric values in batches. ### Deploy the Datadog Cluster Agent with enabled external metrics @@ -27,6 +27,7 @@ If you are using Helm to deploy the Cluster Agent, set: * `clusterAgent.metricsProvider.enabled` to `true` * `clusterAgent.metricsProvider.registerAPIService` to `false` +* `clusterAgent.metricsProvider.useDatadogMetrics` to `true` If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: From f32301c74a198b094c7eb454ca9b6bd5149d1de5 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Mon, 19 Feb 2024 15:16:16 +0100 Subject: [PATCH 05/10] Add option in the cluster agent to avoid autogenerating DatadogMetric objects Signed-off-by: Ara Pulido --- content/docs/2.14/scalers/datadog.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/content/docs/2.14/scalers/datadog.md b/content/docs/2.14/scalers/datadog.md index 350a65418..7ea8a218c 100644 --- a/content/docs/2.14/scalers/datadog.md +++ b/content/docs/2.14/scalers/datadog.md @@ -28,6 +28,7 @@ If you are using Helm to deploy the Cluster Agent, set: * `clusterAgent.metricsProvider.enabled` to `true` * `clusterAgent.metricsProvider.registerAPIService` to `false` * `clusterAgent.metricsProvider.useDatadogMetrics` to `true` +* `clusterAgent.env` to `[{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: false}]` If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: @@ -42,6 +43,9 @@ spec: enabled: true useDatadogMetrics: true registerAPIService: false + override: + clusterAgent: + env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: false}] [...] ``` From 12360597fea6a050aa854008b0aa71bcfd9c3ba0 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Thu, 13 Jun 2024 14:26:08 +0200 Subject: [PATCH 06/10] fix typo in boolean string type Signed-off-by: Ara Pulido --- content/docs/2.14/scalers/datadog.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/2.14/scalers/datadog.md b/content/docs/2.14/scalers/datadog.md index 7ea8a218c..10cdff6a0 100644 --- a/content/docs/2.14/scalers/datadog.md +++ b/content/docs/2.14/scalers/datadog.md @@ -28,7 +28,7 @@ If you are using Helm to deploy the Cluster Agent, set: * `clusterAgent.metricsProvider.enabled` to `true` * `clusterAgent.metricsProvider.registerAPIService` to `false` * `clusterAgent.metricsProvider.useDatadogMetrics` to `true` -* `clusterAgent.env` to `[{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: false}]` +* `clusterAgent.env` to `[{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}]` If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: @@ -45,7 +45,7 @@ spec: registerAPIService: false override: clusterAgent: - env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: false}] + env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}] [...] 
``` From 310dcfd98be63f34c11de3f5e033815cec319332 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Thu, 13 Jun 2024 14:30:46 +0200 Subject: [PATCH 07/10] Move Datadog cluster agent docs to 2.15 instead Signed-off-by: Ara Pulido --- content/docs/2.14/scalers/datadog.md | 177 ++------------------------- content/docs/2.15/scalers/datadog.md | 177 +++++++++++++++++++++++++-- 2 files changed, 177 insertions(+), 177 deletions(-) diff --git a/content/docs/2.14/scalers/datadog.md b/content/docs/2.14/scalers/datadog.md index 10cdff6a0..cf7e01d24 100644 --- a/content/docs/2.14/scalers/datadog.md +++ b/content/docs/2.14/scalers/datadog.md @@ -13,175 +13,15 @@ polling interval. For more detailed information about polling intervals check [the Polling intervals and Datadog rate limiting section](#polling-intervals-and-datadog-rate-limiting). -There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. It is recommended to use the Datadog Cluster Agent as proxy, as it will reduce the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. - -## Using the Datadog Cluster Agent - -With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API, as the Cluster Agent retrieves metric values in batches. - -### Deploy the Datadog Cluster Agent with enabled external metrics - -First, deploy the Datadog Cluster Agent enabling the external metrics provider, but without registering it as an `APIService` (to avoid clashing with KEDA). - -If you are using Helm to deploy the Cluster Agent, set: - -* `clusterAgent.metricsProvider.enabled` to `true` -* `clusterAgent.metricsProvider.registerAPIService` to `false` -* `clusterAgent.metricsProvider.useDatadogMetrics` to `true` -* `clusterAgent.env` to `[{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}]` - -If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: - -``` -apiVersion: datadoghq.com/v2alpha1 -kind: DatadogAgent -metadata: - name: datadog -spec: - features: - externalMetricsServer: - enabled: true - useDatadogMetrics: true - registerAPIService: false - override: - clusterAgent: - env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}] -[...] -``` - -### Create a DatadogMetric object for each scaling query - -To use the Datadog Cluster Agent to retrieve the query values from Datadog, first, create a [`DatadogMetric`](https://docs.datadoghq.com/containers/guide/cluster_agent_autoscaling_metrics/?tab=helm#create-the-datadogmetric-object) object with the query to drive your scaling events. You will need to add the `external-metrics.datadoghq.com/always-active: "true"` annotation, to ensure the Cluster Agent retrieves the query value. Example: - -```yaml -apiVersion: datadoghq.com/v1alpha1 -kind: DatadogMetric -metadata: - annotations: - external-metrics.datadoghq.com/always-active: "true" - name: nginx-hits -spec: - query: sum:nginx.net.request_per_s{kube_deployment:nginx} -``` - -### Trigger Specification - -This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog Cluster Agent as proxy. 
- -```yaml -triggers: -- type: datadog - metricType: Value - metadata: - useClusterAgentProxy: "true" - datadogMetricName: "nginx-hits" - datadogMetricNamespace: "default" - targetValue: "7.75" - activationQueryValue: "1.1" - type: "global" # Deprecated in favor of trigger.metricType - metricUnavailableValue: "1.5" -``` - -**Parameter list:** - -- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Values: true, false, Default: false, Optional) -- `datadogMetricName` - The name of the `DatadogMetric` object to drive the scaling events. -- `datadogMetricNamespace` - The namespace of the `DatadogMetric` object to drive the scaling events. -- `targetValue` - Value to reach to start scaling (This value can be a float). -- `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) -- `type` - Whether to start scaling based on the value or the average between pods. (Values: `average`, `global`, Default:`average`, Optional) -- `age`: The time window (in seconds) to retrieve metrics from Datadog. (Default: `90`, Optional) -- `lastAvailablePointOffset`: The offset to retrieve the X to last data point. The value of last data point of some queries might be inaccurate [because of the implicit rollup function](https://docs.datadoghq.com/dashboards/functions/rollup/#rollup-interval-enforced-vs-custom), try to adjust to `1` if you encounter this issue. (Default: `0`, Optional) -- `metricUnavailableValue`: The value of the metric to return to the HPA if Datadog doesn't find a metric value for the specified time window. If not set, an error will be returned to the HPA, which will log a warning. (Optional, This value can be a float) - -> 💡 **NOTE:** The `type` parameter is deprecated in favor of the global `metricType` and will be removed in a future release. Users are advised to use `metricType` instead. - -### Authentication - -The Datadog scaler with Cluster Agent supports one type of authentication - Bearer authentication. - -You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters - along with secret credentials in `TriggerAuthentication` as mentioned below: - -**Common to all authentication types** -- `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) -- `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. -- `datadogMetricsService` - The service name for the Cluster Agent Metrics API. (Default: datadog-cluster-agent-metrics-api, Optional) -- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8080, Optional) -- `unsafeSsl` - Skip certificate validation when connecting over HTTPS. (Values: true, false, Default: false, Optional) - -**Bearer authentication:** -- `token` - The ServiceAccount token to connect to the Datadog Cluster Agent. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. 
- -### Example - -```yaml -apiVersion: v1 -kind: Secret -metadata: - name: datadog-config - namespace: my-project -type: Opaque -data: - datadogNamespace: # Required: base64 encoded value of the namespace where the Datadog Cluster Agent is deployed - unsafeSsl: # Optional: base64 encoded value of `true` or `false` - authMode: # Required: base64 encoded value of the authentication mode (in this case, bearer) ---- -apiVersion: keda.sh/v1alpha1 -kind: TriggerAuthentication -metadata: - name: datadog-cluster-agent-creds - namespace: my-project -spec: - secretTargetRef: - - parameter: token - name: dd-cluster-agent-token - key: token - - parameter: datadogNamespace - name: datadog-config - key: namespace - - parameter: unsafeSsl - name: datadog-config - key: unsafeSsl - - parameter: authMode - name: datadog-config - key: authMode ---- -apiVersion: keda.sh/v1alpha1 -kind: ScaledObject -metadata: - name: datadog-scaledobject - namespace: my-project -spec: - scaleTargetRef: - name: nginx - maxReplicaCount: 3 - minReplicaCount: 1 - pollingInterval: 60 - triggers: - - type: datadog - metadata: - useClusterAgentProxy: "true" - datadogMetricName: "nginx-hits" - datadogMetricNamespace: "default" - targetValue: "2" - type: "global" - authenticationRef: - name: datadog-cluster-agent-creds -``` - -## Using the Datadog REST API - ### Trigger Specification -This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog REST API. +This specification describes the `datadog` trigger that scales based on a Datadog metric. ```yaml triggers: - type: datadog metricType: Value metadata: - useClusterAgentProxy: "false" query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()" queryValue: "7.75" activationQueryValue: "1.1" @@ -195,7 +35,6 @@ triggers: **Parameter list:** -- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Default: false) - `query` - The Datadog query to run. - `queryValue` - Value to reach to start scaling (This value can be a float). - `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) @@ -285,7 +124,7 @@ spec: name: keda-trigger-auth-datadog-secret ``` -### Polling intervals and Datadog rate limiting +## Polling intervals and Datadog rate limiting [API Datadog endpoints are rate limited](https://docs.datadoghq.com/api/latest/rate-limits/). Depending on the @@ -307,14 +146,14 @@ was started with `--horizontal-pod-autoscaler-sync-period=30`, the HPA will poll Datadog for a metric value every 30 seconds while the number of replicas is between 1 and N. -### Multi-Query Support +## Multi-Query Support To reduce issues with API rate limiting from Datadog, it is possible to send a single query, which contains multiple queries, comma-seperated. When doing this, the results from each query are aggregated based on the `queryAggregator` value (eg: `max` or `average`). > 💡 **NOTE:** Because the average/max aggregation operation happens at the scaler level, there won't be any validation or errors if the queries don't make sense to aggregate. Be sure to read and understand the two patterns below before using Multi-Query. -#### Example 1 - Aggregating Similar Metrics +### Example 1 - Aggregating Similar Metrics Simple aggregation works well, when wanting to scale on more than one metric with similar return values/scale (ie. 
where multiple metrics can use a single `queryValue` and still make sense). @@ -349,7 +188,7 @@ The example above looks at the `http.requests` value for a service; taking two v This works particularly well when scaling against the same metric, but with slightly different parameters, or methods like ```week_before()``` for example. -#### Example 2 - Driving scale directly +### Example 2 - Driving scale directly When wanting to scale on non-similar metrics, whilst still benefiting from reduced API calls with multi-query support, the easiest way to do this is to make each query directly return the desired scale (eg: number of pods), and then `max` or `average` the results to get the desired target scale. @@ -385,9 +224,9 @@ spec: Using the example above, if we assume that `http.requests` is currently returning `360`, dividing that by `180` in the query, results in a value of `2`; if `http.backlog` returns `90`, dividing that by `30` in the query, results in a value of `3`. With the `max` Aggregator set, the scaler will set the target scale to `3` as that is the higher value from all returned queries. -### Cases of unexpected metrics value in DataDog API response +## Cases of unexpected metrics value in DataDog API response -#### Latest data point is unavailable +### Latest data point is unavailable By default, Datadog scaler retrieves the metrics with time window from `now - metadata.age (in seconds)` to `now`, however, some kinds of queries need a small delay (usually 30 secs - 2 mins) before data is available when querying from the API. In this case, adjust `timeWindowOffset` to ensure that the latest point of your query is always available. @@ -417,7 +256,7 @@ spec: ``` Check [here](https://github.com/kedacore/keda/pull/3954#discussion_r1042820206) for the details of this issue -#### The value of last data point is inaccurate +### The value of last data point is inaccurate Datadog implicitly rolls up data points automatically with the `avg` method, effectively displaying the average of all data points within a time interval for a given metric. Essentially, there is a rollup for each point. The values at the end attempt to have the rollup applied. When this occurs, it looks at a X second bucket according to your time window, and will default average those values together. Since this is the last point in the query, there are no other values to average with in that X second bucket. This leads to the value of last data point that was not rolled up in the same fashion as the others, and leads to an inaccurate number. In these cases, adjust `lastAvailablePointOffset` to 1 to use the second to last points of an API response would be the most accurate. diff --git a/content/docs/2.15/scalers/datadog.md b/content/docs/2.15/scalers/datadog.md index cf7e01d24..10cdff6a0 100644 --- a/content/docs/2.15/scalers/datadog.md +++ b/content/docs/2.15/scalers/datadog.md @@ -13,15 +13,175 @@ polling interval. For more detailed information about polling intervals check [the Polling intervals and Datadog rate limiting section](#polling-intervals-and-datadog-rate-limiting). +There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. It is recommended to use the Datadog Cluster Agent as proxy, as it will reduce the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. 
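For orientation, here is a sketch of the two trigger shapes covered below; the field values are illustrative and each mode is described in detail in its own section:

```yaml
# Cluster Agent proxy mode: reference a DatadogMetric object
triggers:
- type: datadog
  metadata:
    useClusterAgentProxy: "true"
    datadogMetricName: "nginx-hits"
    datadogMetricNamespace: "default"
    targetValue: "7.75"
---
# REST API mode: run the query directly against the Datadog API
triggers:
- type: datadog
  metadata:
    query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()"
    queryValue: "7.75"
```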
+ +## Using the Datadog Cluster Agent + +With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API, as the Cluster Agent retrieves metric values in batches. + +### Deploy the Datadog Cluster Agent with enabled external metrics + +First, deploy the Datadog Cluster Agent enabling the external metrics provider, but without registering it as an `APIService` (to avoid clashing with KEDA). + +If you are using Helm to deploy the Cluster Agent, set: + +* `clusterAgent.metricsProvider.enabled` to `true` +* `clusterAgent.metricsProvider.registerAPIService` to `false` +* `clusterAgent.metricsProvider.useDatadogMetrics` to `true` +* `clusterAgent.env` to `[{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}]` + +If you are using the Datadog Operator, add the following options to your `DatadogAgent` object: + +``` +apiVersion: datadoghq.com/v2alpha1 +kind: DatadogAgent +metadata: + name: datadog +spec: + features: + externalMetricsServer: + enabled: true + useDatadogMetrics: true + registerAPIService: false + override: + clusterAgent: + env: [{name: DD_EXTERNAL_METRICS_PROVIDER_ENABLE_DATADOGMETRIC_AUTOGEN, value: "false"}] +[...] +``` + +### Create a DatadogMetric object for each scaling query + +To use the Datadog Cluster Agent to retrieve the query values from Datadog, first, create a [`DatadogMetric`](https://docs.datadoghq.com/containers/guide/cluster_agent_autoscaling_metrics/?tab=helm#create-the-datadogmetric-object) object with the query to drive your scaling events. You will need to add the `external-metrics.datadoghq.com/always-active: "true"` annotation, to ensure the Cluster Agent retrieves the query value. Example: + +```yaml +apiVersion: datadoghq.com/v1alpha1 +kind: DatadogMetric +metadata: + annotations: + external-metrics.datadoghq.com/always-active: "true" + name: nginx-hits +spec: + query: sum:nginx.net.request_per_s{kube_deployment:nginx} +``` + +### Trigger Specification + +This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog Cluster Agent as proxy. + +```yaml +triggers: +- type: datadog + metricType: Value + metadata: + useClusterAgentProxy: "true" + datadogMetricName: "nginx-hits" + datadogMetricNamespace: "default" + targetValue: "7.75" + activationQueryValue: "1.1" + type: "global" # Deprecated in favor of trigger.metricType + metricUnavailableValue: "1.5" +``` + +**Parameter list:** + +- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Values: true, false, Default: false, Optional) +- `datadogMetricName` - The name of the `DatadogMetric` object to drive the scaling events. +- `datadogMetricNamespace` - The namespace of the `DatadogMetric` object to drive the scaling events. +- `targetValue` - Value to reach to start scaling (This value can be a float). +- `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) +- `type` - Whether to start scaling based on the value or the average between pods. (Values: `average`, `global`, Default:`average`, Optional) +- `age`: The time window (in seconds) to retrieve metrics from Datadog. 
(Default: `90`, Optional) +- `lastAvailablePointOffset`: The offset to retrieve the X to last data point. The value of last data point of some queries might be inaccurate [because of the implicit rollup function](https://docs.datadoghq.com/dashboards/functions/rollup/#rollup-interval-enforced-vs-custom), try to adjust to `1` if you encounter this issue. (Default: `0`, Optional) +- `metricUnavailableValue`: The value of the metric to return to the HPA if Datadog doesn't find a metric value for the specified time window. If not set, an error will be returned to the HPA, which will log a warning. (Optional, This value can be a float) + +> 💡 **NOTE:** The `type` parameter is deprecated in favor of the global `metricType` and will be removed in a future release. Users are advised to use `metricType` instead. + +### Authentication + +The Datadog scaler with Cluster Agent supports one type of authentication - Bearer authentication. + +You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters + along with secret credentials in `TriggerAuthentication` as mentioned below: + +**Common to all authentication types** +- `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) +- `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. +- `datadogMetricsService` - The service name for the Cluster Agent Metrics API. (Default: datadog-cluster-agent-metrics-api, Optional) +- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8080, Optional) +- `unsafeSsl` - Skip certificate validation when connecting over HTTPS. (Values: true, false, Default: false, Optional) + +**Bearer authentication:** +- `token` - The ServiceAccount token to connect to the Datadog Cluster Agent. The service account needs to have permissions to `get`, `watch`, and `list` all `external.metrics.k8s.io` resources. + +### Example + +```yaml +apiVersion: v1 +kind: Secret +metadata: + name: datadog-config + namespace: my-project +type: Opaque +data: + datadogNamespace: # Required: base64 encoded value of the namespace where the Datadog Cluster Agent is deployed + unsafeSsl: # Optional: base64 encoded value of `true` or `false` + authMode: # Required: base64 encoded value of the authentication mode (in this case, bearer) +--- +apiVersion: keda.sh/v1alpha1 +kind: TriggerAuthentication +metadata: + name: datadog-cluster-agent-creds + namespace: my-project +spec: + secretTargetRef: + - parameter: token + name: dd-cluster-agent-token + key: token + - parameter: datadogNamespace + name: datadog-config + key: namespace + - parameter: unsafeSsl + name: datadog-config + key: unsafeSsl + - parameter: authMode + name: datadog-config + key: authMode +--- +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +metadata: + name: datadog-scaledobject + namespace: my-project +spec: + scaleTargetRef: + name: nginx + maxReplicaCount: 3 + minReplicaCount: 1 + pollingInterval: 60 + triggers: + - type: datadog + metadata: + useClusterAgentProxy: "true" + datadogMetricName: "nginx-hits" + datadogMetricNamespace: "default" + targetValue: "2" + type: "global" + authenticationRef: + name: datadog-cluster-agent-creds +``` + +## Using the Datadog REST API + ### Trigger Specification -This specification describes the `datadog` trigger that scales based on a Datadog metric. +This specification describes the `datadog` trigger that scales based on a Datadog query, using the Datadog REST API. 
```yaml triggers: - type: datadog metricType: Value metadata: + useClusterAgentProxy: "false" query: "sum:trace.redis.command.hits{env:none,service:redis}.as_count()" queryValue: "7.75" activationQueryValue: "1.1" @@ -35,6 +195,7 @@ triggers: **Parameter list:** +- `useClusterAgentProxy` - Whether to use the Cluster Agent as proxy to get the query values. (Default: false) - `query` - The Datadog query to run. - `queryValue` - Value to reach to start scaling (This value can be a float). - `activationQueryValue` - Target value for activating the scaler. Learn more about activation [here](./../concepts/scaling-deployments.md#activating-and-scaling-thresholds).(Default: `0`, Optional, This value can be a float) @@ -124,7 +285,7 @@ spec: name: keda-trigger-auth-datadog-secret ``` -## Polling intervals and Datadog rate limiting +### Polling intervals and Datadog rate limiting [API Datadog endpoints are rate limited](https://docs.datadoghq.com/api/latest/rate-limits/). Depending on the @@ -146,14 +307,14 @@ was started with `--horizontal-pod-autoscaler-sync-period=30`, the HPA will poll Datadog for a metric value every 30 seconds while the number of replicas is between 1 and N. -## Multi-Query Support +### Multi-Query Support To reduce issues with API rate limiting from Datadog, it is possible to send a single query, which contains multiple queries, comma-seperated. When doing this, the results from each query are aggregated based on the `queryAggregator` value (eg: `max` or `average`). > 💡 **NOTE:** Because the average/max aggregation operation happens at the scaler level, there won't be any validation or errors if the queries don't make sense to aggregate. Be sure to read and understand the two patterns below before using Multi-Query. -### Example 1 - Aggregating Similar Metrics +#### Example 1 - Aggregating Similar Metrics Simple aggregation works well, when wanting to scale on more than one metric with similar return values/scale (ie. where multiple metrics can use a single `queryValue` and still make sense). @@ -188,7 +349,7 @@ The example above looks at the `http.requests` value for a service; taking two v This works particularly well when scaling against the same metric, but with slightly different parameters, or methods like ```week_before()``` for example. -### Example 2 - Driving scale directly +#### Example 2 - Driving scale directly When wanting to scale on non-similar metrics, whilst still benefiting from reduced API calls with multi-query support, the easiest way to do this is to make each query directly return the desired scale (eg: number of pods), and then `max` or `average` the results to get the desired target scale. @@ -224,9 +385,9 @@ spec: Using the example above, if we assume that `http.requests` is currently returning `360`, dividing that by `180` in the query, results in a value of `2`; if `http.backlog` returns `90`, dividing that by `30` in the query, results in a value of `3`. With the `max` Aggregator set, the scaler will set the target scale to `3` as that is the higher value from all returned queries. -## Cases of unexpected metrics value in DataDog API response +### Cases of unexpected metrics value in DataDog API response -### Latest data point is unavailable +#### Latest data point is unavailable By default, Datadog scaler retrieves the metrics with time window from `now - metadata.age (in seconds)` to `now`, however, some kinds of queries need a small delay (usually 30 secs - 2 mins) before data is available when querying from the API. 
In this case, adjust `timeWindowOffset` to ensure that the latest point of your query is always available. @@ -256,7 +417,7 @@ spec: ``` Check [here](https://github.com/kedacore/keda/pull/3954#discussion_r1042820206) for the details of this issue -### The value of last data point is inaccurate +#### The value of last data point is inaccurate Datadog implicitly rolls up data points automatically with the `avg` method, effectively displaying the average of all data points within a time interval for a given metric. Essentially, there is a rollup for each point. The values at the end attempt to have the rollup applied. When this occurs, it looks at a X second bucket according to your time window, and will default average those values together. Since this is the last point in the query, there are no other values to average with in that X second bucket. This leads to the value of last data point that was not rolled up in the same fashion as the others, and leads to an inaccurate number. In these cases, adjust `lastAvailablePointOffset` to 1 to use the second to last points of an API response would be the most accurate. From caae13448413ee0e5451179f76231838039a47a5 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Wed, 19 Jun 2024 11:35:43 +0200 Subject: [PATCH 08/10] Mark using the Cluster Agent as proxy as experimental Signed-off-by: Ara Pulido --- content/docs/2.15/scalers/datadog.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/2.15/scalers/datadog.md b/content/docs/2.15/scalers/datadog.md index 10cdff6a0..1492fd250 100644 --- a/content/docs/2.15/scalers/datadog.md +++ b/content/docs/2.15/scalers/datadog.md @@ -13,9 +13,9 @@ polling interval. For more detailed information about polling intervals check [the Polling intervals and Datadog rate limiting section](#polling-intervals-and-datadog-rate-limiting). -There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. It is recommended to use the Datadog Cluster Agent as proxy, as it will reduce the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. +There are two ways to poll Datadog for a query value using the Datadog scaler: using the REST API endpoints, or using the [Datadog Cluster Agent](https://docs.datadoghq.com/containers/cluster_agent/) as proxy. Using the Datadog Cluster Agent as proxy reduces the chance of reaching rate limits. As both types are different in terms of usage and authentication, this documentation handles them separately. -## Using the Datadog Cluster Agent +## Using the Datadog Cluster Agent (Experimental) With this method, the Datadog scaler will be connecting to the Datadog Cluster Agent to retrieve the query values that will be used to drive the KEDA scaling events. This reduces the risk of reaching rate limits for the Datadog API, as the Cluster Agent retrieves metric values in batches. 
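Because the scaler authenticates to the Cluster Agent with a bearer token, the token's service account must be able to `get`, `watch`, and `list` `external.metrics.k8s.io` resources. A minimal sketch of RBAC that satisfies this follows; the `datadog-metrics-reader` and `external-metrics-reader` names are illustrative, and `my-project` is just the namespace used in the examples on this page.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datadog-metrics-reader        # illustrative name
  namespace: my-project
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-reader       # illustrative name
rules:
# Permissions required by the bearer token used by the Datadog scaler
- apiGroups: ["external.metrics.k8s.io"]
  resources: ["*"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-metrics-reader        # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-metrics-reader
subjects:
- kind: ServiceAccount
  name: datadog-metrics-reader
  namespace: my-project
```

The resulting service account token can then be stored in a Secret (for example, the `dd-cluster-agent-token` Secret referenced from the `TriggerAuthentication` example on this page).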
From 0bfacb97c335abdd657f81f1604e3d265f07e325 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Wed, 3 Jul 2024 12:19:55 +0200 Subject: [PATCH 09/10] datadogMetricsService paramter is mandatory Signed-off-by: Ara Pulido --- content/docs/2.15/scalers/datadog.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/docs/2.15/scalers/datadog.md b/content/docs/2.15/scalers/datadog.md index 1492fd250..a1383c496 100644 --- a/content/docs/2.15/scalers/datadog.md +++ b/content/docs/2.15/scalers/datadog.md @@ -100,14 +100,13 @@ triggers: The Datadog scaler with Cluster Agent supports one type of authentication - Bearer authentication. -You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters - along with secret credentials in `TriggerAuthentication` as mentioned below: +You can use `TriggerAuthentication` CRD to configure the authentication. Specify `authMode` and other trigger parameters along with secret credentials in `TriggerAuthentication` as mentioned below: **Common to all authentication types** - `authMode` - The authentication mode to connect to the Cluster Agent. (Values: bearer, Default: bearer, Optional) - `datadogNamespace` - The namespace where the Datadog Cluster Agent is deployed. -- `datadogMetricsService` - The service name for the Cluster Agent Metrics API. (Default: datadog-cluster-agent-metrics-api, Optional) -- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8080, Optional) +- `datadogMetricsService` - The service name for the Cluster Agent metrics server. To find the name of the service, check the available services in the Datadog namespace and look for the `*-cluster-agent-metrics*` name pattern. +- `datadogMetricsServicePort` - The port of the service for the Cluster Agent Metrics API. (Default: 8443, Optional) - `unsafeSsl` - Skip certificate validation when connecting over HTTPS. (Values: true, false, Default: false, Optional) **Bearer authentication:** @@ -124,6 +123,7 @@ metadata: type: Opaque data: datadogNamespace: # Required: base64 encoded value of the namespace where the Datadog Cluster Agent is deployed + datadogMetricsService: # Required: base64 encoded value of the Cluster Agent metrics server service unsafeSsl: # Optional: base64 encoded value of `true` or `false` authMode: # Required: base64 encoded value of the authentication mode (in this case, bearer) --- From b75c1959f640db5aa88e4549dbd8aa81ac50b6e6 Mon Sep 17 00:00:00 2001 From: Ara Pulido Date: Wed, 24 Jul 2024 12:52:36 +0200 Subject: [PATCH 10/10] Establish 1.8.0 as the minimum version of the Datadog Operator required for this Signed-off-by: Ara Pulido --- content/docs/2.15/scalers/datadog.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/docs/2.15/scalers/datadog.md b/content/docs/2.15/scalers/datadog.md index a1383c496..764bba940 100644 --- a/content/docs/2.15/scalers/datadog.md +++ b/content/docs/2.15/scalers/datadog.md @@ -49,6 +49,8 @@ spec: [...] ``` +NOTE: Using the Datadog Operator for this purpose requires version 1.8.0 of the operator or later. + ### Create a DatadogMetric object for each scaling query To use the Datadog Cluster Agent to retrieve the query values from Datadog, first, create a [`DatadogMetric`](https://docs.datadoghq.com/containers/guide/cluster_agent_autoscaling_metrics/?tab=helm#create-the-datadogmetric-object) object with the query to drive your scaling events. 
You will need to add the `external-metrics.datadoghq.com/always-active: "true"` annotation to ensure the Cluster Agent retrieves the query value. Example:
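(As elsewhere on this page, the `nginx-hits` name and the nginx request-rate query are just the running example, not required values.)

```yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMetric
metadata:
  annotations:
    external-metrics.datadoghq.com/always-active: "true"
  name: nginx-hits
spec:
  query: sum:nginx.net.request_per_s{kube_deployment:nginx}
```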