feat(helm): Adding KEDA autoscaling support #7282

Merged
merged 62 commits into from
Feb 13, 2024
Changes from 5 commits
9ff0e92
feat(helm): Adding KEDA autoscaling support
beatkind Feb 3, 2024
f7d7afd
fix: update changelog
beatkind Feb 3, 2024
ce04869
feat(helm): Porting the changes from #6971 into helm chart
beatkind Feb 3, 2024
9ad9f67
feat: Adding better changelog & values documentation outlining the ex…
beatkind Feb 13, 2024
c0f29e7
fix: Remove duplicate query field, add base url in CHANGELOG.md
beatkind Feb 13, 2024
38702d2
helm: align grpc server connection lifetime settings with jsonnet (#7…
narqo Feb 5, 2024
c59d35e
querymiddleware: Fix race condition in shardActiveSeriesMiddleware (#…
narqo Feb 5, 2024
21bbd22
version: add UserAgent() (#7264)
narqo Feb 5, 2024
916c340
helm: remove -server.grpc.keepalive.max-connection-idle from common c…
narqo Feb 5, 2024
05b2dce
Compactor: export estimated number of compaction jobs based on bucket…
pstibrany Feb 5, 2024
16a59ad
Add KubePersistentVolumeFillingUp runbook (#7297)
pracucci Feb 5, 2024
3a778ae
Internal: remove unnecessary parameter to NoCompactionMarkFilter (#7301)
pstibrany Feb 5, 2024
04b8224
Name query metrics for easier discovery (#7302)
56quarters Feb 5, 2024
1b3708e
fix(deps): update module github.com/aws/aws-sdk-go to v1.50.11 (#7288)
renovate[bot] Feb 6, 2024
6798bed
fix(deps): update module github.com/klauspost/compress to v1.17.6 (#7…
renovate[bot] Feb 6, 2024
213a453
chore(deps): update anchore/sbom-action action to v0.15.8 (#7286)
renovate[bot] Feb 6, 2024
105d82f
chore(deps): update grafana/agent docker tag to v0.39.2 (#7287)
renovate[bot] Feb 6, 2024
7f1c9fe
chore(deps): update grafana/grafana docker tag to v10.3.1 (#7292)
renovate[bot] Feb 6, 2024
28be80c
fix(deps): update module github.com/failsafe-go/failsafe-go to v0.4.4…
renovate[bot] Feb 6, 2024
6e3ff83
Chore: removed unused parameter from GenerateBlockFromSpec() (#7303)
pracucci Feb 6, 2024
14d241a
Update mimir-prometheus (#7293)
pracucci Feb 6, 2024
55c978e
Release mimir-distributed Helm chart 5.3.0-weekly.276 (#7294)
grafanabot Feb 6, 2024
0c6d6db
Open circuit breakers on timeouts and per-instance limit errors only …
duricanikolic Feb 7, 2024
f1c8e71
Get rid of iterators.chunkIterator and iterators.chunkMergeIterator (…
duricanikolic Feb 7, 2024
185c2fe
Compactor: Language fixes (#7315)
aknuds1 Feb 7, 2024
1627df3
Do not register compat metrics in mimirtool (#7314)
grobinson-grafana Feb 7, 2024
6f57c5c
Compactor: Un-export symbols that don't need to be exported (#7317)
aknuds1 Feb 7, 2024
28e09c5
Circuit breakers: add client.ErrCircuitBreakerOpen type (#7324)
duricanikolic Feb 8, 2024
bbcb640
Add mimirpb.CIRCUIT_BREAKER_OPEN error cause (#7330)
duricanikolic Feb 8, 2024
1d2d2a7
store-gateway: remove cortex_bucket_store_blocks_loaded_by_duration (…
dimitarvdimitrov Feb 8, 2024
c9c074b
ruler: don't retry on non-retriable error (#7216)
narqo Feb 8, 2024
3624447
Update Alertmanager to f69a508 (#7332)
grobinson-grafana Feb 8, 2024
eaae699
Helm: add ruler specific service account (#7132)
QuantumEnigmaa Feb 8, 2024
84a2add
frontend/transport: log non-2xx replies from downstream as non-succes…
narqo Feb 8, 2024
dffd834
querymiddleware: Pool snappy writer in shard activity series (#7308)
narqo Feb 8, 2024
c1e523d
Helm: make PSP configurable (#7190)
QuantumEnigmaa Feb 8, 2024
b22fed6
Helm - Templatable host for gateway ingress/route (#7218)
Itaykal Feb 8, 2024
33b6a8a
[Docs] Update migrate-from-single-zone-with-helm.md (#7327)
eamonryan Feb 8, 2024
d3797d6
Always sort labels in distributors (#7326)
Logiraptor Feb 8, 2024
0c8a166
Do not check for ingester ring state before creating TSDB, or compact…
pracucci Feb 9, 2024
7952c2e
Compactor: String format compaction plan as comma separated blocks (#…
aknuds1 Feb 9, 2024
262ae64
Add a lifetime manager for Vault authentication tokens (#7337)
fayzal-g Feb 9, 2024
2dba521
fix(deps): update github.com/grafana/dskit digest to f245b48 (#7283)
renovate[bot] Feb 9, 2024
c8e62c8
Packaging: remove reload from systemd file as mimir does not take int…
wilfriedroset Feb 9, 2024
1745d88
Docs: No longer mark OTLP endpoint as experimental (#7348)
aknuds1 Feb 10, 2024
aa3813c
Update golang.org/x/exp digest to 2c58cdc (#7352)
renovate[bot] Feb 12, 2024
f7c3cb7
Update module github.com/aws/aws-sdk-go to v1.50.15 (#7353)
renovate[bot] Feb 12, 2024
831f9e2
Update module github.com/minio/minio-go/v7 to v7.0.67 (#7354)
renovate[bot] Feb 12, 2024
f720020
Update dependency puppeteer to v21.11.0 (#7355)
renovate[bot] Feb 12, 2024
c5e9dfe
Update helm/kind-action action to v1.9.0 (#7357)
renovate[bot] Feb 12, 2024
271a805
Update module cloud.google.com/go/storage to v1.37.0 (#7358)
renovate[bot] Feb 12, 2024
22b163e
Jsonnet / Helm: improve distributors graceful shutdown (#7361)
pracucci Feb 12, 2024
66a893a
Release mimir-distributed Helm chart 5.3.0-weekly.277 (#7362)
grafanabot Feb 12, 2024
8ba0cad
Distributor: Make `-distributor.enable-otlp-metadata-storage` flag de…
aknuds1 Feb 12, 2024
f95dc9d
Mark -ingester.limit-inflight-requests-using-grpc-method-limiter and …
pracucci Feb 12, 2024
f9e9d6f
Do not consider out-of-order blocks when filtering compactable jobs (…
jhalterman Feb 12, 2024
f10561c
mimir: Inject span profiler into tracer (#7363)
narqo Feb 13, 2024
3a7e509
Add experimental partitions ring lifecycler support (#7349)
pracucci Feb 13, 2024
188d181
feat(helm): Adding KEDA autoscaling support
beatkind Feb 13, 2024
87add14
chore: rebase branch with main
beatkind Feb 13, 2024
cf2e13e
Merge branch 'grafana:main' into add-helm-keda
beatkind Feb 13, 2024
c96d4c6
chore: make build-helm-tests
beatkind Feb 13, 2024
1 change: 1 addition & 0 deletions operations/helm/charts/mimir-distributed/CHANGELOG.md
@@ -28,6 +28,7 @@ Entries should include a reference to the Pull Request that introduced the chang

## main / unreleased

* [FEATURE] Added experimental feature for deploying [KEDA](https://keda.sh) ScaledObjects as part of the Helm chart for the components distributor, querier, query-frontend, and ruler. Autoscaling can be enabled via `distributor.kedaAutoscaling`, `ruler.kedaAutoscaling`, `query_frontend.kedaAutoscaling`, and `querier.kedaAutoscaling`. Requires metamonitoring; for more details, see [Monitor the health of your system](https://grafana.com/docs/helm-charts/mimir-distributed/latest/run-production-environment-with-helm/monitor-system-health/). See [grafana/mimir#7367](https://github.com/grafana/mimir/issues/7367) for a migration procedure. #7282
* [CHANGE] Rollout-operator: remove default CPU limit. #7125
* [CHANGE] Ring: relaxed the hash ring heartbeat period and timeout for distributor, ingester, store-gateway and compactor: #6860
* `-distributor.ring.heartbeat-period` set to `1m`
@@ -0,0 +1,52 @@
# Pin kube version so results are the same for running in CI and locally where the installed kube version may be different.
kubeVersionOverride: "1.20"

metaMonitoring:
  grafanaAgent:
    metrics:
      enabled: false
      remote:
        url: https://mimir.example.com/api/v1/push # test with setting a different remote for the monitoring

distributor:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10
    pollingInterval: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    customHeaders:
      X-Scope-OrgID: tenant-1

ruler:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10
    pollingInterval: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    customHeaders:
      X-Scope-OrgID: tenant-1

querier:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 2
    maxReplicaCount: 10
    pollingInterval: 10
    querySchedulerInflightRequestsThreshold: 6
    customHeaders:
      X-Scope-OrgID: tenant-1

query_frontend:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10
    pollingInterval: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    customHeaders:
      X-Scope-OrgID: tenant-1
@@ -0,0 +1,52 @@
# Pin kube version so results are the same for running in CI and locally where the installed kube version may be different.
kubeVersionOverride: "1.20"

metaMonitoring:
  grafanaAgent:
    metrics:
      enabled: false
      # Leave the remote empty to use the default to send it to Mimir directly
      # remote: #

distributor:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10
    pollingInterval: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    customHeaders:
      X-Scope-OrgID: tenant-1

ruler:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10
    pollingInterval: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    customHeaders:
      X-Scope-OrgID: tenant-1

querier:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 2
    maxReplicaCount: 10
    pollingInterval: 10
    querySchedulerInflightRequestsThreshold: 6
    customHeaders:
      X-Scope-OrgID: tenant-1

query_frontend:
  kedaAutoscaling:
    enabled: true
    minReplicaCount: 1
    maxReplicaCount: 10
    pollingInterval: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage: 80
    customHeaders:
      X-Scope-OrgID: tenant-1
@@ -497,6 +497,10 @@ Return if we should create a SecurityContextConstraints. Takes into account user
{{ include "mimir.gatewayUrl" . }}/api/v1/push
{{- end -}}

{{- define "mimir.remoteReadUrl.inCluster" -}}
{{ include "mimir.gatewayUrl" . }}{{ include "mimir.prometheusHttpPrefix" . }}
{{- end -}}

{{/*
Creates dict for zone-aware replication configuration
Params:
@@ -8,10 +8,12 @@ metadata:
    {{- toYaml .Values.distributor.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  {{- if not .Values.distributor.kedaAutoscaling.enabled }}
  # If replicas is not number (when using values file it's float64, when using --set arg it's int64) and is false (i.e. null) don't set it
  {{- if or (or (kindIs "int64" .Values.distributor.replicas) (kindIs "float64" .Values.distributor.replicas)) (.Values.distributor.replicas) }}
  replicas: {{ .Values.distributor.replicas }}
  {{- end }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "mimir.selectorLabels" (dict "ctx" . "component" "distributor" "memberlist" true) | nindent 6 }}
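The guard in this hunk decides whether the Deployment carries an explicit `replicas` field. The following Python sketch restates that condition for illustration only; `should_render_replicas` is a hypothetical name, not part of the chart, and the sketch assumes the values have already been parsed into plain Python types.

```python
def should_render_replicas(replicas, keda_enabled):
    """Return True when the Deployment would get an explicit `replicas` field."""
    if keda_enabled:
        # With KEDA enabled the ScaledObject drives the replica count,
        # so the template leaves `replicas` out of the Deployment.
        return False
    # bool is a subclass of int in Python, so exclude it from the numeric check.
    is_number = isinstance(replicas, (int, float)) and not isinstance(replicas, bool)
    # Mirrors: or (or (kindIs "int64" x) (kindIs "float64" x)) (x)
    return is_number or bool(replicas)
```

Under these assumptions, `replicas: 0` is still rendered when KEDA is disabled, while `replicas: null` is omitted.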
@@ -0,0 +1,44 @@
{{- if .Values.distributor.kedaAutoscaling.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "mimir.resourceName" (dict "ctx" . "component" "distributor") }}
  labels:
    {{- include "mimir.labels" (dict "ctx" . "component" "distributor") | nindent 4 }}
  annotations:
    {{- toYaml .Values.distributor.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      {{- with .Values.distributor.kedaAutoscaling.behavior }}
      behavior:
        {{- toYaml . | nindent 8 }}
      {{- end }}
  maxReplicaCount: {{ .Values.distributor.kedaAutoscaling.maxReplicaCount }}
  minReplicaCount: {{ .Values.distributor.kedaAutoscaling.minReplicaCount }}
  pollingInterval: {{ .Values.distributor.kedaAutoscaling.pollingInterval }}
  scaleTargetRef:
    name: {{ include "mimir.resourceName" (dict "ctx" . "component" "distributor") }}
    apiVersion: apps/v1
    kind: Deployment
  triggers:
  - metadata:
      query: max_over_time(sum(sum by (pod) (rate(container_cpu_usage_seconds_total{container="distributor",namespace="{{ .Release.Namespace }}"}[5m])) and max by (pod) (up{container="distributor",namespace="{{ .Release.Namespace }}"}) > 0)[15m:]) * 1000
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      {{- $cpu_request := dig "requests" "cpu" nil .Values.distributor.resources }}
      threshold: {{ mulf (include "mimir.parseCPU" (dict "value" $cpu_request)) (divf .Values.distributor.kedaAutoscaling.targetCPUUtilizationPercentage 100) | floor | int64 | quote }}
      {{- if .Values.distributor.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.distributor.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    type: prometheus
  - metadata:
      query: max_over_time(sum((sum by (pod) (container_memory_working_set_bytes{container="distributor",namespace="{{ .Release.Namespace }}"}) and max by (pod) (up{container="distributor",namespace="{{ .Release.Namespace }}"}) > 0) or vector(0))[15m:]) + sum(sum by (pod) (max_over_time(kube_pod_container_resource_requests{container="distributor",namespace="{{ .Release.Namespace }}", resource="memory"}[15m])) and max by (pod) (changes(kube_pod_container_status_restarts_total{container="distributor",namespace="{{ .Release.Namespace }}"}[15m]) > 0) and max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor",namespace="{{ .Release.Namespace }}", reason="OOMKilled"}) or vector(0))
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      {{- $mem_request := dig "requests" "memory" nil .Values.distributor.resources }}
      threshold: {{ mulf (include "mimir.siToBytes" (dict "value" $mem_request)) (divf .Values.distributor.kedaAutoscaling.targetMemoryUtilizationPercentage 100) | floor | int64 | quote }}
      {{- if .Values.distributor.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.distributor.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    type: prometheus
{{- end }}
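The CPU trigger's `threshold` is the CPU request scaled by `targetCPUUtilizationPercentage`, floored and quoted. The Python sketch below works through that arithmetic. It assumes `mimir.parseCPU` returns millicores, which the `* 1000` in the query suggests but which this diff does not show; `parse_cpu_millicores` and `cpu_threshold` are hypothetical names used only for illustration.

```python
import math


def parse_cpu_millicores(value):
    """Assumed stand-in for `mimir.parseCPU`: Kubernetes CPU quantity -> millicores."""
    text = str(value)
    if text.endswith("m"):
        # "500m" already expresses millicores.
        return float(text[:-1])
    # Plain numbers such as "1" or 0.5 are whole cores.
    return float(text) * 1000


def cpu_threshold(cpu_request, target_percentage):
    """Mirror `mulf <millicores> (divf <percentage> 100) | floor | int64 | quote`."""
    millicores = parse_cpu_millicores(cpu_request)
    return str(int(math.floor(millicores * (target_percentage / 100))))
```

With a request of `500m` and a target of 80, this sketch yields the string `"400"`, meaning the trigger compares the summed CPU usage in millicores against 400 per replica.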
@@ -0,0 +1,17 @@
{{/*
Convert a map to a comma-separated string like: key1=value1,key2=value2,...
Example:
    customHeaders:
      X-Scope-OrgID: tenant-1
becomes:
    customHeaders: "X-Scope-OrgID=tenant-1"
Params:
  map = map to convert to csv string
*/}}
{{- define "mimir.lib.mapToCSVString" -}}
{{- $list := list -}}
{{- range $k, $v := $.map -}}
{{- $list = append $list (printf "%s=%s" $k $v) -}}
{{- end -}}
{{ join "," $list }}
{{- end -}}
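For readers less familiar with Go templates, the helper's behaviour can be sketched in Python. This is an illustrative equivalent only; `map_to_csv_string` is a hypothetical name. Go's `text/template` visits map keys in sorted order, so the sketch sorts explicitly to produce the same output.

```python
def map_to_csv_string(mapping):
    """Join a mapping into "key1=value1,key2=value2" with no spaces after commas."""
    # Sort to match the key order that `range` uses over a map in Go templates.
    return ",".join(f"{key}={value}" for key, value in sorted(mapping.items()))
```

The result is what ends up in the `customHeaders` field of each Prometheus trigger, for example `X-Scope-OrgID=tenant-1`.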
@@ -56,3 +56,15 @@
cluster: {{ include "mimir.clusterName" $.ctx | quote}}
{{- end -}}
{{- end -}}

{{- define "mimir.metaMonitoring.metrics.remoteReadUrl" -}}
{{- with $.ctx.Values.metaMonitoring.grafanaAgent.metrics }}
{{- $writeBackToMimir := not (.remote).url -}}
{{- if $writeBackToMimir -}}
{{- include "mimir.remoteReadUrl.inCluster" $.ctx }}
{{- else -}}
{{- $parsed := urlParse (.remote).url -}}
{{ $parsed.scheme }}://{{ $parsed.host }}/prometheus
{{- end }}
{{- end -}}
{{- end -}}
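The helper above picks the Prometheus address that KEDA queries. A rough Python sketch of the same decision follows; `remote_read_url` and its `in_cluster_url` parameter are hypothetical, with `in_cluster_url` standing in for whatever `mimir.remoteReadUrl.inCluster` renders.

```python
from urllib.parse import urlparse


def remote_read_url(remote_url, in_cluster_url):
    """Pick the Prometheus-compatible base URL used as the trigger's serverAddress."""
    if not remote_url:
        # No remote configured: metrics are written back to this Mimir,
        # so read them through the in-cluster gateway URL.
        return in_cluster_url
    parsed = urlparse(remote_url)
    # Keep only scheme and host, then append the fixed /prometheus prefix.
    # Note: netloc would also include userinfo if the URL carried any.
    return f"{parsed.scheme}://{parsed.netloc}/prometheus"
```

One consequence visible in the template: for an external remote the `/prometheus` prefix is hard-coded, whereas the in-cluster path goes through `mimir.prometheusHttpPrefix`.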
@@ -8,10 +8,12 @@ metadata:
    {{- toYaml .Values.querier.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  {{- if not .Values.querier.kedaAutoscaling.enabled }}
  # If replicas is not number (when using values file it's float64, when using --set arg it's int64) and is false (i.e. null) don't set it
  {{- if or (or (kindIs "int64" .Values.querier.replicas) (kindIs "float64" .Values.querier.replicas)) (.Values.querier.replicas) }}
  replicas: {{ .Values.querier.replicas }}
  {{- end }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "mimir.selectorLabels" (dict "ctx" . "component" "querier" "memberlist" true) | nindent 6 }}
@@ -0,0 +1,47 @@
{{- if .Values.querier.kedaAutoscaling.enabled }}
{{- if not .Values.query_scheduler.enabled }}
{{- fail "KEDA autoscaling for querier requires query scheduler to be enabled" }}
{{- end }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "mimir.resourceName" (dict "ctx" . "component" "querier") }}
  labels:
    {{- include "mimir.labels" (dict "ctx" . "component" "querier") | nindent 4 }}
  annotations:
    {{- toYaml .Values.querier.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      {{- with .Values.querier.kedaAutoscaling.behavior }}
      behavior:
        {{- toYaml . | nindent 8 }}
      {{- end }}
  maxReplicaCount: {{ .Values.querier.kedaAutoscaling.maxReplicaCount }}
  minReplicaCount: {{ .Values.querier.kedaAutoscaling.minReplicaCount }}
  pollingInterval: {{ .Values.querier.kedaAutoscaling.pollingInterval }}
  scaleTargetRef:
    name: {{ include "mimir.resourceName" (dict "ctx" . "component" "querier") }}
    apiVersion: apps/v1
    kind: Deployment
  triggers:
  - metadata:
      query: sum(max_over_time(cortex_query_scheduler_inflight_requests{container="query-scheduler",namespace="{{ .Release.Namespace }}",quantile="0.5"}[1m]))
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      threshold: {{ .Values.querier.kedaAutoscaling.querySchedulerInflightRequestsThreshold | quote }}
      {{- if .Values.querier.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.querier.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    name: cortex_querier_hpa_default
    type: prometheus
  - metadata:
      query: sum(rate(cortex_querier_request_duration_seconds_sum{container="querier",namespace="{{ .Release.Namespace }}"}[1m]))
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      threshold: {{ .Values.querier.kedaAutoscaling.querySchedulerInflightRequestsThreshold | quote }}
      {{- if .Values.querier.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.querier.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    name: cortex_querier_hpa_default_requests_duration
    type: prometheus
{{- end }}
@@ -8,10 +8,12 @@ metadata:
    {{- toYaml .Values.query_frontend.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  {{- if not .Values.query_frontend.kedaAutoscaling.enabled }}
  # If replicas is not number (when using values file it's float64, when using --set arg it's int64) and is false (i.e. null) don't set it
  {{- if or (or (kindIs "int64" .Values.query_frontend.replicas) (kindIs "float64" .Values.query_frontend.replicas)) (.Values.query_frontend.replicas) }}
  replicas: {{ .Values.query_frontend.replicas }}
  {{- end }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "mimir.selectorLabels" (dict "ctx" . "component" "query-frontend") | nindent 6 }}
@@ -0,0 +1,44 @@
{{- if .Values.query_frontend.kedaAutoscaling.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "mimir.resourceName" (dict "ctx" . "component" "query-frontend") }}
  labels:
    {{- include "mimir.labels" (dict "ctx" . "component" "query-frontend") | nindent 4 }}
  annotations:
    {{- toYaml .Values.query_frontend.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      {{- with .Values.query_frontend.kedaAutoscaling.behavior }}
      behavior:
        {{- toYaml . | nindent 8 }}
      {{- end }}
  maxReplicaCount: {{ .Values.query_frontend.kedaAutoscaling.maxReplicaCount }}
  minReplicaCount: {{ .Values.query_frontend.kedaAutoscaling.minReplicaCount }}
  pollingInterval: {{ .Values.query_frontend.kedaAutoscaling.pollingInterval }}
  scaleTargetRef:
    name: {{ include "mimir.resourceName" (dict "ctx" . "component" "query-frontend") }}
    apiVersion: apps/v1
    kind: Deployment
  triggers:
  - metadata:
      query: max_over_time(sum(sum by (pod) (rate(container_cpu_usage_seconds_total{container="query-frontend",namespace="{{ .Release.Namespace }}"}[5m])) and max by (pod) (up{container="query-frontend",namespace="{{ .Release.Namespace }}"}) > 0)[15m:]) * 1000
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      {{- $cpu_request := dig "requests" "cpu" nil .Values.query_frontend.resources }}
      threshold: {{ mulf (include "mimir.parseCPU" (dict "value" $cpu_request)) (divf .Values.query_frontend.kedaAutoscaling.targetCPUUtilizationPercentage 100) | floor | int64 | quote }}
      {{- if .Values.query_frontend.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.query_frontend.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    type: prometheus
  - metadata:
      query: max_over_time(sum((sum by (pod) (container_memory_working_set_bytes{container="query-frontend",namespace="{{ .Release.Namespace }}"}) and max by (pod) (up{container="query-frontend",namespace="{{ .Release.Namespace }}"}) > 0) or vector(0))[15m:]) + sum(sum by (pod) (max_over_time(kube_pod_container_resource_requests{container="query-frontend",namespace="{{ .Release.Namespace }}", resource="memory"}[15m])) and max by (pod) (changes(kube_pod_container_status_restarts_total{container="query-frontend",namespace="{{ .Release.Namespace }}"}[15m]) > 0) and max by (pod) (kube_pod_container_status_last_terminated_reason{container="query-frontend",namespace="{{ .Release.Namespace }}", reason="OOMKilled"}) or vector(0))
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      {{- $mem_request := dig "requests" "memory" nil .Values.query_frontend.resources }}
      threshold: {{ mulf (include "mimir.siToBytes" (dict "value" $mem_request)) (divf .Values.query_frontend.kedaAutoscaling.targetMemoryUtilizationPercentage 100) | floor | int64 | quote }}
      {{- if .Values.query_frontend.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.query_frontend.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    type: prometheus
{{- end }}
@@ -9,7 +9,9 @@ metadata:
    {{- toYaml .Values.ruler.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  {{- if not .Values.ruler.kedaAutoscaling.enabled }}
  replicas: {{ .Values.ruler.replicas }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "mimir.selectorLabels" (dict "ctx" . "component" "ruler" "memberlist" true) | nindent 6 }}
@@ -0,0 +1,44 @@
{{- if .Values.ruler.kedaAutoscaling.enabled }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: {{ include "mimir.resourceName" (dict "ctx" . "component" "ruler") }}
  labels:
    {{- include "mimir.labels" (dict "ctx" . "component" "ruler") | nindent 4 }}
  annotations:
    {{- toYaml .Values.ruler.annotations | nindent 4 }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      {{- with .Values.ruler.kedaAutoscaling.behavior }}
      behavior:
        {{- toYaml . | nindent 8 }}
      {{- end }}
  maxReplicaCount: {{ .Values.ruler.kedaAutoscaling.maxReplicaCount }}
  minReplicaCount: {{ .Values.ruler.kedaAutoscaling.minReplicaCount }}
  pollingInterval: {{ .Values.ruler.kedaAutoscaling.pollingInterval }}
  scaleTargetRef:
    name: {{ include "mimir.resourceName" (dict "ctx" . "component" "ruler") }}
    apiVersion: apps/v1
    kind: Deployment
  triggers:
  - metadata:
      query: max_over_time(sum(sum by (pod) (rate(container_cpu_usage_seconds_total{container="ruler",namespace="{{ .Release.Namespace }}"}[5m])) and max by (pod) (up{container="ruler",namespace="{{ .Release.Namespace }}"}) > 0)[15m:]) * 1000
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      {{- $cpu_request := dig "requests" "cpu" nil .Values.ruler.resources }}
      threshold: {{ mulf (include "mimir.parseCPU" (dict "value" $cpu_request)) (divf .Values.ruler.kedaAutoscaling.targetCPUUtilizationPercentage 100) | floor | int64 | quote }}
      {{- if .Values.ruler.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.ruler.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    type: prometheus
  - metadata:
      query: max_over_time(sum((sum by (pod) (container_memory_working_set_bytes{container="ruler",namespace="{{ .Release.Namespace }}"}) and max by (pod) (up{container="ruler",namespace="{{ .Release.Namespace }}"}) > 0) or vector(0))[15m:]) + sum(sum by (pod) (max_over_time(kube_pod_container_resource_requests{container="ruler",namespace="{{ .Release.Namespace }}", resource="memory"}[15m])) and max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler",namespace="{{ .Release.Namespace }}"}[15m]) > 0) and max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler",namespace="{{ .Release.Namespace }}", reason="OOMKilled"}) or vector(0))
      serverAddress: {{ include "mimir.metaMonitoring.metrics.remoteReadUrl" (dict "ctx" $) }}
      {{- $mem_request := dig "requests" "memory" nil .Values.ruler.resources }}
      threshold: {{ mulf (include "mimir.siToBytes" (dict "value" $mem_request)) (divf .Values.ruler.kedaAutoscaling.targetMemoryUtilizationPercentage 100) | floor | int64 | quote }}
      {{- if .Values.ruler.kedaAutoscaling.customHeaders }}
      customHeaders: {{ (include "mimir.lib.mapToCSVString" (dict "map" .Values.ruler.kedaAutoscaling.customHeaders)) | quote }}
      {{- end }}
    type: prometheus
{{- end }}