Mixin: Follow naming convention for bucketReplicate (thanos-io#4859)
* Follow naming convention for bucketReplicate dashboard and alert

Signed-off-by: Jéssica Lins <[email protected]>

* Run make docs

Signed-off-by: Jéssica Lins <[email protected]>

* Examples clean, make docs

Signed-off-by: Jéssica Lins <[email protected]>

* Remove someAlert from thanos-component-absent test case

Signed-off-by: Jéssica Lins <[email protected]>

* Add util function to change from snake case to camel case

Signed-off-by: Jéssica Lins <[email protected]>

* Use snake case for query frontend dashboard file

Signed-off-by: Jéssica Lins <[email protected]>

* Update links to query frontend and bucket replicate dashboards

Signed-off-by: Jéssica Lins <[email protected]>

* Extract to sanitizeComponent function

Signed-off-by: Jéssica Lins <[email protected]>

* Rename to sanitizeComponentName

Signed-off-by: Jéssica Lins <[email protected]>
Jéssica Lins authored Nov 19, 2021
1 parent b0b853b commit d2d74da
Showing 14 changed files with 50 additions and 56 deletions.
13 changes: 1 addition & 12 deletions examples/alerts/alerts.md
@@ -578,7 +578,7 @@ rules:
description: Thanos Replicate is failing to run, {{$value | humanize}}% of attempts
failed.
runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate
-summary: Thanose Replicate is failing to run.
+summary: Thanos Replicate is failing to run.
expr: |
(
sum by (job) (rate(thanos_replicate_replication_runs_total{result="error", job=~".*thanos-bucket-replicate.*"}[5m]))
@@ -612,17 +612,6 @@ rules:
```yaml mdox-exec="cat examples/tmp/thanos-component-absent.yaml"
name: thanos-component-absent
rules:
-- alert: ThanosBucketReplicateIsDown
-annotations:
-description: ThanosBucketReplicate has disappeared. Prometheus target for the
-component cannot be discovered.
-runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateisdown
-summary: Thanos component has disappeared.
-expr: |
-absent(up{job=~".*thanos-bucket-replicate.*"} == 1)
-for: 5m
-labels:
-severity: critical
- alert: ThanosCompactIsDown
annotations:
description: ThanosCompact has disappeared. Prometheus target for the component
13 changes: 1 addition & 12 deletions examples/alerts/alerts.yaml
@@ -542,7 +542,7 @@ groups:
description: Thanos Replicate is failing to run, {{$value | humanize}}% of attempts
failed.
runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate
-summary: Thanose Replicate is failing to run.
+summary: Thanos Replicate is failing to run.
expr: |
(
sum by (job) (rate(thanos_replicate_replication_runs_total{result="error", job=~".*thanos-bucket-replicate.*"}[5m]))
@@ -569,17 +569,6 @@ groups:
severity: critical
- name: thanos-component-absent
rules:
-- alert: ThanosBucketReplicateIsDown
-annotations:
-description: ThanosBucketReplicate has disappeared. Prometheus target for the
-component cannot be discovered.
-runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateisdown
-summary: Thanos component has disappeared.
-expr: |
-absent(up{job=~".*thanos-bucket-replicate.*"} == 1)
-for: 5m
-labels:
-severity: critical
- alert: ThanosCompactIsDown
annotations:
description: ThanosCompact has disappeared. Prometheus target for the component
2 changes: 1 addition & 1 deletion examples/dashboards/dashboards.md
@@ -5,7 +5,7 @@ There exists Grafana dashboards for each component (not all of them complete) ta
- [Thanos Overview](overview.json)
- [Thanos Compact](compact.json)
- [Thanos Querier](query.json)
-- [Thanos Query Frontend](queryFrontend.json)
+- [Thanos Query Frontend](query_frontend.json)
- [Thanos Store](store.json)
- [Thanos Receiver](receive.json)
- [Thanos Sidecar](sidecar.json)
@@ -1112,6 +1112,6 @@
},
"timezone": "UTC",
"title": "Thanos / Query Frontend",
-"uid": "9bc9f8bb21d4d18193c3fe772b36c306",
+"uid": "303c4e660a475c4c8cf6aee97da3a24a",
"version": 0
}
3 changes: 1 addition & 2 deletions mixin/README.md
@@ -113,8 +113,7 @@ This project is intended to be used as a library. You can extend and customize d
thanosPrometheusCommonDimensions: 'namespace, pod',
title: '%(prefix)sSidecar' % $.dashboard.prefix,
},
-// TODO(kakkoyun): Fix naming convention: bucketReplicate
-bucket_replicate+:: {
+bucketReplicate+:: {
selector: 'job=~".*thanos-bucket-replicate.*"',
title: '%(prefix)sBucketReplicate' % $.dashboard.prefix,
},
10 changes: 5 additions & 5 deletions mixin/alerts/bucket_replicate.libsonnet
@@ -1,13 +1,13 @@
{
local thanos = self,
-bucket_replicate+:: {
+bucketReplicate+:: {
selector: error 'must provide selector for Thanos Bucket Replicate dashboard',
errorThreshold: 10,
p99LatencyThreshold: 20,
dimensions: std.join(', ', std.objectFields(thanos.targetGroups) + ['job']),
},
prometheusAlerts+:: {
-groups+: if thanos.bucket_replicate == null then [] else [
+groups+: if thanos.bucketReplicate == null then [] else [
local location = if std.length(std.objectFields(thanos.targetGroups)) > 0 then ' in %s' % std.join('/', ['{{$labels.%s}}' % level for level in std.objectFields(thanos.targetGroups)]) else '';
{
name: 'thanos-bucket-replicate',
@@ -16,15 +16,15 @@
alert: 'ThanosBucketReplicateErrorRate',
annotations: {
description: 'Thanos Replicate is failing to run%s, {{$value | humanize}}%% of attempts failed.' % location,
-summary: 'Thanose Replicate is failing to run%s.' % location,
+summary: 'Thanos Replicate is failing to run%s.' % location,
},
expr: |||
(
sum by (%(dimensions)s) (rate(thanos_replicate_replication_runs_total{result="error", %(selector)s}[5m]))
/ on (%(dimensions)s) group_left
sum by (%(dimensions)s) (rate(thanos_replicate_replication_runs_total{%(selector)s}[5m]))
) * 100 >= %(errorThreshold)s
-||| % thanos.bucket_replicate,
+||| % thanos.bucketReplicate,
'for': '5m',
labels: {
severity: 'critical',
@@ -42,7 +42,7 @@
and
sum by (%(dimensions)s) (rate(thanos_replicate_replication_run_duration_seconds_bucket{%(selector)s}[5m])) > 0
)
-||| % thanos.bucket_replicate,
+||| % thanos.bucketReplicate,
'for': '5m',
labels: {
severity: 'critical',
3 changes: 1 addition & 2 deletions mixin/config.libsonnet
@@ -53,8 +53,7 @@
thanosPrometheusCommonDimensions: 'namespace, pod',
title: '%(prefix)sSidecar' % $.dashboard.prefix,
},
-// TODO(kakkoyun): Fix naming convention: bucketReplicate
-bucket_replicate+:: {
+bucketReplicate+:: {
selector: 'job=~".*thanos-bucket-replicate.*"',
title: '%(prefix)sBucketReplicate' % $.dashboard.prefix,
},
28 changes: 14 additions & 14 deletions mixin/dashboards/bucket_replicate.libsonnet
@@ -2,7 +2,7 @@ local g = import '../lib/thanos-grafana-builder/builder.libsonnet';

{
local thanos = self,
-bucket_replicate+:: {
+bucketReplicate+:: {
selector: error 'must provide selector for Thanos Bucket Replicate dashboard',
title: error 'must provide title for Thanos Bucket Replicate dashboard',
dashboard:: {
@@ -11,22 +11,22 @@ local g = import '../lib/thanos-grafana-builder/builder.libsonnet';
},
},
grafanaDashboards+:: {
-[if thanos.bucket_replicate != null then 'bucket_replicate.json']:
-g.dashboard(thanos.bucket_replicate.title)
+[if thanos.bucketReplicate != null then 'bucket_replicate.json']:
+g.dashboard(thanos.bucketReplicate.title)
.addRow(
g.row('Bucket Replicate Runs')
.addPanel(
g.panel('Rate') +
g.qpsErrTotalPanel(
-'thanos_replicate_replication_runs_total{result="error", %s}' % thanos.bucket_replicate.dashboard.selector,
-'thanos_replicate_replication_runs_total{%s}' % thanos.bucket_replicate.dashboard.selector,
-thanos.bucket_replicate.dashboard.dimensions
+'thanos_replicate_replication_runs_total{result="error", %s}' % thanos.bucketReplicate.dashboard.selector,
+'thanos_replicate_replication_runs_total{%s}' % thanos.bucketReplicate.dashboard.selector,
+thanos.bucketReplicate.dashboard.dimensions
)
)
.addPanel(
g.panel('Errors', 'Shows rate of errors.') +
g.queryPanel(
-'sum by (%(dimensions)s, result) (rate(thanos_replicate_replication_runs_total{result="error", %(selector)s}[$interval]))' % thanos.bucket_replicate.dashboard,
+'sum by (%(dimensions)s, result) (rate(thanos_replicate_replication_runs_total{result="error", %(selector)s}[$interval]))' % thanos.bucketReplicate.dashboard,
'{{result}}'
) +
{ yaxes: g.yaxes('percentunit') } +
@@ -36,8 +36,8 @@ local g = import '../lib/thanos-grafana-builder/builder.libsonnet';
g.panel('Duration', 'Shows how long has it taken to run a replication cycle.') +
g.latencyPanel(
'thanos_replicate_replication_run_duration_seconds',
-'result="success", %s' % thanos.bucket_replicate.dashboard.selector,
-thanos.bucket_replicate.dashboard.dimensions
+'result="success", %s' % thanos.bucketReplicate.dashboard.selector,
+thanos.bucketReplicate.dashboard.dimensions
)
)
)
@@ -47,11 +47,11 @@ local g = import '../lib/thanos-grafana-builder/builder.libsonnet';
g.panel('Metrics') +
g.queryPanel(
[
-'sum by (%(dimensions)s) (rate(blocks_meta_synced{state="loaded", %(selector)s}[$interval]))' % thanos.bucket_replicate.dashboard,
-'sum by (%(dimensions)s) (rate(blocks_meta_synced{state="failed", %(selector)s}[$interval]))' % thanos.bucket_replicate.dashboard,
-'sum by (%(dimensions)s) (rate(thanos_replicate_blocks_already_replicated_total{%(selector)s}[$interval]))' % thanos.bucket_replicate.dashboard,
-'sum by (%(dimensions)s) (rate(thanos_replicate_blocks_replicated_total{%(selector)s}[$interval]))' % thanos.bucket_replicate.dashboard,
-'sum by (%(dimensions)s) (rate(thanos_replicate_objects_replicated_total{%(selector)s}[$interval]))' % thanos.bucket_replicate.dashboard,
+'sum by (%(dimensions)s) (rate(blocks_meta_synced{state="loaded", %(selector)s}[$interval]))' % thanos.bucketReplicate.dashboard,
+'sum by (%(dimensions)s) (rate(blocks_meta_synced{state="failed", %(selector)s}[$interval]))' % thanos.bucketReplicate.dashboard,
+'sum by (%(dimensions)s) (rate(thanos_replicate_blocks_already_replicated_total{%(selector)s}[$interval]))' % thanos.bucketReplicate.dashboard,
+'sum by (%(dimensions)s) (rate(thanos_replicate_blocks_replicated_total{%(selector)s}[$interval]))' % thanos.bucketReplicate.dashboard,
+'sum by (%(dimensions)s) (rate(thanos_replicate_objects_replicated_total{%(selector)s}[$interval]))' % thanos.bucketReplicate.dashboard,
],
['meta loads', 'partial meta reads', 'already replicated blocks', 'replicated blocks', 'replicated objects']
)
4 changes: 3 additions & 1 deletion mixin/dashboards/defaults.libsonnet
@@ -1,3 +1,4 @@
+local utils = import '../lib/utils.libsonnet';
{
local thanos = self,
local grafanaDashboards = super.grafanaDashboards,
@@ -13,7 +14,8 @@
// Automatically add a uid to each dashboard based on the base64 encoding
// of the file name and set the timezone to be 'default'.
grafanaDashboards:: {
-local component = std.split(filename, '.')[0],
+local component = utils.sanitizeComponentName(std.split(filename, '.')[0]),
+
[filename]: grafanaDashboards[filename] {
uid: std.md5(filename),
timezone: thanos.dashboard.timezone,
2 changes: 1 addition & 1 deletion mixin/dashboards/query_frontend.libsonnet
@@ -12,7 +12,7 @@ local utils = import '../lib/utils.libsonnet';
},
},
grafanaDashboards+:: {
-[if thanos.queryFrontend != null then 'queryFrontend.json']:
+[if thanos.queryFrontend != null then 'query_frontend.json']:
local queryFrontendHandlerSelector = utils.joinLabels([thanos.queryFrontend.dashboard.selector, 'handler="query-frontend"']);
local queryFrontendTripperwareSelector = utils.joinLabels([thanos.queryFrontend.dashboard.selector, 'tripperware="query_range"']);
local queryFrontendOpSelector = utils.joinLabels([thanos.queryFrontend.dashboard.selector, 'op="query_range"']);
17 changes: 17 additions & 0 deletions mixin/lib/utils.libsonnet
@@ -12,4 +12,21 @@
},

joinLabels(labels): std.join(', ', std.filter(function(x) std.length(std.stripChars(x, ' ')) > 0, labels)),
+
+firstCharUppercase(parts): std.join(
+'',
+[
+std.join(
+'',
+[std.asciiUpper(std.stringChars(part)[0]), std.substr(part, 1, std.length(part) - 1)]
+)
+for part in parts[1:std.length(parts)]
+]
+),
+
+toCamelCase(parts): std.join('', [parts[0], self.firstCharUppercase(parts)]),
+
+componentParts(name): std.split(name, '_'),
+
+sanitizeComponentName(name): if std.length(self.componentParts(name)) > 1 then self.toCamelCase(self.componentParts(name)) else name,
}
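
The helpers added to `mixin/lib/utils.libsonnet` above turn a snake_case dashboard file stem into the camelCase config key, so that `bucket_replicate.json` resolves to `bucketReplicate`. The following is a minimal Python sketch of that behaviour, for illustration only. It is a port, not part of this commit, and the name `sanitize_component_name` is an assumption made for the example. Unlike the Jsonnet version, this sketch passes empty segments (from doubled underscores) through unchanged rather than indexing into them.

```python
def sanitize_component_name(name: str) -> str:
    """Illustrative port of sanitizeComponentName (hypothetical Python name).

    Splits on '_' and, if there is more than one part, keeps the first part
    as-is and upper-cases the first character of every following part.
    """
    parts = name.split("_")
    if len(parts) <= 1:
        # No underscore: the name is returned unchanged, as in the Jsonnet helper.
        return name
    # p[:1] is used instead of p[0] so an empty segment does not raise.
    return parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])


if __name__ == "__main__":
    # Mirrors the lookup in defaults.libsonnet: strip the extension, then sanitize.
    for filename in ("bucket_replicate.json", "query_frontend.json", "compact.json"):
        component = sanitize_component_name(filename.split(".")[0])
        print(filename, "->", component)
```

With this mapping, dashboard files can keep snake_case names on disk while the mixin config uses camelCase keys throughout.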
4 changes: 2 additions & 2 deletions mixin/rules/bucket_replicate.libsonnet
@@ -1,10 +1,10 @@
{
local thanos = self,
-bucket_replicate+:: {
+bucketReplicate+:: {
selector: error 'must provide selector for Thanos Bucket Replicate dashboard',
},
prometheusRules+:: {
-groups+: if thanos.bucket_replicate == null then [] else [
+groups+: if thanos.bucketReplicate == null then [] else [
{
name: 'thanos-bucket-replicate.rules',
rules: [],
3 changes: 1 addition & 2 deletions mixin/runbook.md
@@ -15,7 +15,7 @@

|Name|Summary|Description|Severity|Runbook|
|---|---|---|---|---|
-|ThanosBucketReplicateErrorRate|Thanose Replicate is failing to run.|Thanos Replicate is failing to run, {{$value humanize}}% of attempts failed.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate)|
+|ThanosBucketReplicateErrorRate|Thanos Replicate is failing to run.|Thanos Replicate is failing to run, {{$value humanize}}% of attempts failed.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateerrorrate)|
|ThanosBucketReplicateRunLatency|Thanos Replicate has a high latency for replicate operations.|Thanos Replicate {{$labels.job}} has a 99th percentile latency of {{$value}} seconds for the replicate operations.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicaterunlatency](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicaterunlatency)|

## thanos-compact
@@ -32,7 +32,6 @@

|Name|Summary|Description|Severity|Runbook|
|---|---|---|---|---|
-|ThanosBucketReplicateIsDown|Thanos component has disappeared.|ThanosBucketReplicate has disappeared. Prometheus target for the component cannot be discovered.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateisdown](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosbucketreplicateisdown)|
|ThanosCompactIsDown|Thanos component has disappeared.|ThanosCompact has disappeared. Prometheus target for the component cannot be discovered.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompactisdown](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanoscompactisdown)|
|ThanosQueryIsDown|Thanos component has disappeared.|ThanosQuery has disappeared. Prometheus target for the component cannot be discovered.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryisdown](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosqueryisdown)|
|ThanosReceiveIsDown|Thanos component has disappeared.|ThanosReceive has disappeared. Prometheus target for the component cannot be discovered.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceiveisdown](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceiveisdown)|
2 changes: 1 addition & 1 deletion pkg/rules/rules_test.go
@@ -50,7 +50,7 @@ func testRulesAgainstExamples(t *testing.T, dir string, server rulespb.RulesServ
{
Name: "thanos-component-absent",
File: filepath.Join(dir, "alerts.yaml"),
-Rules: []*rulespb.Rule{someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert},
+Rules: []*rulespb.Rule{someAlert, someAlert, someAlert, someAlert, someAlert, someAlert},
Interval: 60,
PartialResponseStrategy: storepb.PartialResponseStrategy_ABORT,
},
