Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable reporting gRPC codes as labels in cortex and ingester_client request duration metrics #6561

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@
* [ENHANCEMENT] Ingester: Add `-ingester.instance-limits.max-inflight-push-requests-bytes`. This limit protects the ingester against requests that together may cause an OOM. #6492
* [ENHANCEMENT] Ingester: add new per-tenant `cortex_ingester_local_limits` metric to expose the calculated local per-tenant limits seen at each ingester. Exports the local per-tenant series limit with label `{limit="max_global_series_per_user"}` #6403
* [ENHANCEMENT] Query-frontend: added "queue_time_seconds" field to "query stats" log. This is total time that query and subqueries spent in the queue, before queriers picked it up. #6537
* [ENHANCEMENT] All: enabled reporting gRPC codes as `status_code` label in `cortex_request_duration_seconds` and `cortex_ingester_client_request_duration_seconds` metrics. #6561
* [BUGFIX] Ring: Ensure network addresses used for component hash rings are formatted correctly when using IPv6. #6068
* [BUGFIX] Query-scheduler: don't retain connections from queriers that have shut down, leading to gradually increasing enqueue latency over time. #6100 #6145
* [BUGFIX] Ingester: prevent query logic from continuing to execute after queries are canceled. #6085
Expand All @@ -74,6 +75,7 @@

### Mixin

* [CHANGE] Dashboards: enabled reporting gRPC codes as `status_code` label in Mimir dashboards. In case of gRPC calls, the successful `status_code` label on `cortex_request_duration_seconds` and `cortex_ingester_client_request_duration_seconds` metrics has changed from 'success' and '2xx' to 'OK'. #6561
* [ENHANCEMENT] Dashboards: Optionally show rejected requests on Mimir Writes dashboard. Useful when used together with "early request rejection". #6132
* [ENHANCEMENT] Alerts: added a critical alert for `CompactorSkippedBlocksWithOutOfOrderChunks` when multiple blocks are affected. #6410
* [ENHANCEMENT] Dashboards: Added the min-replicas for autoscaling dashboards. #6528
Expand Down
10 changes: 10 additions & 0 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -405,6 +405,16 @@
"fieldType": "boolean",
"fieldCategory": "advanced"
},
{
"kind": "field",
"name": "report_grpc_codes_in_instrumentation_label",
"required": false,
"desc": "If set to true, gRPC statuses will be reported in instrumentation labels with their string representations. Otherwise, they will be reported as \"error\".",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "server.report-grpc-codes-in-instrumentation-label",
"fieldType": "boolean"
},
{
"kind": "field",
"name": "graceful_shutdown_timeout",
Expand Down
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -2597,6 +2597,8 @@ Usage of ./cmd/mimir/mimir:
Base path to serve all API routes from (e.g. /v1/)
-server.register-instrumentation
Register the intrumentation handlers (/metrics etc). (default true)
-server.report-grpc-codes-in-instrumentation-label
If set to true, gRPC statuses will be reported in instrumentation labels with their string representations. Otherwise, they will be reported as "error".
-server.tls-cipher-suites string
Comma-separated list of cipher suites to use. If blank, the default Go cipher suites is used.
-server.tls-min-version string
Expand Down
2 changes: 2 additions & 0 deletions cmd/mimir/help.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -671,6 +671,8 @@ Usage of ./cmd/mimir/mimir:
Optionally log request headers.
-server.log-request-headers-exclude-list string
Comma separated list of headers to exclude from loggin. Only used if server.log-request-headers is true.
-server.report-grpc-codes-in-instrumentation-label
If set to true, gRPC statuses will be reported in instrumentation labels with their string representations. Otherwise, they will be reported as "error".
-server.tls-cipher-suites string
Comma-separated list of cipher suites to use. If blank, the default Go cipher suites is used.
-server.tls-min-version string
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -586,6 +586,11 @@ grpc_tls_config:
# CLI flag: -server.register-instrumentation
[register_instrumentation: <boolean> | default = true]

# If set to true, gRPC statuses will be reported in instrumentation labels with
# their string representations. Otherwise, they will be reported as "error".
# CLI flag: -server.report-grpc-codes-in-instrumentation-label
[report_grpc_codes_in_instrumentation_label: <boolean> | default = false]

# (advanced) Timeout for graceful shutdowns
# CLI flag: -server.graceful-shutdown-timeout
[graceful_shutdown_timeout: <duration> | default = 30s]
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ require (
github.com/golang/snappy v0.0.4
github.com/google/gopacket v1.1.19
github.com/gorilla/mux v1.8.0
github.com/grafana/dskit v0.0.0-20231031132813-52f4e8d82d59
github.com/grafana/dskit v0.0.0-20231104112041-b3823cb5cdb4
github.com/grafana/e2e v0.1.1
github.com/hashicorp/golang-lru v1.0.2 // indirect
github.com/json-iterator/go v1.1.12
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -538,8 +538,8 @@ github.com/gosimple/slug v1.1.1 h1:fRu/digW+NMwBIP+RmviTK97Ho/bEj/C9swrCspN3D4=
github.com/gosimple/slug v1.1.1/go.mod h1:ER78kgg1Mv0NQGlXiDe57DpCyfbNywXXZ9mIorhxAf0=
github.com/grafana-tools/sdk v0.0.0-20220919052116-6562121319fc h1:PXZQA2WCxe85Tnn+WEvr8fDpfwibmEPgfgFEaC87G24=
github.com/grafana-tools/sdk v0.0.0-20220919052116-6562121319fc/go.mod h1:AHHlOEv1+GGQ3ktHMlhuTUwo3zljV3QJbC0+8o2kn+4=
github.com/grafana/dskit v0.0.0-20231031132813-52f4e8d82d59 h1:tWbF2UD8HgvpqyxV60zmUDDzJI2gybMJxw4/RY/UNTo=
github.com/grafana/dskit v0.0.0-20231031132813-52f4e8d82d59/go.mod h1:8dsy5tQOkeNQyjXpm5mQsbCu3H5uzeBD35MzRQFznKU=
github.com/grafana/dskit v0.0.0-20231104112041-b3823cb5cdb4 h1:GKbD6ueLnL7FQaZmIg8vc/kwzPOSgEiKdI7MVqRqBr8=
github.com/grafana/dskit v0.0.0-20231104112041-b3823cb5cdb4/go.mod h1:8dsy5tQOkeNQyjXpm5mQsbCu3H5uzeBD35MzRQFznKU=
github.com/grafana/e2e v0.1.1 h1:/b6xcv5BtoBnx8cZnCiey9DbjEc8z7gXHO5edoeRYxc=
github.com/grafana/e2e v0.1.1/go.mod h1:RpNLgae5VT+BUHvPE+/zSypmOXKwEu4t+tnEMS1ATaE=
github.com/grafana/goautoneg v0.0.0-20231010094147-47ce5e72a9ae h1:Yxbw9jKGJVC6qAK5Ubzzb/qZwM6rRMMqaDc/d4Vp3pM=
Expand Down
4 changes: 2 additions & 2 deletions integration/ingester_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -569,7 +569,7 @@ func TestIngesterQuerying(t *testing.T) {
return counts[0], nil
}

successfulQueryRequests, err := queryRequestCount("2xx")
successfulQueryRequests, err := queryRequestCount("OK")
require.NoError(t, err)

cancelledQueryRequests, err := queryRequestCount("cancel")
Expand Down Expand Up @@ -662,7 +662,7 @@ func TestIngesterQueryingWithRequestMinimization(t *testing.T) {
[]string{"cortex_request_duration_seconds"},
e2e.WithLabelMatchers(
labels.MustNewMatcher(labels.MatchEqual, "route", "/cortex.Ingester/QueryStream"),
labels.MustNewMatcher(labels.MatchEqual, "status_code", "success"),
labels.MustNewMatcher(labels.MatchEqual, "status_code", "OK"),
),
e2e.SkipMissingMetrics,
e2e.WithMetricCount,
Expand Down
Loading