Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingester: allow only POST method on /ingester/shutdown #7707

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@
* [CHANGE] Distributor: the metric `cortex_distributor_sample_delay_seconds` has been deprecated and will be removed in Mimir 2.14. #7516
* [CHANGE] Query-frontend: The deprecated YAML setting `frontend.cache_unaligned_requests` has been moved to `limits.cache_unaligned_requests`. #7519
* [CHANGE] Querier: the CLI flag `-querier.minimize-ingester-requests` has been moved from "experimental" to "advanced". #7638
* [CHANGE] Ingester: allow only POST method on `/ingester/shutdown`, as previously it was too easy to accidentally trigger through GET requests. At the same time, add an option to keep the existing behavior by introducing an `-api.get-request-for-ingester-shutdown-enabled` flag. This flag will be removed in Mimir 2.15. #7707
* [FEATURE] Introduce `-server.log-source-ips-full` option to log all IPs from `Forwarded`, `X-Real-IP`, `X-Forwarded-For` headers. #7250
* [FEATURE] Introduce `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
* [FEATURE] Cardinality API: added a new `count_method` parameter which enables counting active label names. #7085
Expand Down
11 changes: 11 additions & 0 deletions cmd/mimir/config-descriptor.json
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,17 @@
"fieldType": "boolean",
"fieldCategory": "deprecated"
},
{
"kind": "field",
"name": "get_request_for_ingester_shutdown_enabled",
"required": false,
"desc": "Enable GET requests to the /ingester/shutdown endpoint to trigger an ingester shutdown. This is a potentially dangerous operation and should only be enabled consciously.",
"fieldValue": null,
"fieldDefaultValue": false,
"fieldFlag": "api.get-request-for-ingester-shutdown-enabled",
"fieldType": "boolean",
"fieldCategory": "deprecated"
},
{
"kind": "field",
"name": "alertmanager_http_prefix",
Expand Down
2 changes: 2 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -279,6 +279,8 @@ Usage of ./cmd/mimir/mimir:
[experimental] Enable UTF-8 strict mode. Allows UTF-8 characters in the matchers for routes and inhibition rules, in silences, and in the labels for alerts. It is recommended to check both alertmanager_matchers_disagree_total and alertmanager_matchers_incompatible_total metrics before using this mode as otherwise some tenant configurations might fail to load.
-alertmanager.web.external-url string
The URL under which Alertmanager is externally reachable (eg. could be different than -http.alertmanager-http-prefix in case Alertmanager is served via a reverse proxy). This setting is used both to configure the internal requests router and to generate links in alert templates. If the external URL has a path portion, it will be used to prefix all HTTP endpoints served by Alertmanager, both the UI and API. (default http://localhost:8080/alertmanager)
-api.get-request-for-ingester-shutdown-enabled
[deprecated] Enable GET requests to the /ingester/shutdown endpoint to trigger an ingester shutdown. This is a potentially dangerous operation and should only be enabled consciously.
-api.skip-label-name-validation-header-enabled
Allows to skip label name validation via X-Mimir-SkipLabelNameValidation header on the http write path. Use with caution as it breaks PromQL. Allowing this for external clients allows any client to send invalid label names. After enabling it, requests with a specific HTTP header set to true will not have label names validated.
-auth.multitenancy-enabled
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,12 @@ api:
# CLI flag: -distributor.enable-otlp-metadata-storage
[enable_otel_metadata_translation: <boolean> | default = true]

# (deprecated) Enable GET requests to the /ingester/shutdown endpoint to
# trigger an ingester shutdown. This is a potentially dangerous operation and
# should only be enabled consciously.
# CLI flag: -api.get-request-for-ingester-shutdown-enabled
[get_request_for_ingester_shutdown_enabled: <boolean> | default = false]

# (advanced) HTTP URL path under which the Alertmanager ui and api will be
# served.
# CLI flag: -http.alertmanager-http-prefix
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ Complete the following steps to scale down ingesters deployed in a single zone.

1. Scale down one ingester at a time:

a. Invoke the `/ingester/shutdown` API endpoint on the ingester to terminate.
a. Send a POST request to the `/ingester/shutdown` API endpoint on the ingester to terminate.

b. Wait until the API endpoint call has successfully returned and the ingester logged "finished flushing and shipping TSDB blocks".

Expand All @@ -134,7 +134,7 @@ To simplify the scale down process, you can leverage ingesters deployed in multi

For each zone, complete the following steps:

1. Invoke the `/ingester/shutdown` API endpoint on all ingesters that you want to terminate.
1. Send a POST request to the `/ingester/shutdown` API endpoint on all ingesters that you want to terminate.
1. Wait until the API endpoint calls have successfully returned and the ingester has logged "finished flushing and shipping TSDB blocks".
1. Send a `SIGINT` or `SIGTERM` signal to the processes of the ingesters that you want to terminate.
1. Wait until the blocks uploaded by terminated ingesters are available for querying before proceeding with the next zone.
Expand Down
4 changes: 2 additions & 2 deletions docs/sources/mimir/references/http-api/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ This document groups API endpoints by service. Note that the API endpoints are e
| [HA tracker status](#ha-tracker-status) | Distributor | `GET /distributor/ha_tracker` |
| [Flush chunks / blocks](#flush-chunks--blocks) | Ingester | `GET,POST /ingester/flush` |
| [Prepare for Shutdown](#prepare-for-shutdown) | Ingester | `GET,POST,DELETE /ingester/prepare-shutdown` |
| [Shutdown](#shutdown) | Ingester | `GET,POST /ingester/shutdown` |
| [Shutdown](#shutdown) | Ingester | `POST /ingester/shutdown` |
| [Ingesters ring status](#ingesters-ring-status) | Distributor,Ingester | `GET /ingester/ring` |
| [Ingester tenants](#ingester-tenants) | Ingester | `GET /ingester/tenants` |
| [Ingester tenant TSDB](#ingester-tenant-tsdb) | Ingester | `GET /ingester/tsdb/{tenant}` |
Expand Down Expand Up @@ -409,7 +409,7 @@ This API endpoint is usually used by Kubernetes-specific scale down automations
### Shutdown

```
GET,POST /ingester/shutdown
POST /ingester/shutdown
```

This endpoint flushes in-memory time series data from ingesters to the long-term storage, and then shuts down the ingester service.
Expand Down
9 changes: 8 additions & 1 deletion pkg/api/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,9 @@ type Config struct {
// TODO: Remove option in Mimir 2.14.
EnableOtelMetadataStorage bool `yaml:"enable_otel_metadata_translation" category:"deprecated"`

// TODO: Remove option in Mimir 2.15.
GETRequestForIngesterShutdownEnabled bool `yaml:"get_request_for_ingester_shutdown_enabled" category:"deprecated"`

AlertmanagerHTTPPrefix string `yaml:"alertmanager_http_prefix" category:"advanced"`
PrometheusHTTPPrefix string `yaml:"prometheus_http_prefix" category:"advanced"`

Expand All @@ -71,6 +74,7 @@ type Config struct {
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.SkipLabelNameValidationHeader, "api.skip-label-name-validation-header-enabled", false, "Allows to skip label name validation via X-Mimir-SkipLabelNameValidation header on the http write path. Use with caution as it breaks PromQL. Allowing this for external clients allows any client to send invalid label names. After enabling it, requests with a specific HTTP header set to true will not have label names validated.")
f.BoolVar(&cfg.EnableOtelMetadataStorage, "distributor.enable-otlp-metadata-storage", true, "If true, store metadata when ingesting metrics via OTLP. This makes metric descriptions and types available for metrics ingested via OTLP.")
f.BoolVar(&cfg.GETRequestForIngesterShutdownEnabled, "api.get-request-for-ingester-shutdown-enabled", false, "Enable GET requests to the /ingester/shutdown endpoint to trigger an ingester shutdown. This is a potentially dangerous operation and should only be enabled consciously.")
cfg.RegisterFlagsWithPrefix("", f)
}

Expand Down Expand Up @@ -296,7 +300,10 @@ func (a *API) RegisterIngester(i Ingester) {
a.RegisterRoute("/ingester/flush", http.HandlerFunc(i.FlushHandler), false, true, "GET", "POST")
a.RegisterRoute("/ingester/prepare-shutdown", http.HandlerFunc(i.PrepareShutdownHandler), false, true, "GET", "POST", "DELETE")
a.RegisterRoute("/ingester/prepare-partition-downscale", http.HandlerFunc(i.PreparePartitionDownscaleHandler), false, true, "GET", "POST", "DELETE")
a.RegisterRoute("/ingester/shutdown", http.HandlerFunc(i.ShutdownHandler), false, true, "GET", "POST")
a.RegisterRoute("/ingester/shutdown", http.HandlerFunc(i.ShutdownHandler), false, true, "POST")
if a.cfg.GETRequestForIngesterShutdownEnabled {
a.RegisterDeprecatedRoute("/ingester/shutdown", http.HandlerFunc(i.ShutdownHandler), false, true, "GET")
}
a.RegisterRoute("/ingester/tsdb_metrics", http.HandlerFunc(i.UserRegistryHandler), true, true, "GET")

a.indexPage.AddLinks(defaultWeight, "Ingester", []IndexPageLink{
Expand Down
61 changes: 61 additions & 0 deletions pkg/api/api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,14 @@ import (
"fmt"
"net"
"net/http"
"net/http/httptest"
"strconv"
"testing"

"github.com/go-kit/log"
"github.com/gorilla/mux"
"github.com/grafana/dskit/server"
"github.com/prometheus/client_golang/prometheus"
"github.com/stretchr/testify/require"

"github.com/grafana/mimir/pkg/querier/tenantfederation"
Expand Down Expand Up @@ -193,6 +195,62 @@ func TestApiGzip(t *testing.T) {
})
}

type MockIngester struct {
Ingester
}

func (mi MockIngester) ShutdownHandler(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusNoContent)
}

func TestApiIngesterShutdown(t *testing.T) {
for _, tc := range []struct {
name string
setFlag bool
expectedStatusCode int
}{
{
name: "flag set to true, enable GET request for ingester shutdown",
setFlag: true,
expectedStatusCode: http.StatusNoContent,
},
{
name: "flag not set (default), disable GET request for ingester shutdown",
setFlag: false,
expectedStatusCode: http.StatusMethodNotAllowed,
},
} {
t.Run(tc.name, func(t *testing.T) {
cfg := Config{
GETRequestForIngesterShutdownEnabled: tc.setFlag,
}
serverCfg := getServerConfig(t)
federationCfg := tenantfederation.Config{}
srv, err := server.New(serverCfg)
require.NoError(t, err)

go func() { _ = srv.Run() }()
t.Cleanup(srv.Stop)

api, err := New(cfg, federationCfg, serverCfg, srv, log.NewNopLogger())
require.NoError(t, err)

api.RegisterIngester(&MockIngester{})

req := httptest.NewRequest("GET", "/ingester/shutdown", nil)
w := httptest.NewRecorder()
api.server.HTTP.ServeHTTP(w, req)
require.Equal(t, tc.expectedStatusCode, w.Code)

// for POST request, it should always return 204
req = httptest.NewRequest("POST", "/ingester/shutdown", nil)
w = httptest.NewRecorder()
api.server.HTTP.ServeHTTP(w, req)
require.Equal(t, http.StatusNoContent, w.Code)
})
}
}

// Generates server config, with gRPC listening on random port.
func getServerConfig(t *testing.T) server.Config {
grpcHost, grpcPortNum := getHostnameAndRandomPort(t)
Expand All @@ -206,6 +264,9 @@ func getServerConfig(t *testing.T) server.Config {
GRPCListenPort: grpcPortNum,

GRPCServerMaxRecvMsgSize: 1024,

// Use new registry to avoid using default one and panic due to duplicate metrics.
Registerer: prometheus.NewPedanticRegistry(),
}
require.NoError(t, cfg.LogLevel.Set("info"))
return cfg
Expand Down
2 changes: 1 addition & 1 deletion tools/migrate-ingester-statefulsets.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ while [[ $INSTANCES_TO_DOWNSCALE -gt 0 ]]; do
# wget (BusyBox version) will fail, but we don't care ... important thing is that it has triggered shutdown.
# -T causes wget to wait only 5 seconds, otherwise /ingester/shutdown takes a long time.
# Preferably we would wait for /ingester/shutdown to return, but unfortunately that doesn't work (even with big timeout), wget complains with weird error.
kubectl exec "$POD_TO_SHUTDOWN" --namespace="$NAMESPACE" -- wget -T 5 http://localhost:8080/ingester/shutdown >/dev/null 2>/dev/null || true
kubectl exec "$POD_TO_SHUTDOWN" --namespace="$NAMESPACE" -- wget --method POST -T 5 http://localhost:8080/ingester/shutdown >/dev/null 2>/dev/null || true

# While request to /ingester/shutdown completes only after flushing has finished, it unfortunately returns 204 status code,
# which confuses wget. That is the reason why instead of waiting for /ingester/shutdown to complete, this script waits for
Expand Down
Loading