Skip to content

Commit

Permalink
Merge branch 'main' into charleskorn/extract-linked-list-type
Browse files Browse the repository at this point in the history
  • Loading branch information
charleskorn authored Apr 29, 2024
2 parents 12f46d2 + 748a726 commit 4e15332
Show file tree
Hide file tree
Showing 762 changed files with 30,636 additions and 20,664 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/scripts/helm-weekly-release.sh
Original file line number Diff line number Diff line change
Expand Up @@ -98,9 +98,7 @@ new_chart_version=$(calculate_next_chart_version $current_chart_version $latest_
validate_version_update $new_chart_version $current_chart_version $latest_gem_tag $latest_mimir_tag

update_yaml_node $values_file .image.tag $latest_mimir_tag
update_yaml_node $values_file .smoke_test.image.tag $latest_mimir_tag
update_yaml_node $values_file .enterprise.image.tag $latest_gem_tag
update_yaml_node $values_file .continuous_test.image.tag $latest_mimir_tag
update_yaml_node $chart_file .appVersion $(extract_r_version $latest_mimir_tag)
update_yaml_node $chart_file .version $new_chart_version

Expand Down
2 changes: 1 addition & 1 deletion .golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -63,5 +63,5 @@ run:

issues:
exclude-rules:
- linters: [revive]
- linters: [ revive ]
text: "if-return"
26 changes: 21 additions & 5 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
* [FEATURE] Store-gateway: Allow specific tenants to be enabled or disabled via `-store-gateway.enabled-tenants` or `-store-gateway.disabled-tenants` CLI flags or their corresponding YAML settings. #7653
* [FEATURE] New `-<prefix>.s3.bucket-lookup-type` flag configures lookup style type, used to access bucket in s3 compatible providers. #7684
* [FEATURE] Querier: add experimental streaming PromQL engine, enabled with `-querier.promql-engine=streaming`. #7693 #7899
* [FEATURE] New `/ingester/unregister-on-shutdown` HTTP endpoint allows dynamic access to ingesters' `-ingester.ring.unregister-on-shutdown` configuration. #7739
* [FEATURE] Server: added experimental [PROXY protocol support](https://www.haproxy.org/download/2.3/doc/proxy-protocol.txt). The PROXY protocol support can be enabled via `-server.proxy-protocol-enabled=true`. When enabled, the support is added both to HTTP and gRPC listening ports. #7698
* [ENHANCEMENT] Store-gateway: merge series from different blocks concurrently. #7456
* [ENHANCEMENT] Store-gateway: Add `stage="wait_max_concurrent"` to `cortex_bucket_store_series_request_stage_duration_seconds` which records how long the query had to wait for its turn for `-blocks-storage.bucket-store.max-concurrent`. #7609
Expand All @@ -19,27 +20,32 @@
* [ENHANCEMENT] Store-gateway: add `outcome` label to `cortex_bucket_stores_gate_duration_seconds` histogram metric. Possible values for the `outcome` label are: `rejected_canceled`, `rejected_deadline_exceeded`, `rejected_other`, and `permitted`. #7784
* [ENHANCEMENT] Query-frontend: use zero-allocation experimental decoder for active series queries via `-query-frontend.use-active-series-decoder`. #7665
* [ENHANCEMENT] Go: updated to 1.22.2. #7802
* [ENHANCEMENT] Query-frontend: support `limit` parameter on `/prometheus/api/v1/label/{name}/values` and `/prometheus/api/v1/labels` endpoints. #7722
* [ENHANCEMENT] Expose TLS configuration for the S3 backend client. #2652
* [ENHANCEMENT] Rules: Support expansion of native histogram values when using rule templates #7974
* [ENHANCEMENT] Rules: Add metric `cortex_prometheus_rule_group_last_restore_duration_seconds` which measures how long it takes to restore rule groups using the `ALERTS_FOR_STATE` series #7974
* [BUGFIX] Rules: improve error handling when querier is local to the ruler. #7567
* [BUGFIX] Querier, store-gateway: Protect against panics raised during snappy encoding. #7520
* [BUGFIX] Ingester: Prevent timely compaction of empty blocks. #7624
* [BUGFIX] querier: Don't cache context.Canceled errors for bucket index. #7620
* [BUGFIX] Store-gateway: account for `"other"` time in LabelValues and LabelNames requests. #7622
* [BUGFIX] Query-frontend: Don't panic when using the `-query-frontend.downstream-url` flag. #7651
* [BUGFIX] Ingester: when receiving multiple exemplars for a native histogram via remote write, sort them and only report an error if all are older than the latest exemplar as this could be a partial update. #7640
* [BUGFIX] Ingester: when receiving multiple exemplars for a native histogram via remote write, sort them and only report an error if all are older than the latest exemplar as this could be a partial update. #7640 #7948
* [BUGFIX] Ingester: don't retain blocks if they finish exactly on the boundary of the retention window. #7656
* [BUGFIX] Bug-fixes and improvements to experimental native histograms. #7744 #7813
* [BUGFIX] Querier: return an error when a query uses `label_join` with an invalid destination label name. #7744
* [BUGFIX] Compactor: correct outstanding job estimation in metrics and `compaction-planner` tool when block labels differ. #7745
* [BUGFIX] Ingester: turn native histogram validation errors in TSDB into soft ingester errors that result in returning 4xx to the end-user instead of 5xx. In the case of TSDB validation errors, the counter `cortex_discarded_samples_total` will be increased with the `reason` label set to `"invalid-native-histogram"`. #7736 #7773
* [BUGFIX] Do not wrap error message with `sampled 1/<frequency>` if it's not actually sampled. #7784
* [BUGFIX] Store-gateway: do not track cortex_querier_blocks_consistency_checks_failed_total metric if query has been canceled or interrued due to any error not related to blocks consistency check failed. #7752
* [BUGFIX] Ingester: ignore instances with no tokens when calculating local limits to prevent discards during ingester scale-up #7881
* [BUGFIX] Ingester: do not reuse exemplars slice in the write request if there are more than 10 exemplars per series. This should help to reduce the in-use memory in case of few requests with a very large number of exemplars. #7936
* [BUGFIX] Distributor: fix down scaling of native histograms in the distributor when timeseries unmarshal cache is in use. #7947

### Mixin

* [CHANGE] Alerts: Removed obsolete `MimirQueriesIncorrect` alert that used test-exporter metrics. Test-exporter support was however removed in Mimir 2.0 release. #7774
* [CHANGE] Fine-tuned `terminationGracePeriodSeconds` for the following components: #7364
* Querier: changed from `30` to `180`
* Query-scheduler: changed from `30` to `180`
* [CHANGE] Alerts: Change threshold for `MimirBucketIndexNotUpdated` alert to fire before queries begin to fail due to bucket index age. #7879
* [FEATURE] Dashboards: added 'Remote ruler reads networking' dashboard. #7751
* [ENHANCEMENT] Alerts: allow configuring alerts range interval via `_config.base_alerts_range_interval_minutes`. #7591
* [ENHANCEMENT] Dashboards: Add panels for monitoring distributor and ingester when using ingest-storage. These panels are disabled by default, but can be enabled using `show_ingest_storage_panels: true` config option. Similarly existing panels used when distributors and ingesters use gRPC for forwarding requests can be disabled by setting `show_grpc_ingestion_panels: false`. #7670 #7699
Expand All @@ -55,14 +61,21 @@
* [ENHANCEMENT] Dashboards: renamed rows in the "Remote ruler reads" and "Remote ruler reads resources" dashboards to match the actual component names. #7750
* [ENHANCEMENT] Dashboards: allow switching between using classic of native histograms in dashboards. #7627
* Overview dashboard, Status panel, `cortex_request_duration_seconds` metric.
* [ENHANCEMENT] Alerts: exclude `529` and `598` status codes from failure codes in `MimirRequestsError`. #7889
* [BUGFIX] Dashboards: Fix regular expression for matching read-path gRPC ingester methods to include querying of exemplars, label-related queries, or active series queries. #7676
* [BUGFIX] Dashboards: Fix user id abbreviations and column heads for Top Tenants dashboard. #7724

### Jsonnet

* [CHANGE] Memcached: Change default read timeout for chunks and index caches to `750ms` from `450ms`. #7778
* [ENHANCEMENT] Compactor: add `$._config.cortex_compactor_concurrent_rollout_enabled` option (disabled by default) that makes use of rollout-operator to speed up the rollout of compactors. #7783
* [CHANGE] Fine-tuned `terminationGracePeriodSeconds` for the following components: #7364
* Querier: changed from `30` to `180`
* Query-scheduler: changed from `30` to `180`
* [CHANGE] Change TCP port exposed by `mimir-continuous-test` deployment to match with updated defaults of its container image (see changes below). #7958
* [ENHANCEMENT] Compactor: add `$._config.cortex_compactor_concurrent_rollout_enabled` option (disabled by default) that makes use of rollout-operator to speed up the rollout of compactors. #7783 #7878
* [ENHANCEMENT] Shuffle-sharding: add `$._config.shuffle_sharding.ingest_storage_partitions_enabled` and `$._config.shuffle_sharding.ingester_partitions_shard_size` options, that allow configuring partitions shard size in ingest-storage mode. #7804
* [ENHANCEMENT] Rollout-operator: upgrade to v0.14.0.
* [ENHANCEMENT] Add `_config.autoscaling_querier_predictive_scaling_enabled` to scale querier based on inflight queries 7 days ago. #7775
* [BUGFIX] Guard against missing samples in KEDA queries. #7691

### Mimirtool
Expand All @@ -74,6 +87,8 @@
### Mimir Continuous Test

* [CHANGE] `mimir-continuous-test` has been deprecated and replaced by a Mimir module that can be run as a target from the `mimir` binary using `mimir -target=continuous-test`. #7753
* [CHANGE] `-server.metrics-port` flag is no longer available for use in the module run of mimir-continuous-test, including the grafana/mimir-continuous-test Docker image which uses the new module. Configuring this port is still possible in the binary, which is deprecated. #7747
* [CHANGE] Allowed authenticatication to Mimir using both Tenant ID and basic/bearer auth #7619.
* [BUGFIX] Set `User-Agent` header for all requests sent from the testing client. #7607

### Query-tee
Expand Down Expand Up @@ -138,6 +153,7 @@
* [CHANGE] Distributor: the metric `cortex_distributor_sample_delay_seconds` has been deprecated and will be removed in Mimir 2.14. #7516
* [CHANGE] Query-frontend: The deprecated YAML setting `frontend.cache_unaligned_requests` has been moved to `limits.cache_unaligned_requests`. #7519
* [CHANGE] Querier: the CLI flag `-querier.minimize-ingester-requests` has been moved from "experimental" to "advanced". #7638
* [CHANGE] Ingester: allow only POST method on `/ingester/shutdown`, as previously it was too easy to accidentally trigger through GET requests. At the same time, add an option to keep the existing behavior by introducing an `-api.get-request-for-ingester-shutdown-enabled` flag. This flag will be removed in Mimir 2.15. #7707
* [FEATURE] Introduce `-server.log-source-ips-full` option to log all IPs from `Forwarded`, `X-Real-IP`, `X-Forwarded-For` headers. #7250
* [FEATURE] Introduce `-tenant-federation.max-tenants` option to limit the max number of tenants allowed for requests when federation is enabled. #6959
* [FEATURE] Cardinality API: added a new `count_method` parameter which enables counting active label names. #7085
Expand Down
1 change: 1 addition & 0 deletions CODEOWNERS
Original file line number Diff line number Diff line change
Expand Up @@ -30,3 +30,4 @@ go.mod @grafana/mimir-maintainers @grafanabot
go.sum @grafana/mimir-maintainers @grafanabot
/vendor/ @grafana/mimir-maintainers @grafanabot
/vendor/github.com/prometheus/alertmanager @grafana/mimir-ruler-and-alertmanager-maintainers @grafana/mimir-maintainers @grafanabot
/vendor/github.com/grafana/alerting @grafana/mimir-ruler-and-alertmanager-maintainers @grafana/mimir-maintainers @grafanabot
13 changes: 9 additions & 4 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,6 @@ lint: check-makefiles
faillint -paths "github.com/grafana/mimir/pkg/..." ./pkg/storage/sharding/...
faillint -paths "github.com/grafana/mimir/pkg/..." ./pkg/querier/engine/...
faillint -paths "github.com/grafana/mimir/pkg/..." ./pkg/querier/api/...
faillint -paths "github.com/grafana/mimir/pkg/..." ./pkg/util/globalerror

# Ensure all errors are report as APIError
faillint -paths "github.com/weaveworks/common/httpgrpc.{Errorf}=github.com/grafana/mimir/pkg/api/error.Newf" ./pkg/frontend/querymiddleware/...
Expand Down Expand Up @@ -383,11 +382,17 @@ lint: check-makefiles

# Ensure packages that no longer use a global logger don't reintroduce it
faillint -paths "github.com/grafana/mimir/pkg/util/log.{Logger}" \
./pkg/alertmanager/alertstore/... \
./pkg/ingester/... \
./pkg/alertmanager/... \
./pkg/compactor/... \
./pkg/distributor/... \
./pkg/flusher/... \
./pkg/frontend/... \
./pkg/ingester/... \
./pkg/querier/... \
./pkg/ruler/...
./pkg/ruler/... \
./pkg/scheduler/... \
./pkg/storage/... \
./pkg/storegateway/...

# We've copied github.com/NYTimes/gziphandler to pkg/util/gziphandler
# at least until https://github.com/nytimes/gziphandler/pull/112 is merged
Expand Down
11 changes: 8 additions & 3 deletions cmd/mimir-continuous-test/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,13 +4,15 @@ package main

import (
"context"
"errors"
"flag"
"fmt"
"os"

"github.com/go-kit/log/level"
"github.com/grafana/dskit/flagext"
"github.com/grafana/dskit/log"
"github.com/grafana/dskit/modules"
"github.com/grafana/dskit/tracing"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/collectors"
Expand Down Expand Up @@ -55,7 +57,7 @@ func main() {
registry.MustRegister(version.NewCollector("mimir_continuous_test"))
registry.MustRegister(collectors.NewGoCollector())

i := instrumentation.NewMetricsServer(serverMetricsPort, registry)
i := instrumentation.NewMetricsServer(serverMetricsPort, registry, util_log.Logger)
if err := i.Start(); err != nil {
level.Error(logger).Log("msg", "Unable to start instrumentation server", "err", err.Error())
util_log.Flush()
Expand All @@ -74,8 +76,11 @@ func main() {
m := continuoustest.NewManager(cfg.Manager, logger)
m.AddTest(continuoustest.NewWriteReadSeriesTest(cfg.WriteReadSeriesTest, client, logger, registry))
if err := m.Run(context.Background()); err != nil {
level.Error(logger).Log("msg", "Failed to run continuous test", "err", err.Error())
if !errors.Is(err, modules.ErrStopProcess) {
level.Error(logger).Log("msg", "Failed to run continuous test", "err", err.Error())
util_log.Flush()
os.Exit(1)
}
util_log.Flush()
os.Exit(1)
}
}
Loading

0 comments on commit 4e15332

Please sign in to comment.