Create v2-1.md #1848

Merged May 17, 2022 (12 commits)

39 changes: 39 additions & 0 deletions in docs/sources/release-notes/v2.1.md
---
title: "Grafana Mimir version 2.1 release notes"
menuTitle: "V2.1 release notes"
description: "Release notes for Grafana Mimir version 2.1"
weight: 200
---

# Grafana Mimir version 2.1 release notes

Grafana Labs is excited to announce version 2.1 of Grafana Mimir, the most scalable, most performant open source time series database in the world.

Below we highlight the top features, enhancements, and bug fixes in this release, as well as relevant callouts for those upgrading from Grafana Mimir 2.0. The complete list of changes is recorded in the [Changelog](https://github.com/grafana/mimir/blob/main/CHANGELOG.md).

## Features and enhancements

- **Mimir on ARM**: We now publish Docker images for both `amd64` and `arm64`, making it easier for those on ARM-based machines to develop and run Mimir. Multiplatform images are available from the [Mimir Docker registry](https://hub.docker.com/r/grafana/mimir). Note that our existing integration test suite only uses the `amd64` images, which means we cannot make any functional or performance guarantees about the `arm64` images.

- **`Remote` ruler mode for improved rule evaluation performance**: We've added a `remote` mode for the Grafana Mimir ruler, in which the ruler delegates rule evaluation to the [query-frontend]({{< relref "../operators-guide/architecture/components/query-frontend/index.md" >}}) rather than evaluating rules within the ruler process itself. This allows recording and alerting rules to benefit from the query parallelization techniques implemented in the query-frontend, such as query sharding. `Remote` mode is considered experimental and is off by default. To enable it, see [remote ruler]({{< relref "../operators-guide/architecture/components/ruler/index.md#remote" >}}).

- **Per-tenant custom trackers for monitoring cardinality**: In Grafana Mimir 2.0, we introduced a [custom tracker feature]({{< relref "../operators-guide/configuring/configuring-custom-trackers.md" >}}) that lets you track, over time, the number of active series that match a specific label matcher. In Grafana Mimir 2.1, we've made it possible to configure custom trackers via the [runtime configuration file]({{< relref "../operators-guide/configuring/about-runtime-configuration.md" >}}). This means you can now define different trackers for each tenant in your cluster and modify those trackers without restarting the ingesters.

- **Reduced cardinality of Grafana Mimir's `/metrics` endpoint**: While Grafana Mimir exposes a relatively small number of series about its own state, this number can grow when running Grafana Mimir clusters with many tenants or many active series. To reduce it (and the accompanying cost of scraping and storing these series), we made [several optimizations](https://github.com/grafana/mimir/issues/1750) that decreased the series count on the `/metrics` endpoint by more than 10%.
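
To make the per-tenant custom tracker idea concrete, here is a minimal Python sketch of the counting logic. It is not Grafana Mimir's implementation: it assumes each tracker is a single equality matcher (the real feature accepts label matchers), and the `CustomTrackers` class, its `reload` method, and the tenant and label names are hypothetical. Replacing the configuration while the tracker object keeps running stands in for a runtime configuration reload.

```python
# Conceptual sketch only: this is not Grafana Mimir's implementation.
# Assumptions (hypothetical): a series is a dict of labels, a tracker is a
# single equality matcher (label, value), and the per-tenant "runtime
# configuration" is a plain dict that can be replaced while running.
import threading
from typing import Dict, List, Tuple

Labels = Dict[str, str]
# tenant ID -> tracker name -> (label name, required label value)
TrackerConfig = Dict[str, Dict[str, Tuple[str, str]]]


class CustomTrackers:
    def __init__(self, config: TrackerConfig) -> None:
        self._lock = threading.Lock()
        self._config = config

    def reload(self, config: TrackerConfig) -> None:
        """Replace all trackers in place, as a runtime config reload would."""
        with self._lock:
            self._config = config

    def count(self, tenant: str, active_series: List[Labels]) -> Dict[str, int]:
        """Count a tenant's active series that match each of its trackers."""
        with self._lock:
            trackers = dict(self._config.get(tenant, {}))
        return {
            name: sum(1 for series in active_series if series.get(label) == value)
            for name, (label, value) in trackers.items()
        }


active = [
    {"__name__": "up", "namespace": "dev"},
    {"__name__": "up", "namespace": "prod"},
    {"__name__": "http_requests_total", "namespace": "dev"},
]
trackers = CustomTrackers({"tenant-a": {"dev": ("namespace", "dev")}})
print(trackers.count("tenant-a", active))  # {'dev': 2}

# Swap the trackers without recreating the object (no "restart").
trackers.reload({"tenant-a": {"prod": ("namespace", "prod")}})
print(trackers.count("tenant-a", active))  # {'prod': 1}
print(trackers.count("tenant-b", active))  # {}
```

Keeping trackers keyed by tenant is what allows each tenant to have its own set; a tenant with no entry simply gets no custom counts.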

## Upgrade considerations

We've updated the default values for two parameters in Grafana Mimir to give users better out-of-the-box performance:

- We've changed the default for `-blocks-storage.tsdb.isolation-enabled` from `true` to `false`. We've also marked this flag as deprecated and will remove it completely in two releases. [TSDB isolation](https://grafana.com/blog/2020/05/05/how-isolation-improves-queries-in-prometheus-2.17/) is a feature inherited from Prometheus that doesn't provide any benefit given Grafana Mimir's distributed architecture, and in our [1 billion series load test](https://grafana.com/blog/2022/04/08/how-we-scaled-our-new-prometheus-tsdb-grafana-mimir-to-1-billion-active-series/#prometheus-tsdb-enhancements) we found that it actually hurt performance: disabling it reduced our ingester 99th percentile latency by 90%.

- The store-gateway attributes cache is now enabled by default (achieved by changing the default for `-blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items` from `0` to `50000`). This in-memory cache makes it faster to look up object attributes for chunk data. We've been running this optional cache internally for a while and, after a recent configuration audit, realized it made sense to enable it for all users. The increase in store-gateway memory utilization from enabling this cache is negligible and easily justified by the performance gains.
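
The role of `-blocks-storage.bucket-store.chunks-cache.attributes-in-memory-max-items` can be pictured with a small bounded cache in front of a slower lookup. The Python sketch below assumes least-recently-used eviction; `AttributesCache`, its `fetch` callback, and the object names are hypothetical, not Grafana Mimir's code. A `max_items` of `0` disables caching, matching the old default.

```python
# Conceptual sketch of a bounded in-memory attributes cache; not Mimir's code.
# Assumptions (hypothetical): LRU eviction, attributes are small dicts, and
# fetch() is the slower lookup that the in-memory cache sits in front of.
from collections import OrderedDict
from typing import Callable, Dict


class AttributesCache:
    def __init__(self, max_items: int, fetch: Callable[[str], Dict[str, int]]) -> None:
        self.max_items = max_items  # 0 disables the in-memory cache
        self._fetch = fetch
        self._items: "OrderedDict[str, Dict[str, int]]" = OrderedDict()
        self.misses = 0

    def get(self, object_name: str) -> Dict[str, int]:
        if object_name in self._items:
            self._items.move_to_end(object_name)  # mark as most recently used
            return self._items[object_name]
        self.misses += 1
        attrs = self._fetch(object_name)
        if self.max_items > 0:
            self._items[object_name] = attrs
            if len(self._items) > self.max_items:
                self._items.popitem(last=False)  # evict least recently used
        return attrs


sizes = {"chunks/000001": 1024, "chunks/000002": 2048, "chunks/000003": 4096}
cache = AttributesCache(max_items=2, fetch=lambda name: {"size": sizes[name]})
cache.get("chunks/000001")  # miss: fetched and cached
cache.get("chunks/000001")  # hit: served from memory
cache.get("chunks/000002")  # miss
cache.get("chunks/000003")  # miss: evicts chunks/000001
print(cache.misses)  # 3
```

Because the cache holds at most `max_items` entries, its memory cost is bounded regardless of how many objects the store-gateway touches.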

## Bug fixes

### 2.1.0 bug fixes

- [PR 1704](https://github.com/grafana/mimir/pull/1704): Fixed a bug that caused Grafana Mimir to crash on startup, due to duplicate metric names, when running in monolithic mode with the results cache enabled.
- [PR 1835](https://github.com/grafana/mimir/pull/1835): Fixed a bug that caused Grafana Mimir to crash when an invalid Alertmanager configuration was set even though the Alertmanager component was disabled. After this fix, the Alertmanager configuration is only validated if the Alertmanager component is loaded.
- [PR 1836](https://github.com/grafana/mimir/pull/1836): The ability to run Alertmanager with `local` storage broke in Grafana Mimir 2.0 when we removed the ability to run the Alertmanager without sharding. With this bug fix, you can once again run Alertmanager with `local` storage. However, for production use, we still recommend using an external store, since one is needed to persist Alertmanager state (for example, silences) between replicas.
- [PR 1715](https://github.com/grafana/mimir/pull/1715): Restored Grafana Mimir's ability to use CNAME DNS records to reach memcached servers. The bug was inherited from an upstream change to Thanos; we contributed a fix to Thanos and subsequently updated our Thanos version.
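
The fix in PR 1835 amounts to gating validation on whether the component will run. A minimal Python sketch of that rule follows; `validate`, `validate_alertmanager`, `ConfigError`, and the requirement of a top-level `route` key are hypothetical stand-ins, not Grafana Mimir's code or its full Alertmanager validation.

```python
# Conceptual sketch of the PR 1835 behavior; not Grafana Mimir's code.
# Assumption (hypothetical): an Alertmanager config is a dict, and a missing
# top-level "route" stands in for any validation error.
from typing import Dict, List, Set


class ConfigError(ValueError):
    pass


def validate_alertmanager(cfg: Dict) -> None:
    if "route" not in cfg:
        raise ConfigError("invalid Alertmanager configuration: missing route")


def validate(loaded_components: Set[str], alertmanager_cfg: Dict) -> List[str]:
    """Validate component configs only for components that will be loaded."""
    validated: List[str] = []
    if "alertmanager" in loaded_components:
        validate_alertmanager(alertmanager_cfg)
        validated.append("alertmanager")
    return validated


invalid_cfg: Dict = {}
# Alertmanager disabled: the invalid config is ignored instead of crashing.
print(validate({"ingester", "querier"}, invalid_cfg))  # []
# Alertmanager loaded: the same config is now rejected.
try:
    validate({"alertmanager"}, invalid_cfg)
except ConfigError as err:
    print(err)  # invalid Alertmanager configuration: missing route
```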