Support limits for silences #8241

Merged
merged 6 commits into from
May 31, 2024
Changes from 2 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

### Grafana Mimir

* [FEATURE] Alertmanager: Added `-alertmanager.max-silences-count` and `-alertmanager.max-silence-size-bytes` to set limits on per tenant silences. Disabled by default. #6898
* [CHANGE] Build: `grafana/mimir` docker image is now based on `gcr.io/distroless/static-debian12` image. Alpine-based docker image is still available as `grafana/mimir-alpine`, until Mimir 2.15. #8204
* [CHANGE] Ingester: `/ingester/flush` endpoint is now only allowed to execute only while the ingester is in `Running` state. The 503 status code is returned if the endpoint is called while the ingester is not in `Running` state. #7486
* [CHANGE] Distributor: Include label name in `err-mimir-label-value-too-long` error message: #7740
20 changes: 20 additions & 0 deletions cmd/mimir/config-descriptor.json
@@ -4052,6 +4052,26 @@
"fieldFlag": "alertmanager.max-config-size-bytes",
"fieldType": "int"
},
{
"kind": "field",
"name": "alertmanager_max_silences_count",
"required": false,
"desc": "Maximum number of active and pending silences that a tenant can have at once. 0 = no limit.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "alertmanager.max-silences-count",
"fieldType": "int"
},
{
"kind": "field",
"name": "alertmanager_max_silence_size_bytes",
"required": false,
"desc": "Maximum silence size in bytes. 0 = no limit.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "alertmanager.max-silence-size-bytes",
"fieldType": "int"
},
{
"kind": "field",
"name": "alertmanager_max_templates_count",
4 changes: 4 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -191,6 +191,10 @@ Usage of ./cmd/mimir/mimir:
Maximum number of aggregation groups in Alertmanager's dispatcher that a tenant can have. Each active aggregation group uses single goroutine. When the limit is reached, dispatcher will not dispatch alerts that belong to additional aggregation groups, but existing groups will keep working properly. 0 = no limit.
-alertmanager.max-recv-msg-size int
Maximum size (bytes) of an accepted HTTP request body. (default 104857600)
-alertmanager.max-silence-size-bytes int
Maximum silence size in bytes. 0 = no limit.
-alertmanager.max-silences-count int
Maximum number of active and pending silences that a tenant can have at once. 0 = no limit.
-alertmanager.max-template-size-bytes int
Maximum size of single template in tenant's Alertmanager configuration uploaded via Alertmanager API. 0 = no limit.
-alertmanager.max-templates-count int
4 changes: 4 additions & 0 deletions cmd/mimir/help.txt.tmpl
@@ -81,6 +81,10 @@ Usage of ./cmd/mimir/mimir:
Maximum size of configuration file for Alertmanager that tenant can upload via Alertmanager API. 0 = no limit.
-alertmanager.max-dispatcher-aggregation-groups int
Maximum number of aggregation groups in Alertmanager's dispatcher that a tenant can have. Each active aggregation group uses single goroutine. When the limit is reached, dispatcher will not dispatch alerts that belong to additional aggregation groups, but existing groups will keep working properly. 0 = no limit.
-alertmanager.max-silence-size-bytes int
Maximum silence size in bytes. 0 = no limit.
-alertmanager.max-silences-count int
Maximum number of active and pending silences that a tenant can have at once. 0 = no limit.
-alertmanager.max-template-size-bytes int
Maximum size of single template in tenant's Alertmanager configuration uploaded via Alertmanager API. 0 = no limit.
-alertmanager.max-templates-count int
@@ -3440,6 +3440,15 @@ The `limits` block configures default and per-tenant limits imposed by component
# CLI flag: -alertmanager.max-config-size-bytes
[alertmanager_max_config_size_bytes: <int> | default = 0]
# Maximum number of active and pending silences that a tenant can have at once.
# 0 = no limit.
# CLI flag: -alertmanager.max-silences-count
[alertmanager_max_silences_count: <int> | default = 0]
# Maximum silence size in bytes. 0 = no limit.
# CLI flag: -alertmanager.max-silence-size-bytes
[alertmanager_max_silence_size_bytes: <int> | default = 0]
# Maximum number of templates in tenant's Alertmanager configuration uploaded
# via Alertmanager API. 0 = no limit.
# CLI flag: -alertmanager.max-templates-count
4 changes: 2 additions & 2 deletions go.mod
@@ -182,7 +182,7 @@ require (
github.com/hashicorp/consul/api v1.28.2 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/hashicorp/go-cleanhttp v0.5.2 // indirect
- github.com/hashicorp/go-hclog v1.5.0 // indirect
+ github.com/hashicorp/go-hclog v1.6.2 // indirect
github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
github.com/hashicorp/go-msgpack v1.1.5 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
@@ -284,4 +284,4 @@ replace github.com/opentracing-contrib/go-stdlib => github.com/grafana/opentraci
replace github.com/opentracing-contrib/go-grpc => github.com/charleskorn/go-grpc v0.0.0-20231024023642-e9298576254f

// Replacing prometheus/alertmanager with our fork.
- replace github.com/prometheus/alertmanager => github.com/grafana/prometheus-alertmanager v0.25.1-0.20240524091923-8090d8837b5f
+ replace github.com/prometheus/alertmanager => github.com/grafana/prometheus-alertmanager v0.25.1-0.20240531172444-6ad94e405c5a
8 changes: 4 additions & 4 deletions go.sum
@@ -521,8 +521,8 @@ github.com/grafana/mimir-prometheus v0.0.0-20240515135245-e5b85c151ba8 h1:XmqfG3
github.com/grafana/mimir-prometheus v0.0.0-20240515135245-e5b85c151ba8/go.mod h1:ZlD3SoAHSwXK5VGLHv78Jh5kOpgSLaQAzt9gxq76fLM=
github.com/grafana/opentracing-contrib-go-stdlib v0.0.0-20230509071955-f410e79da956 h1:em1oddjXL8c1tL0iFdtVtPloq2hRPen2MJQKoAWpxu0=
github.com/grafana/opentracing-contrib-go-stdlib v0.0.0-20230509071955-f410e79da956/go.mod h1:qtI1ogk+2JhVPIXVc6q+NHziSmy2W5GbdQZFUHADCBU=
- github.com/grafana/prometheus-alertmanager v0.25.1-0.20240524091923-8090d8837b5f h1:EtKg1joztl0yM5tqj51LzzUmiWzPz/5zrYr8Bc7Y5pk=
- github.com/grafana/prometheus-alertmanager v0.25.1-0.20240524091923-8090d8837b5f/go.mod h1:01sXtHoRwI8W324IPAzuxDFOmALqYLCOhvSC2fUHWXc=
+ github.com/grafana/prometheus-alertmanager v0.25.1-0.20240531172444-6ad94e405c5a h1:0zyw9u1O0PBB0bep9SyfM0sz2Q4XKYuNpTcIGkW3jSk=
+ github.com/grafana/prometheus-alertmanager v0.25.1-0.20240531172444-6ad94e405c5a/go.mod h1:01sXtHoRwI8W324IPAzuxDFOmALqYLCOhvSC2fUHWXc=
github.com/grafana/pyroscope-go/godeltaprof v0.1.6 h1:nEdZ8louGAplSvIJi1HVp7kWvFvdiiYg3COLlTwJiFo=
github.com/grafana/pyroscope-go/godeltaprof v0.1.6/go.mod h1:Tk376Nbldo4Cha9RgiU7ik8WKFkNpfds98aUzS8omLE=
github.com/grafana/regexp v0.0.0-20240531075221-3685f1377d7b h1:oMAq12GxTpwo9jxbnG/M4F/HdpwbibTaVoxNA0NZprY=
@@ -549,8 +549,8 @@ github.com/hashicorp/go-hclog v0.9.2/go.mod h1:5CU+agLiy3J7N7QjHK5d05KxGsuXiQLrj
github.com/hashicorp/go-hclog v0.12.0/go.mod h1:whpDNt7SSdeAju8AWKIWsul05p54N/39EeqMAyrmvFQ=
github.com/hashicorp/go-hclog v0.16.2/go.mod h1:whpDNt7SSdeAju8AWKIWsul05p54N/39EeqMAyrmvFQ=
github.com/hashicorp/go-hclog v1.2.0/go.mod h1:whpDNt7SSdeAju8AWKIWsul05p54N/39EeqMAyrmvFQ=
- github.com/hashicorp/go-hclog v1.5.0 h1:bI2ocEMgcVlz55Oj1xZNBsVi900c7II+fWDyV9o+13c=
- github.com/hashicorp/go-hclog v1.5.0/go.mod h1:W4Qnvbt70Wk/zYJryRzDRU/4r0kIg0PVHBcfoyhpF5M=
+ github.com/hashicorp/go-hclog v1.6.2 h1:NOtoftovWkDheyUM/8JW3QMiXyxJK3uHRK7wV04nD2I=
+ github.com/hashicorp/go-hclog v1.6.2/go.mod h1:W4Qnvbt70Wk/zYJryRzDRU/4r0kIg0PVHBcfoyhpF5M=
github.com/hashicorp/go-immutable-radix v1.0.0/go.mod h1:0y9vanUI8NX6FsYoO3zeMjhV/C5i9g4Q3DwcSNZ4P60=
github.com/hashicorp/go-immutable-radix v1.3.1 h1:DKHmCUm2hRBK510BaiZlwvpD40f8bJFeZnpfm2KLowc=
github.com/hashicorp/go-immutable-radix v1.3.1/go.mod h1:0y9vanUI8NX6FsYoO3zeMjhV/C5i9g4Q3DwcSNZ4P60=
8 changes: 6 additions & 2 deletions pkg/alertmanager/alertmanager.go
@@ -220,8 +220,12 @@ func New(cfg *Config, reg *prometheus.Registry) (*Alertmanager, error) {
am.silences, err = silence.New(silence.Options{
SnapshotFile: silencesFile,
Retention: cfg.Retention,
- Logger: log.With(am.logger, "component", "silences"),
- Metrics: am.registry,
+ Limits: silence.Limits{
+ MaxSilences: cfg.Limits.AlertmanagerMaxSilencesCount(cfg.UserID),
+ MaxPerSilenceBytes: cfg.Limits.AlertmanagerMaxSilenceSizeBytes(cfg.UserID),

Review comment by @56quarters (Contributor), May 31, 2024:

max-per-silence-bytes makes this limit a lot easier to understand (compared to max-silence-size-bytes). WDYT about using max per silence bytes everywhere? OK too if you'd like to keep it as-is for consistency with other limits.

EDIT: lol, I see @alexweav suggested the opposite. Ignore me!

Reply from the PR author (Contributor):

I would like to keep it consistent with other limits 🙂

+ },
+ Logger: log.With(am.logger, "component", "silences"),
+ Metrics: am.registry,
})
if err != nil {
return nil, fmt.Errorf("failed to create silences: %v", err)
84 changes: 84 additions & 0 deletions pkg/alertmanager/alertmanager_test.go
@@ -19,6 +19,7 @@ import (
"github.com/prometheus/alertmanager/cluster/clusterpb"
"github.com/prometheus/alertmanager/config"
"github.com/prometheus/alertmanager/featurecontrol"
"github.com/prometheus/alertmanager/silence/silencepb"
"github.com/prometheus/alertmanager/types"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/testutil"
@@ -323,3 +324,86 @@ func testLimiter(t *testing.T, limits Limits, ops []callbackOp) {
assert.Equal(t, op.expectedTotalSize, totalSize, "wrong total size, op %d", ix)
}
}

func TestSilenceLimits(t *testing.T) {
user := "test"

r := prometheus.NewPedanticRegistry()
am, err := New(&Config{
UserID: user,
Logger: log.NewNopLogger(),
Limits: &mockAlertManagerLimits{
maxSilencesCount: 1,
maxSilenceSizeBytes: 2 << 11, // 4KB,
},
Features: featurecontrol.NoopFlags{},
TenantDataDir: t.TempDir(),
ExternalURL: &url.URL{Path: "/am"},
ShardingEnabled: true,
Store: prepareInMemoryAlertStore(),
Replicator: &stubReplicator{},
ReplicationFactor: 1,
// We have to set this interval non-zero, though we don't need the persister to do anything.
PersisterConfig: PersisterConfig{Interval: time.Hour},
}, r)
require.NoError(t, err)
defer am.StopAndWait()

// Insert sil1 should succeed without error.
sil1 := &silencepb.Silence{
Matchers: []*silencepb.Matcher{{Name: "a", Pattern: "b"}},
StartsAt: time.Now(),
EndsAt: time.Now().Add(5 * time.Minute),
}
id1, err := am.silences.Set(sil1)
require.NoError(t, err)
require.NotEqual(t, "", id1)

// Insert sil2 should fail because maximum number of silences
// has been exceeded.
sil2 := &silencepb.Silence{
Matchers: []*silencepb.Matcher{{Name: "a", Pattern: "b"}},
StartsAt: time.Now(),
EndsAt: time.Now().Add(5 * time.Minute),
}
id2, err := am.silences.Set(sil2)
require.EqualError(t, err, "exceeded maximum number of silences: 1 (limit: 1)")
require.Equal(t, "", id2)

// Expire sil1. This should allow sil2 to be inserted.
require.NoError(t, am.silences.Expire(id1))
id2, err = am.silences.Set(sil2)
require.NoError(t, err)
require.NotEqual(t, "", id2)

// Should be able to update sil2 without hitting the limit.
_, err = am.silences.Set(sil2)
require.NoError(t, err)

// Expire sil2.
require.NoError(t, am.silences.Expire(id2))

// Insert sil3 should fail because it exceeds maximum size.
sil3 := &silencepb.Silence{
Matchers: []*silencepb.Matcher{
{
Name: strings.Repeat("a", 2<<9),
Pattern: strings.Repeat("b", 2<<9),
},
{
Name: strings.Repeat("c", 2<<9),
Pattern: strings.Repeat("d", 2<<9),
},
},
CreatedBy: strings.Repeat("e", 2<<9),
Comment: strings.Repeat("f", 2<<9),
StartsAt: time.Now(),
EndsAt: time.Now().Add(5 * time.Minute),
}
id3, err := am.silences.Set(sil3)
require.Error(t, err)
// Do not check the exact size as it can change between consecutive runs
// due to padding.
require.Contains(t, err.Error(), "silence exceeded maximum size")
require.Equal(t, "", id3)
}
6 changes: 6 additions & 0 deletions pkg/alertmanager/multitenant.go
@@ -218,6 +218,12 @@ type Limits interface {
// AlertmanagerMaxConfigSize returns max size of configuration file that user is allowed to upload. If 0, there is no limit.
AlertmanagerMaxConfigSize(tenant string) int

// AlertmanagerMaxSilencesCount returns the max number of active and pending silences. If negative or 0, there is no limit.
AlertmanagerMaxSilencesCount(tenant string) int

// AlertmanagerMaxSilenceSizeBytes returns the max silence size in bytes. If negative or 0, there is no limit.
AlertmanagerMaxSilenceSizeBytes(tenant string) int

// AlertmanagerMaxTemplatesCount returns max number of templates that tenant can use in the configuration. 0 = no limit.
AlertmanagerMaxTemplatesCount(tenant string) int

8 changes: 8 additions & 0 deletions pkg/alertmanager/multitenant_test.go
@@ -2353,6 +2353,8 @@ type mockAlertManagerLimits struct {
emailNotificationRateLimit rate.Limit
emailNotificationBurst int
maxConfigSize int
maxSilencesCount int
maxSilenceSizeBytes int
maxTemplatesCount int
maxSizeOfTemplate int
maxDispatcherAggregationGroups int
@@ -2364,6 +2366,12 @@ func (m *mockAlertManagerLimits) AlertmanagerMaxConfigSize(string) int {
return m.maxConfigSize
}

func (m *mockAlertManagerLimits) AlertmanagerMaxSilencesCount(string) int { return m.maxSilencesCount }

func (m *mockAlertManagerLimits) AlertmanagerMaxSilenceSizeBytes(string) int {
return m.maxSilenceSizeBytes
}

func (m *mockAlertManagerLimits) AlertmanagerMaxTemplatesCount(string) int {
return m.maxTemplatesCount
}
12 changes: 12 additions & 0 deletions pkg/util/validation/limits.go
@@ -210,6 +210,8 @@ type Limits struct {
NotificationRateLimitPerIntegration NotificationRateLimitMap `yaml:"alertmanager_notification_rate_limit_per_integration" json:"alertmanager_notification_rate_limit_per_integration"`

AlertmanagerMaxConfigSizeBytes int `yaml:"alertmanager_max_config_size_bytes" json:"alertmanager_max_config_size_bytes"`
AlertmanagerMaxSilencesCount int `yaml:"alertmanager_max_silences_count" json:"alertmanager_max_silences_count"`
AlertmanagerMaxSilenceSizeBytes int `yaml:"alertmanager_max_silence_size_bytes" json:"alertmanager_max_silence_size_bytes"`
AlertmanagerMaxTemplatesCount int `yaml:"alertmanager_max_templates_count" json:"alertmanager_max_templates_count"`
AlertmanagerMaxTemplateSizeBytes int `yaml:"alertmanager_max_template_size_bytes" json:"alertmanager_max_template_size_bytes"`
AlertmanagerMaxDispatcherAggregationGroups int `yaml:"alertmanager_max_dispatcher_aggregation_groups" json:"alertmanager_max_dispatcher_aggregation_groups"`
@@ -335,6 +337,8 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
}
f.Var(&l.NotificationRateLimitPerIntegration, "alertmanager.notification-rate-limit-per-integration", "Per-integration notification rate limits. Value is a map, where each key is integration name and value is a rate-limit (float). On command line, this map is given in JSON format. Rate limit has the same meaning as -alertmanager.notification-rate-limit, but only applies for specific integration. Allowed integration names: "+strings.Join(allowedIntegrationNames, ", ")+".")
f.IntVar(&l.AlertmanagerMaxConfigSizeBytes, "alertmanager.max-config-size-bytes", 0, "Maximum size of configuration file for Alertmanager that tenant can upload via Alertmanager API. 0 = no limit.")
f.IntVar(&l.AlertmanagerMaxSilencesCount, "alertmanager.max-silences-count", 0, "Maximum number of active and pending silences that a tenant can have at once. 0 = no limit.")
f.IntVar(&l.AlertmanagerMaxSilenceSizeBytes, "alertmanager.max-silence-size-bytes", 0, "Maximum silence size in bytes. 0 = no limit.")
f.IntVar(&l.AlertmanagerMaxTemplatesCount, "alertmanager.max-templates-count", 0, "Maximum number of templates in tenant's Alertmanager configuration uploaded via Alertmanager API. 0 = no limit.")
f.IntVar(&l.AlertmanagerMaxTemplateSizeBytes, "alertmanager.max-template-size-bytes", 0, "Maximum size of single template in tenant's Alertmanager configuration uploaded via Alertmanager API. 0 = no limit.")
f.IntVar(&l.AlertmanagerMaxDispatcherAggregationGroups, "alertmanager.max-dispatcher-aggregation-groups", 0, "Maximum number of aggregation groups in Alertmanager's dispatcher that a tenant can have. Each active aggregation group uses single goroutine. When the limit is reached, dispatcher will not dispatch alerts that belong to additional aggregation groups, but existing groups will keep working properly. 0 = no limit.")
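
As a rough illustration of how the two new flags are registered and parsed, here is a standalone sketch that uses only the standard `flag` package. The flag names and help strings come from the diff above; `silenceLimits`, `registerFlags`, and `parseSilenceLimits` are hypothetical names, and Mimir's real `Limits` struct has many more fields.

```go
package main

import (
	"flag"
	"fmt"
)

// silenceLimits is a hypothetical, trimmed-down stand-in for validation.Limits.
type silenceLimits struct {
	MaxSilencesCount    int
	MaxSilenceSizeBytes int
}

// registerFlags binds both limits to a FlagSet with a default of 0 (no limit).
func (l *silenceLimits) registerFlags(f *flag.FlagSet) {
	f.IntVar(&l.MaxSilencesCount, "alertmanager.max-silences-count", 0,
		"Maximum number of active and pending silences that a tenant can have at once. 0 = no limit.")
	f.IntVar(&l.MaxSilenceSizeBytes, "alertmanager.max-silence-size-bytes", 0,
		"Maximum silence size in bytes. 0 = no limit.")
}

// parseSilenceLimits parses args into a fresh silenceLimits value.
func parseSilenceLimits(args []string) (silenceLimits, error) {
	var l silenceLimits
	fs := flag.NewFlagSet("limits-sketch", flag.ContinueOnError)
	l.registerFlags(fs)
	err := fs.Parse(args)
	return l, err
}

func main() {
	l, err := parseSilenceLimits([]string{
		"-alertmanager.max-silences-count=100",
		"-alertmanager.max-silence-size-bytes=4096",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", l)
}
```

Leaving either flag unset keeps its value at 0, which matches the "Disabled by default" note in the changelog entry.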
@@ -922,6 +926,14 @@ func (o *Overrides) AlertmanagerMaxConfigSize(userID string) int {
return o.getOverridesForUser(userID).AlertmanagerMaxConfigSizeBytes
}

func (o *Overrides) AlertmanagerMaxSilencesCount(userID string) int {
return o.getOverridesForUser(userID).AlertmanagerMaxSilencesCount
}

func (o *Overrides) AlertmanagerMaxSilenceSizeBytes(userID string) int {
return o.getOverridesForUser(userID).AlertmanagerMaxSilenceSizeBytes
}

func (o *Overrides) AlertmanagerMaxTemplatesCount(userID string) int {
return o.getOverridesForUser(userID).AlertmanagerMaxTemplatesCount
}
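
The accessors above resolve a limit per tenant. The body of `getOverridesForUser` is not part of this diff, so the following sketch assumes a simple rule: use a tenant-specific entry when one exists, otherwise fall back to the defaults. The types `tenantLimits` and `overrides` and the method `forUser` are hypothetical.

```go
package main

import "fmt"

// tenantLimits holds only the two limits added in this PR.
type tenantLimits struct {
	MaxSilencesCount    int
	MaxSilenceSizeBytes int
}

// overrides is a hypothetical, simplified stand-in for validation.Overrides.
type overrides struct {
	defaults  tenantLimits
	perTenant map[string]tenantLimits
}

// forUser returns the tenant-specific limits if present, else the defaults.
// This fallback rule is an assumption made for illustration.
func (o *overrides) forUser(userID string) tenantLimits {
	if l, ok := o.perTenant[userID]; ok {
		return l
	}
	return o.defaults
}

func (o *overrides) maxSilencesCount(userID string) int {
	return o.forUser(userID).MaxSilencesCount
}

func (o *overrides) maxSilenceSizeBytes(userID string) int {
	return o.forUser(userID).MaxSilenceSizeBytes
}

func main() {
	o := &overrides{
		defaults: tenantLimits{}, // 0 = no limit, the default in this PR
		perTenant: map[string]tenantLimits{
			"tenant-a": {MaxSilencesCount: 100, MaxSilenceSizeBytes: 4096},
		},
	}
	fmt.Println(o.maxSilencesCount("tenant-a"), o.maxSilenceSizeBytes("tenant-a"))
	fmt.Println(o.maxSilencesCount("tenant-b"), o.maxSilenceSizeBytes("tenant-b"))
}
```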
5 changes: 3 additions & 2 deletions vendor/github.com/hashicorp/go-hclog/README.md