-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expires notify log sooner when possible #2982
Conversation
Attempt to fix #2961 |
@gotjosh We have ran this and noticed a 30% memory decrease with alertmanager with low repeat interval. Any chances to get this merged? |
notify/notify.go
Outdated
@@ -785,7 +785,12 @@ func (n SetNotifiesStage) Exec(ctx context.Context, l log.Logger, alerts ...*typ | |||
return ctx, nil, errors.New("resolved alerts missing") | |||
} | |||
|
|||
return ctx, alerts, n.nflog.Log(n.recv, gkey, firing, resolved) | |||
var expiry time.Duration | |||
if n, ok := RepeatInterval(ctx); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repeat interval should always be present in the context? At least *DedupStage.Exec()
returns an error if this is the case. I suggest that we do the same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
It seems useless to keep the notifications in the nflog for longer than twice the repeat interval. This should help reduce memory usage of clustered alertmanagers. Signed-off-by: Julien Pivotto <[email protected]>
7d9c41b
to
b044302
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. I have a question on how the data retention interacts with repeat interval.
@@ -415,6 +415,11 @@ func (l *Log) Log(r *pb.Receiver, gkey string, firingAlerts, resolvedAlerts []ui | |||
} | |||
} | |||
|
|||
expiresAt := now.Add(l.retention) | |||
if expiry > 0 && l.retention > expiry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it make sense to use the default retention only if expiry > 0?
If repeat_interval is 10d (because why not), it would make sense to keep the log for longer than the default 120h retention. We could also get rid of this check:
alertmanager/cmd/alertmanager/main.go
Lines 488 to 500 in d034f11
if r.RouteOpts.RepeatInterval > *retention { | |
level.Warn(configLogger).Log( | |
"msg", | |
"repeat_interval is greater than the data retention period. It can lead to notifications being repeated more often than expected.", | |
"repeat_interval", | |
r.RouteOpts.RepeatInterval, | |
"retention", | |
*retention, | |
"route", | |
r.Key(), | |
) | |
} | |
}) |
This would be change of behaviour, let's merge this while we think on the further change |
It seems useless to keep the notifications in the nflog for longer than
twice the repeat interval. This should help reduce memory usage of
clustered alertmanagers.
Signed-off-by: Julien Pivotto [email protected]