Alert metric reports different results to what the user sees via API #2943
Conversation
Force-pushed from 977ca67 to a0d866c.
```diff
@@ -33,8 +34,10 @@ const alertChannelLength = 200
 type Alerts struct {
 	cancel context.CancelFunc
 
+	alerts *store.Alerts
```
Moved `alerts` because `mtx` is not protecting it and this was a bit confusing at first. `alerts` has its own mutex.
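For readers skimming the diff, here is a minimal, hedged sketch of the pattern that comment describes; the package, `selfLockingStore`, and `listeners` names are simplified stand-ins for this example, not the actual Alertmanager code.

```go
// Hedged sketch only - simplified stand-ins, not the real Alertmanager types.
package alertsketch

import (
	"context"
	"sync"
)

// selfLockingStore stands in for store.Alerts: it synchronizes internally,
// so callers never hold an outer lock to use it.
type selfLockingStore struct {
	mu    sync.Mutex
	byKey map[string]string
}

func (s *selfLockingStore) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byKey[k] = v
}

// Alerts groups fields by the lock that guards them, making it explicit
// that mtx does not protect alerts.
type Alerts struct {
	cancel context.CancelFunc

	// alerts has its own mutex (inside selfLockingStore); mtx does not guard it.
	alerts *selfLockingStore

	// mtx guards only the fields declared after it.
	mtx       sync.Mutex
	listeners map[int]chan struct{}
}
```

Grouping fields by the lock that protects them documents the locking discipline in the type declaration itself, which is the confusion the review comment is addressing.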
```diff
 	return prometheus.NewGaugeFunc(
 		prometheus.GaugeOpts{
-			Name: "alertmanager_alerts",
 			Help: "How many alerts by state.",
+			Name: "alertmanager_marked_alerts",
```
Kept the old metric intact functionality-wise, just added more labels to complete the story.
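As a rough illustration of that comment, the sketch below builds one `GaugeFunc` per alert state for the renamed `alertmanager_marked_alerts` metric, attaching the state as a constant label. The helper name, the help text, and the `countFn` callback are assumptions made for this example, not the PR's exact code.

```go
// Hedged sketch - helper name, help text, and countFn are illustrative.
package metricsketch

import "github.com/prometheus/client_golang/prometheus"

// newMarkedAlertGauge builds one GaugeFunc per alert state; the state is
// attached as a constant label so each state gets its own series.
func newMarkedAlertGauge(state string, countFn func(state string) int) prometheus.GaugeFunc {
	return prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name:        "alertmanager_marked_alerts",
			Help:        "How many alerts by state are currently marked.",
			ConstLabels: prometheus.Labels{"state": state},
		},
		func() float64 { return float64(countFn(state)) },
	)
}
```

Each per-state gauge would then be registered with the registry, so the metric keeps its old behaviour under the new name while the state label completes the picture.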
Force-pushed from c85ebe0 to 143cf12.
Needs #2944
```go
func (a *Alerts) count(state types.AlertState) int {
	var count int
	for _, alert := range a.alerts.List() {
		if alert.Resolved() {
```
This is the big change: we now skip resolved alerts.
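The quoted hunk cuts off mid-function, so here is a self-contained, hedged sketch of the counting logic it points at: resolved alerts are skipped, and only alerts marked with the requested state are counted. The `Alert` and `Marker` types below are simplified stand-ins, not the real Alertmanager types.

```go
// Hedged, self-contained sketch: Alert and Marker are simplified stand-ins.
package countsketch

import "time"

type AlertState string

type Alert struct {
	Fingerprint string
	EndsAt      time.Time
}

// Resolved reports whether the alert's end time has already passed.
func (a *Alert) Resolved() bool {
	return !a.EndsAt.IsZero() && a.EndsAt.Before(time.Now())
}

// Marker reports the state each alert is currently marked with.
type Marker interface {
	State(fingerprint string) AlertState
}

// countActiveByState counts non-resolved alerts marked with the given state.
func countActiveByState(alerts []*Alert, marker Marker, state AlertState) int {
	var count int
	for _, alert := range alerts {
		if alert.Resolved() {
			continue // the key change: resolved alerts no longer count
		}
		if marker.State(alert.Fingerprint) != state {
			continue
		}
		count++
	}
	return count
}
```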
This LGTM.
Fixes prometheus#1439 and prometheus#2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurposed the current metric to match what the user sees in the UI.

Signed-off-by: gotjosh <[email protected]>
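To make the "equal fingerprint" point concrete, here is a hedged sketch of a store keyed by the label set's fingerprint: when the same alert arrives again before garbage collection, the existing entry is handed back, resolved or not. The `storedAlert` type and `upsert` helper are hypothetical, introduced only for illustration.

```go
// Hedged sketch - storedAlert and upsert are hypothetical helpers.
package fingerprintsketch

import "github.com/prometheus/common/model"

type storedAlert struct {
	Labels   model.LabelSet
	Resolved bool
}

// byFingerprint keys alerts by the fingerprint of their label set.
type byFingerprint map[model.Fingerprint]*storedAlert

// upsert returns the existing entry for an identical label set, or stores a new one.
func (s byFingerprint) upsert(labels model.LabelSet) *storedAlert {
	fp := labels.Fingerprint()
	if existing, ok := s[fp]; ok {
		// Same fingerprint: re-use the entry (even if resolved) until it is GCed.
		return existing
	}
	a := &storedAlert{Labels: labels}
	s[fp] = a
	return a
}
```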
Force-pushed from 143cf12 to 90f0c5b.
@roidelapluie @simonpasquier I have just rebased - this should be all good.
Thanks!
More results, at the 500 alerts mark
LGTM, I will stop my test.
…rometheus#2943)

* Alert metric reports different results to what the user sees via API

Fixes prometheus#1439 and prometheus#2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI.

Signed-off-by: gotjosh <[email protected]>
Signed-off-by: Yijie Qin <[email protected]>
Fixes #1439 and #2619 (which are duplicates).

The previous metric is not technically reporting incorrect results as the alerts are still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurposed the current metric to match what the user sees in the UI.

I have tested this (and you can too) with the following (a sketch of the setup follows this list):

- Set the `endsAt` for `AlwaysFiringExpired`, `AlwaysFiringExpiringSoon` (and others as necessary) - you want one alert to be expired on reception and one that eventually expires to see how the metrics behave.
- `AlwaysFiringToBeSilenced` (to be silenced).
- `AlwaysFiringActive` (left active).

[Screenshots: the `Marker` metric and the new `alertmanager_alerts` metric]
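If you want to reproduce that test, something along these lines can drive it. This is a sketch under assumptions: the alert names come from the description above, while the local address, the `startsAt`/`endsAt` offsets, and the plain `net/http` client are choices made for this example rather than anything prescribed by the PR.

```go
// Hedged sketch of the test setup: post one alert that is already expired on
// reception and one that expires shortly, then watch the two metrics diverge.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// postableAlert carries only the fields of Alertmanager's v2 postable alert
// that this sketch needs.
type postableAlert struct {
	Labels   map[string]string `json:"labels"`
	StartsAt time.Time         `json:"startsAt"`
	EndsAt   time.Time         `json:"endsAt"`
}

func main() {
	now := time.Now()
	alerts := []postableAlert{
		// Already expired on reception.
		{
			Labels:   map[string]string{"alertname": "AlwaysFiringExpired"},
			StartsAt: now.Add(-10 * time.Minute),
			EndsAt:   now.Add(-1 * time.Minute),
		},
		// Expires shortly after reception.
		{
			Labels:   map[string]string{"alertname": "AlwaysFiringExpiringSoon"},
			StartsAt: now,
			EndsAt:   now.Add(2 * time.Minute),
		},
	}

	body, err := json.Marshal(alerts)
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("http://localhost:9093/api/v2/alerts", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Then compare alertmanager_alerts vs alertmanager_marked_alerts on /metrics.
	fmt.Println("status:", resp.Status)
}
```

After posting, scraping `/metrics` should show the expired alert still counted under `alertmanager_marked_alerts` but excluded from the new `alertmanager_alerts`, matching what the API and UI report.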