-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Webhooks for firing alerts contain zeroed EndsAt #3351
Comments
I found the original commit by Brian here. I think the need to do this is a consequence of how Alertmanager always uses An immediate fix would be to instead remove the |
The more I think about it, the more I feel like the semantics of flushing and time are broken. If we assume that once we've determined that an alert needs to flush and it enters the subsequent stages, it should not be time-constrained? Instead, it should just be forced flush. I'd argue that modifying endsAt is not something we wanna do. |
I discussed with Josh and we agreed that an appropriate fix would be instead of calculating if the alert was firing or not by checking This means that instead of calculating if the alert is firing or not based on the end time, it just reads the new field called |
I think that given that an alert can reach the notifier before it's resolve date, so it's state is computed as firing, but the notification is not guaranteed to reach the user in time (for example, a SMTP server can cause some delay), making a snapshot of the alert's data at the time of determining if the alert should be flushed is perfectly fine.
That should work for me. |
This looks the right approach to me. |
I started on a fix this issue and it appears that for _, alert := range alerts {
a := *alert
// Ensure that alerts don't resolve as time move forwards.
if !a.ResolvedAt(now) {
a.EndsAt = time.Time{}
}
alertsSlice = append(alertsSlice, &a)
} func Alerts(alerts ...*Alert) model.Alerts {
res := make(model.Alerts, 0, len(alerts))
for _, a := range alerts {
v := a.Alert
// If the end timestamp is not reached yet, do not expose it.
if !a.Resolved() {
v.EndsAt = time.Time{}
}
res = append(res, &v)
}
return res
} If I remove just one then the webhook still contains a zeroed EndsAt, and makes me believe that the code in Second, I was thinking about changing the code in type AlertWithStatus {
Alert
Status string
} for _, alert := range alerts {
a := types.AlertWithStatus{
Alert: *alert,
Status: "", // status goes here
}
alertsSlice = append(alertsSlice, &a)
} This seems like a reasonable approach to changing alerts from dispatch onwards, without having to make changes to data before that. The downside is there are a lot of function signatures that will need to be changed, so expect a lot of noise in the diff. |
Hi guys, any progress on this? I am also experiencing this issue, I am forwarding alerts to Alerta and Alerta immediately marks them as closed since the invalid endsAt time. |
This commit adds the StatusAt method for the Alert struct. It calls the ResolvedAt method while Status calls the Resolved method. This method will be used in Alertmanager to fix issue prometheus/alertmanager#3351. Signed-off-by: George Robinson <[email protected]>
I think I found a less intrusive fix for this. Instead of calling // If the end timestamp is not reached yet, do not expose it.
if !a.Resolved() {
v.EndsAt = time.Time{}
} and also solve the issue that #1686 was intended to fix. |
Here is the PR #3805. |
What did you do?
It appears that webhooks for firing alerts contain zeroed EndsAt timestamps
0001-01-01T00:00:00Z
instead of the actual EndsAt timestamp of the alert as can be seen in/v2/alerts
.Having looked into this further the code that causes this can be found on Line 507 of
dispatch.go
:I think the intention of this code is to prevent alerts from becoming resolved during subsequent stages of the receiver such as WaitStage and DedupStage. This can happen when a firing alert enters the stages at time t1 but doesn't finish the stages until time t2 > EndsAt.
What did you expect to see?
I expect to see either
EndsAt
contain a non-zero timestamp or for theEndsAt
column to be omitted from the JSON for firing alerts.What did you see instead? Under which circumstances?
Related: #3341
The text was updated successfully, but these errors were encountered: