Resolved alert sent without sending corresponding active alert #878
What version of AM are you using?
version=0.7.1, branch=master, revision=ab4138299b94c78dc554ea96e2ab28d04b048059
We also get this, with the following alert:
For this one we get "RESOLVED" notifications once in a while, even if we didn't get the initial ones. We're also on 0.7.1, running in Kubernetes 1.5.7, Prometheus 1.7.1.
We're seeing this too on v0.7.0.
Also happens in our setup with v0.9.1.
Question for the group: does this only happen with a particular alert? So far I'm only seeing it with an alert such as:
Other alerts seem to be fine, so I'm wondering if there is a correlation with the actual alert here. Running 0.9.1 with mesh-mode, receiving alerts from multiple Prometheus instances. Edit: Nevermind, seeing this on a few different alerts.
I'm also having this issue. My alert config file is:
global:
And my alert.rules:
Seeing the same on the latest 0.10.0. Can we get some input from the alertmanager folks on what we can do to help push this along? It seems that most of our alerts now are being sent as resolved without the accompanying firing alert.
@alkalinecoffee Do you have a snippet of the log from when this problem happened?
I do think I found a pattern here. What seems to be happening is that the alert fires, and before the firing notification is actually sent out, the resolved message comes in very soon after (under a second, almost simultaneously), and that is what gets sent out instead. This matches what's seen in the original post here. So for example, I have an alert for high memory usage over a 3m period, and an application that sees memory spikes for about 3 minutes at a time. My alertmanager config also adds its own grouping delay on top of that. It seems to me that if the alert fires and is resolved between the 3rd and 5th minute, the firing alert is never sent out (which makes sense), but the resolved alert may be sent out (which doesn't make sense). I'm able to pretty reliably replicate the issue and will work on a setup for others to easily replicate for further debugging.
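A minimal sketch of the kind of rule being described (the rule name, metric, and threshold are illustrative assumptions, not the poster's actual config; modern YAML rule syntax shown):

```yaml
# Hypothetical Prometheus alert rule: the condition must hold for 3m
# before the alert fires. If a memory spike lasts roughly 3 minutes,
# the alert fires and resolves almost back to back, which is exactly
# the window described above.
groups:
  - name: example
    rules:
      - alert: HighMemoryUsage
        expr: process_resident_memory_bytes > 2e9  # illustrative threshold
        for: 3m
        labels:
          severity: warning
```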
I can confirm that my experience, as I remember it, fits the pattern described in the comment above.
@alkalinecoffee Any progress or results on this? I'm also getting duplicate resolved alerts with no active firing ones. I'm on Alertmanager v0.12.0. Here's my config:
I haven't been able to duplicate it reliably locally, and I haven't looked at it since before the holidays. From what I can tell, the firing and resolved alerts are sent one immediately after the other, and alertmanager appears to honor only the resolved one. We've since adjusted our alerts with longer time thresholds, so we just don't see this issue much anymore. It'll be more prevalent if your alerts have low thresholds and fire often. The important thing is that alerts don't appear to be outright dropped: they are fired and then resolved just as fast.
I can explain why this happens sometimes. Assuming this AlertManager configuration with a single route:
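The configuration snippet itself did not survive in this copy of the thread; a minimal single-route config along the lines being described might look like this (receiver name, channel, and timing values are assumptions):

```yaml
route:
  receiver: default       # single route, no group_by: all alerts fall
                          # into one aggregation group (the empty key)
  group_wait: 30s         # delay before a group's first notification
  group_interval: 5m      # delay between notifications for the same group
receivers:
  - name: default
    slack_configs:
      - channel: '#alerts'    # placeholder; slack_api_url assumed set globally
        send_resolved: true   # resolved notifications enabled
```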
Here is the workflow:
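The individual steps were lost from this copy of the thread; reconstructing from the follow-up comments, the sequence plausibly runs along these lines (alertA and alertB are hypothetical alerts sharing the single aggregation group):

```yaml
# t0          alertA starts firing; the aggregation group is created
# t0 + 30s    group_wait elapses; notification sent: alertA firing
# t1          alertB starts firing; it joins the same group
# t2          alertB resolves, before the group's next flush
# t0+30s+5m   group_interval elapses; the group flushes again with
#             alertA still firing and alertB now resolved, so the
#             receiver sees alertB "resolved" without ever seeing
#             alertB "firing"
```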
Now if the configuration is changed to include the group_by option:
The duplication doesn't happen since alertA and alertB don't fall in the same aggregation group.
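A sketch of that change, assuming the two alerts differ on their alertname label:

```yaml
route:
  receiver: default
  group_by: ['alertname']  # alertA and alertB now land in separate
                           # aggregation groups, so alertB's resolve no
                           # longer piggybacks on alertA's flush cycle
  group_wait: 30s
  group_interval: 5m
```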
@simonpasquier That's interesting. I'll change my grouping configuration and see if that helps.
I understand it's a workaround, but be aware that adding group_by labels also means related alerts will no longer be combined into a single notification.
Well that's no good. But it also makes sense the more I think about it. @simonpasquier Is that your experience with that workaround?
@nsaud01 @alkalinecoffee yes, if you want to group CPU alerts for all instances in a single notification then you can't use the workaround...
I'll try to send a PR this week.
I have alertmanager sending notifications to Slack. Today it sent a "resolved" notification to Slack without ever sending an "active" notification. This is contrary to my expectations - I expect "resolved" to only be sent if an "active" is sent first.
Here is the log:
Configuration:
Prometheus main configuration:
Prometheus alerts configuration:
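The log and configuration snippets were not captured above. For context, a receiver only emits resolved notifications at all when send_resolved is enabled; a minimal Slack receiver of the kind described, where the URL, channel, and receiver name are placeholders rather than the reporter's actual config, looks like:

```yaml
global:
  slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder
route:
  receiver: slack
receivers:
  - name: slack
    slack_configs:
      - channel: '#alerts'
        send_resolved: true  # off by default for Slack; required to get
                             # any "resolved" messages at all
```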