Inhibited alerts are being sent as resolved #891
Comments
@sciffer Thanks for reporting! Could you supply your Prometheus and Alertmanager config as well as the alert so we can try to reproduce this?
Hi @mxinden, thanks for responding. I have attached our alertmanager.yml file (with the keys and auth parts removed where relevant), and the relevant Prometheus alerting config is pasted below. The reason we need the inhibition rules (which work great) is that for each team we have two Prometheus servers doing the same scraping, and we want only one alert to be fired. Please let us know if more information is needed. Prometheus alerting configs:
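(The pasted snippet itself is not preserved above. As a hedged illustration of the kind of setup described, assuming a hypothetical `replica` label that each of the two Prometheus servers attaches to its alerts, an Alertmanager inhibit rule for this HA de-duplication pattern might look roughly like this:)

```yaml
# alertmanager.yml (illustrative sketch, not the reporter's actual config)
inhibit_rules:
  # While the alert from replica "a" is firing, suppress the identical alert
  # coming from replica "b". The "replica" label name is an assumption.
  - source_match:
      replica: 'a'
    target_match:
      replica: 'b'
    # Only inhibit when these labels match between the two alerts.
    equal: ['alertname', 'job', 'instance']
```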
Possibly related: #878
Any updates on this issue?
@gfliker @sciffer I am sorry that this is taking so long. I am able to reproduce your issue. First of all, a general question: why are you using inhibition rules to de-duplicate the alerts sent by two identical Prometheus servers for HA? If I am not misunderstanding your use case, this can be done with the default behaviour of Alertmanager (see the FAQ). Whenever two alerts with the same label set come in, they are automatically de-duplicated into one alert. Due to the inhibition logic my guess would be:
1. The alert from the first Prometheus fires and a firing notification is sent for it.
2. The alert from the second Prometheus fires as well, but it is inhibited, so no notification is sent for it.
3. Both alerts resolve. Inhibition only applies while the source alert is firing, so nothing suppresses the second alert anymore and resolved notifications are sent for both.
This should not happen with the default de-duplication logic of Alertmanager. Is it possible for you to use the default HA de-duplication logic and remove the re-labeling on the Prometheus side? I hope I am not missing something here.
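(To illustrate the default de-duplication: when both Prometheus servers deliver alerts with an identical label set, the Alertmanager treats them as the same alert and sends a single notification. A quick way to see this, assuming a local Alertmanager on the default port and its v1 push API, is to post the same alert twice:)

```bash
# Both POSTs carry the same label set, so the Alertmanager collapses them
# into a single alert and only one notification goes out.
curl -XPOST http://localhost:9093/api/v1/alerts -d '[{
  "labels": {"alertname": "HighErrorRate", "job": "api", "severity": "page"},
  "annotations": {"summary": "error rate above threshold"}
}]'
# Repeating the request (e.g. from the second Prometheus) does not create
# a second alert or a second notification.
curl -XPOST http://localhost:9093/api/v1/alerts -d '[{
  "labels": {"alertname": "HighErrorRate", "job": "api", "severity": "page"},
  "annotations": {"summary": "error rate above threshold"}
}]'
```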
Thanks @mxinden for following up on this. I guess that we can use alert_relabel_configs to drop/change the Prometheus instance label. FYI, using inhibition was working fine for the last 6 months; I'm guessing something changed in 0.7.1. Anyway, we can close this issue since we will be switching to the alert_relabel_configs way. Many thanks @mxinden
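(For reference, a minimal sketch of what that could look like on the Prometheus side, assuming the two servers are distinguished by a hypothetical `replica` label that should be removed before alerts reach the Alertmanager:)

```yaml
# prometheus.yml (sketch; the "replica" label name is an assumption)
alerting:
  alert_relabel_configs:
    # Drop the label that differs between the two HA Prometheus servers so
    # that both send alerts with identical label sets and the Alertmanager
    # de-duplicates them.
    - regex: 'replica'
      action: labeldrop
```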
@gfliker Oh, I don't think this is an intended change. I hope I get time to look into this further. @brancz Can you confirm my suggestion to use Alertmanager's default de-duplication instead of inhibition rules to handle duplicated alerts due to an HA Prometheus setup?
This should be covered by the default way the Alertmanager works.
@mxinden and I just went through this in person and we think we figured out what is happening. Inhibition works not by looking at what notifications have been sent, but by which alerts are currently firing. Therefore it is not the right mechanism to perform this de-duplication. The sequence of events as @mxinden described above seems to match my suspicion. So in terms of what we suggest you do:
- Drop the inhibition rules and rely on the Alertmanager's default de-duplication instead.
- Make sure both Prometheus servers send alerts with identical label sets, for example by dropping the label that distinguishes the two servers (alert_relabel_configs can do this).
If this doesn't work for you, you should also see multiple alerts firing and multiple alert notifications; if that is the case we should investigate further. In that case please share your full Prometheus configuration as well (anonymized, of course). Let us know how it works out, we're happy to help out, @gfliker! 🙂
@gfliker I am closing here. Please reopen in case you are still facing any issues.
This is great. Thanks for the feedback. Please feel free to reach out again if you are facing any further issues.
Original issue description:
Alertmanager 0.7.1, Prometheus 1.6.3.
Every alert gets sent from two Prometheus collectors (redundant collectors). Inhibit rules are in place to make sure only one of those alerts will get fired.
When the alerts fire, only one of them gets sent (which is what we expect), but when the alert is resolved, both alerts are sent as resolved. This is wrong.
So far we have noticed that behaviour with email notifications; I guess PagerDuty will filter and complain about resolved alerts that never got triggered (as it has context and state).