Repeated resolved alerts after 'repeat_interval' #2474

Open
noevhlev opened this issue Feb 2, 2021 · 2 comments

noevhlev commented Feb 2, 2021

After updating Alertmanager from version 0.19 to 0.20, I ran into a problem. When alerts are grouped, resolved alerts no longer disappear from the group as long as the group still contains firing alerts. After upgrading to version 0.21, the problem is still reproducible.

It seems this problem appeared after PR#2040.
The PR description says that the aggregation group is responsible for removing resolved alerts, but as far as I can tell this does not happen - perhaps I've missed something.
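
To make the expectation concrete, here is a minimal Go sketch of what I understood the aggregation group should do after a successful flush. The types and names (Alert, Group, flush) are invented for illustration and are not the actual Alertmanager code: once a batch has been notified successfully, resolved alerts are dropped from the group so they are not sent again on the next repeat_interval.

package main

import "fmt"

// Alert and Group are invented for this sketch; they only model the state an
// aggregation group keeps between flushes.
type Alert struct {
	Fingerprint string
	Resolved    bool
}

type Group struct {
	alerts map[string]Alert
}

// flush sends every alert in the group and, if the notification succeeds,
// drops the resolved ones so they do not reappear after repeat_interval.
func (g *Group) flush(notify func([]Alert) bool) {
	batch := make([]Alert, 0, len(g.alerts))
	for _, a := range g.alerts {
		batch = append(batch, a)
	}
	if notify(batch) {
		for fp, a := range g.alerts {
			if a.Resolved {
				delete(g.alerts, fp) // the cleanup I expected to happen
			}
		}
	}
}

func main() {
	g := &Group{alerts: map[string]Alert{
		"1d2cb32": {Fingerprint: "1d2cb32", Resolved: false},
		"281d27a": {Fingerprint: "281d27a", Resolved: true},
		"4f37afb": {Fingerprint: "4f37afb", Resolved: true},
	}}
	g.flush(func([]Alert) bool { return true })
	fmt.Println(len(g.alerts)) // expected: 1, only the firing alert remains
}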

What did you expect to see? (This is how it worked before):

  • I get 3 firing alerts
  • After some time I get 2 resolved alerts, so there is 1 firing alert and 2 resolved alerts
  • After repeat_interval, only the 1 firing alert is repeated

What did you see instead?

  • I get 3 firing alerts
  • After some time I get 2 resolved alerts, so there is 1 firing alert and 2 resolved alerts
  • After repeat_interval I still get 1 firing alert and 2 resolved alerts. Nothing changes.

Environment

  • System information:

    Linux 4.15.0-115-generic x86_64

  • Alertmanager version:

    alertmanager, version 0.20.0 (branch: HEAD, revision: f74be04)
    build user: root@00c3106655f8
    build date: 20191211-14:13:14
    go version: go1.13.5

  • Prometheus version:

    prometheus, version 2.18.1 (branch: HEAD, revision: ecee9c8abfd118f139014cb1b174b08db3f342cf)
    build user: root@2117a9e64a7e
    build date: 20200507-16:51:47
    go version: go1.14.2

  • Alertmanager configuration file:

global:
  resolve_timeout: 3m
...

route:
  group_by:
  - environment
  - alertname
  - severity
  group_wait: 20s
  group_interval: 80s
  repeat_interval: 3m
...
  • Logs:
Feb 02 02:49:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:49:44.364Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[1d2cb32][active] 
Feb 02 02:49:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:49:44.364Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[281d27a][active] 
Feb 02 02:49:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:49:44.364Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[4f37afb][active] 
Feb 02 02:49:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:49:44.365Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{severity=\"warning\"}/{environment=\"testing\"}:{alertname=\"DiskReplace\", environment=\"testing\", severity=\"warning\"}" msg=flushing alerts="[DiskReplace[4f37afb][active] DiskReplace[281d27a][active] DiskReplace[1d2cb32][active]]"
Feb 02 02:49:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:49:44.530Z caller=notify.go:685 component=dispatcher receiver=alerts-testing integration=webhook[0] msg="Notify success" attempts=1
Feb 02 02:50:29 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:50:29.364Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[281d27a][resolved]
Feb 02 02:50:29 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:50:29.364Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[4f37afb][resolved]
Feb 02 02:50:59 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:50:59.364Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[1d2cb32][active] 
Feb 02 02:51:04 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:51:04.365Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{severity=\"warning\"}/{environment=\"testing\"}:{alertname=\"DiskReplace\", environment=\"testing\", severity=\"warning\"}" msg=flushing alerts="[DiskReplace[4f37afb][resolved] DiskReplace[281d27a][resolved] DiskReplace[1d2cb32][active]]"
Feb 02 02:51:04 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:51:04.538Z caller=notify.go:685 component=dispatcher receiver=alerts-testing integration=webhook[0] msg="Notify success" attempts=1
Feb 02 02:51:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:51:44.366Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[281d27a][resolved]
Feb 02 02:51:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:51:44.366Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[4f37afb][resolved]
Feb 02 02:52:14 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:52:14.366Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[1d2cb32][active]
Feb 02 02:52:24 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:52:24.365Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{severity=\"warning\"}/{environment=\"testing\"}:{alertname=\"DiskReplace\", environment=\"testing\", severity=\"warning\"}" msg=flushing alerts="[DiskReplace[4f37afb][resolved] DiskReplace[281d27a][resolved] DiskReplace[1d2cb32][active]]"
Feb 02 02:52:59 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:52:59.367Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[281d27a][resolved]
Feb 02 02:52:59 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:52:59.367Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[4f37afb][resolved]
Feb 02 02:53:29 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:53:29.367Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[1d2cb32][active] 
Feb 02 02:53:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:53:44.365Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{severity=\"warning\"}/{environment=\"testing\"}:{alertname=\"DiskReplace\", environment=\"testing\", severity=\"warning\"}" msg=flushing alerts="[DiskReplace[4f37afb][resolved] DiskReplace[281d27a][resolved] DiskReplace[1d2cb32][active]]"
Feb 02 02:54:14 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:54:14.369Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[281d27a][resolved]
Feb 02 02:54:14 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:54:14.369Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[4f37afb][resolved]
Feb 02 02:54:44 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:54:44.370Z caller=dispatch.go:138 component=dispatcher msg="Received alert" alert=DiskReplace[1d2cb32][active]
Feb 02 02:55:04 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:55:04.365Z caller=dispatch.go:473 component=dispatcher aggrGroup="{}/{severity=\"warning\"}/{environment=\"testing\"}:{alertname=\"DiskReplace\", environment=\"testing\", severity=\"warning\"}" msg=flushing alerts="[DiskReplace[4f37afb][resolved] DiskReplace[281d27a][resolved] DiskReplace[1d2cb32][active]]"
Feb 02 02:55:04 pupa alertmanager[18119]: level=debug ts=2021-02-01T23:55:04.549Z caller=notify.go:685 component=dispatcher receiver=alerts-testing integration=webhook[0] msg="Notify success" attempts=1
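
The receiver=alerts-testing integration in the logs above is a webhook. Any minimal endpoint can be used to observe the repeated notifications; below is a sketch of the kind of receiver I use for this. The listen address and path are assumptions, and the payload struct keeps only a few of the documented webhook fields.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// webhookPayload keeps only the fields needed to see each alert's status in
// every notification Alertmanager sends on group_interval / repeat_interval.
type webhookPayload struct {
	Status string `json:"status"`
	Alerts []struct {
		Status string            `json:"status"`
		Labels map[string]string `json:"labels"`
	} `json:"alerts"`
}

func main() {
	http.HandleFunc("/alerts", func(w http.ResponseWriter, r *http.Request) {
		var p webhookPayload
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		for _, a := range p.Alerts {
			log.Printf("group=%s alertname=%s status=%s", p.Status, a.Labels["alertname"], a.Status)
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":5001", nil))
}

With repeat_interval: 3m from the config above, every flush logged as msg=flushing results in one POST to this endpoint, and that is where the two resolved alerts keep showing up next to the firing one.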

axlev commented Jul 19, 2021

Any updates on this? @noevhlev, how did you finally handle this, Vladimir?


noevhlev commented Aug 16, 2021

We have updated Alertmanager to version 0.22.2. Unfortunately, the problem is still reproducible.
