BP Rules - take into account acknowledgements or downtimes #1825

xkilian · 2016-03-16T21:48:45Z

Case: Create a BP rule with two services.
One service goes into a warning state.
A technician acknowledges the service.

Problem: The BP_Rule still shows the warning state.

BP_Rules should be able to take into account this information when computing state information.

Other systems compute a different state (e.g. NagVis): warning_acknowledged, with a different image.

In our case, at a minimum, BP_Rules should have logic that takes this other information into account to compute accurate state information. This should be an optional enhancement, as I can see cases where the BP_Rule should keep the existing logic and simply be acknowledged or put in maintenance as a whole. But for the majority of cases, especially when a BP_Rule contains many items, this becomes important.

Request: If this is already implemented but not documented, explain how to do it. Otherwise, implement logic in the BP_Rule so that it also uses the downtime/ack state when computing the global state.

geektophe · 2016-03-17T15:10:19Z

We could try to add an automatic acknowledgment if all the underlying problems are acknowledged.

The only tricky part is to distinguish between an automatic acknowledgment and a real one triggered by the user. The automatic one should be removed if a new unacknowledged problem appears.

The same logic could be applied to downtimes.
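To make the idea concrete, here is a minimal sketch of the automatic acknowledgment logic described above. It is not Shinken code: the `Item` and `BusinessRule` classes, the `ack` field and its `"auto"`/`"user"` values, and the `refresh_ack` method are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for an underlying host or service."""
    name: str
    state: str = "OK"            # "OK", "WARNING" or "CRITICAL"
    acknowledged: bool = False


@dataclass
class BusinessRule:
    """Hypothetical stand-in for a bp_rule service."""
    children: List[Item] = field(default_factory=list)
    ack: Optional[str] = None    # None, "auto" or "user" (assumed marker)

    def refresh_ack(self) -> None:
        # An acknowledgment set by a person is never touched.
        if self.ack == "user":
            return
        problems = [c for c in self.children if c.state != "OK"]
        # Auto-acknowledge only when there is at least one problem
        # and every problem has been acknowledged.
        all_acked = bool(problems) and all(c.acknowledged for c in problems)
        # Otherwise drop the automatic acknowledgment, for example when
        # a new unacknowledged problem appears.
        self.ack = "auto" if all_acked else None
```

With one acknowledged WARNING child, `refresh_ack()` sets `ack` to `"auto"`; as soon as another child fails without being acknowledged, the next call resets it to `None`. Keeping the origin of the acknowledgment in a separate marker is one possible way to tell the two kinds apart.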

I can have a look at this when I have some free time.

xkilian · 2016-04-21T21:23:15Z

This was implemented in pull request #1837; we are currently testing the patch. It works for services. Once our testing is complete, I would suggest we merge it. I will provide feedback.

xkilian · 2016-04-22T16:47:41Z

Tested in our environment and it works, though as noted in #1837, the downtimes_as_acks option does not work with this feature. An analysis would be needed to determine whether this is possible.

tomasz-kuzemko · 2016-05-25T07:40:42Z

+1 for this feature

geektophe · 2016-05-26T16:32:08Z

I see two different features from what you've described.

1. Add an option to exclude underlying services/hosts that are under downtime or acknowledged from the bp_rule status calculation (this should be easy to implement).
2. Reflect the services'/hosts' downtime or acknowledgment state in the bp_rule itself.
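The first feature can be sketched as follows. This is only an illustration under stated assumptions: it treats the rule as a simple "worst state of all children" aggregation, and the `Item` class, the `rule_state` function and the `ignore_handled` flag are hypothetical names, not the option implemented in #1837.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed ordering of states, from best to worst.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass
class Item:
    """Hypothetical stand-in for an underlying host or service."""
    name: str
    state: str = "OK"
    acknowledged: bool = False
    in_downtime: bool = False


def rule_state(items: Iterable[Item], ignore_handled: bool = False) -> str:
    """Return the worst state among the items.

    When ignore_handled is True, items that are acknowledged or under
    downtime are skipped, so they no longer degrade the rule state.
    """
    worst = "OK"
    for item in items:
        if ignore_handled and (item.acknowledged or item.in_downtime):
            continue
        if SEVERITY[item.state] > SEVERITY[worst]:
            worst = item.state
    return worst


# The case from the first comment: two services, one acknowledged WARNING.
services = [Item("svc1", "WARNING", acknowledged=True), Item("svc2")]
print(rule_state(services))                        # WARNING
print(rule_state(services, ignore_handled=True))   # OK
```

Making the behavior opt-in through a flag keeps the existing calculation as the default, which matches the request that this be an optional enhancement.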

The second feature is much trickier because we have to answer questions such as:

1. Which state to choose if both downtimes and acknowledgments are detected (the behavior differs: a downtime expires, an acknowledgment does not).
2. How to manage new problems: if the business rule is acknowledged or under downtime, no new notifications will be sent if another service fails (this is why the current implementation only blocks notifications and does nothing more).

I think those two questions can only be answered by a human deciding to globally acknowledge the business rule or set it under downtime. This shouldn't be the service's responsibility.

tomasz-kuzemko · 2016-05-26T21:33:18Z

> I see two different features from what you've described.

For me, the first feature would be enough. The second feature would be confusing and of little practical value. As you said, anyone who wants to downtime/ack the whole bp_rule can do so manually.

The motivation here is to have a good overview of the state of a group of services. If I set a downtime on a service or acknowledge it, I would like the bp_rule to reflect this, so that I can quickly see whether any other service is in a bad state.

geektophe · 2016-05-27T08:48:55Z

I have a patch to finish, but I'll have a look at this afterwards.

fpeyre · 2016-07-14T13:23:37Z

I have made a pull request that implements the first feature (#1837).

Maybe the last thing to check or change is the name of the option business_rule_downtime_as_ack, which has become ambiguous: if people activate an option to consider acknowledged items as OK inside a BP rule together with business_rule_downtime_as_ack, they could assume that a service/host in downtime is treated like an acknowledged one and must therefore be in the OK state.

Alternatively, this option is used only in the step that determines whether a notification must be sent (in which case, with this parameter, a host/service in downtime behaves as if it were acknowledged).

What is the right way to resolve this ambiguity? Open another issue and make another PR, or make the change inside this PR?
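The two readings of business_rule_downtime_as_ack can be contrasted with a small sketch. This is not the Shinken implementation: the `Item` class, the function names and the `ack_as_ok` flag are hypothetical, and only `downtime_as_ack` is meant to mirror the option discussed above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Item:
    """Hypothetical stand-in for an underlying host or service."""
    state: str = "OK"
    acknowledged: bool = False
    in_downtime: bool = False


def is_handled(item: Item, downtime_as_ack: bool) -> bool:
    # With downtime_as_ack, an item in downtime counts as acknowledged.
    return item.acknowledged or (downtime_as_ack and item.in_downtime)


def should_notify(items: Iterable[Item], downtime_as_ack: bool = False) -> bool:
    """Notification-only reading: notify if any problem is unhandled."""
    return any(
        i.state != "OK" and not is_handled(i, downtime_as_ack) for i in items
    )


def counts_as_ok(item: Item, ack_as_ok: bool, downtime_as_ack: bool) -> bool:
    """State reading: with ack_as_ok (hypothetical flag), handled items
    count as OK, and downtime_as_ack extends that to downtimes."""
    return item.state == "OK" or (ack_as_ok and is_handled(item, downtime_as_ack))


svc = Item("CRITICAL", in_downtime=True)
print(should_notify([svc]))                                      # True
print(should_notify([svc], downtime_as_ack=True))                # False
print(counts_as_ok(svc, ack_as_ok=True, downtime_as_ack=True))   # True
```

In the first reading the option only silences notifications; in the second it also changes the computed state, which is where the name becomes misleading.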

naparuba added the TOSORT label Apr 30, 2016

naparuba added this to the Not prioritized milestone Apr 30, 2016

fpeyre mentioned this issue Jan 12, 2017

BP Rules - take into account acknowledgements or downtimes Alignak-monitoring/alignak#685

Closed
