BP Rules - take into account acknowledgements or downtimes #1825

Open
xkilian opened this issue Mar 16, 2016 · 8 comments

@xkilian

xkilian commented Mar 16, 2016

Case: Create a BP rule with two services.
One service goes into a WARNING state.
A technician acknowledges the service.

Problem: The BP_Rule still shows the WARNING state.

BP_Rules should be able to take into account this information when computing state information.

Other systems compute a different state (e.g. NagVis): warning_acknowledged, shown with a different image.

In our case, at a minimum, BP_Rules should have logic that takes this other information into account to compute an accurate state. This should be an optional enhancement, as I can see cases where the BP_Rule should keep the existing logic and simply be acknowledged or put in maintenance as a whole. But for the majority of cases, especially when a BP_Rule contains lots of items, it becomes important.

Request: If this is already implemented but not documented, explain how to do it. Otherwise, implement logic so that the BP_Rule also uses the downtime/ack state when computing the global state.

@geektophe
Collaborator

We could try to add an automatic acknowledgment if all the underlying problems are acknowledged.

The only tricky part is distinguishing between an automatic acknowledgment and a real one triggered by the user. The automatic one should be removed if a new unacknowledged problem appears.

The same logic could be applied to downtimes.
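
The automatic acknowledgment idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Shinken's actual code: the `Child` and `BusinessRule` classes and the `ack_source` field are hypothetical names invented here to show how an automatic acknowledgment can be kept separate from one set by a user.

```python
from dataclasses import dataclass
from typing import List, Optional

OK = 0  # any other value is treated as a problem state


@dataclass
class Child:
    # Hypothetical stand-in for a host or service referenced by the rule.
    state: int = OK
    acknowledged: bool = False


@dataclass
class BusinessRule:
    # Hypothetical stand-in for a bp_rule service.
    children: List[Child]
    ack_source: Optional[str] = None  # None, "auto" or "user"

    def refresh_auto_ack(self) -> None:
        """Set or clear the automatic acknowledgment after a child changes."""
        if self.ack_source == "user":
            return  # never touch an acknowledgment set by a person
        problems = [c for c in self.children if c.state != OK]
        all_acked = bool(problems) and all(c.acknowledged for c in problems)
        # A new unacknowledged problem makes all_acked False, which
        # removes the automatic acknowledgment again.
        self.ack_source = "auto" if all_acked else None
```

Tracking the origin of the acknowledgment is the design choice that matters here: without it, the rule cannot tell whether it is allowed to remove the acknowledgment when a new problem appears.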

I can have a look at this when I have some free time.

@xkilian
Author

xkilian commented Apr 21, 2016

This was implemented in pull request #1837, and we are currently testing the patch. It works for services. Once our testing is complete, I would suggest we merge it. I will provide feedback.

@xkilian
Author

xkilian commented Apr 22, 2016

Tested in our environment, and it works. However, as noted in #1837, the option to treat downtimes_as_acks does not work with this feature. An analysis would be needed to determine whether this is possible.

@naparuba added this to the Not prioritized milestone Apr 30, 2016

@tomasz-kuzemko

+1 for this feature

@geektophe
Collaborator

I see two different features from what you've described.

  • Add an option to exclude underlying services/hosts under downtime or acknowledged from the bp_rule status calculation (this should be easy to implement)
  • Reflect the services/hosts downtime or acknowledgment state in the bp_rule itself.
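
The first feature in the list above (excluding handled items from the status calculation) might look something like the sketch below. It assumes a simple worst-state aggregation and the usual 0/1/2 numbering for OK/WARNING/CRITICAL; the `Child` class, the `rule_state` function and the `ignore_handled` flag are hypothetical names used only for illustration, not the option added by the actual patch.

```python
from dataclasses import dataclass
from typing import Iterable

OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Child:
    # Hypothetical stand-in for a host or service referenced by the rule.
    state: int
    acknowledged: bool = False
    in_downtime: bool = False


def rule_state(children: Iterable[Child], ignore_handled: bool = False) -> int:
    """Return the worst child state, optionally skipping handled children."""
    states = [
        c.state
        for c in children
        if not (ignore_handled and (c.acknowledged or c.in_downtime))
    ]
    # With every child skipped there is nothing left to report, so fall back to OK.
    return max(states, default=OK)


children = [Child(WARNING, acknowledged=True), Child(OK)]
print(rule_state(children))                       # current behaviour
print(rule_state(children, ignore_handled=True))  # acknowledged child excluded
```

With the flag off, the acknowledged WARNING still drives the rule state, which is the situation described in the original report. With the flag on, only unhandled children contribute.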

The second feature is much trickier because we have to answer questions such as:

  • Which state to choose if both downtimes and acknowledgments are detected (the behavior differs: a downtime expires, an acknowledgment does not).
  • How to manage new problems: if the business rule gets acknowledged or is under downtime, no new notifications will be sent if another service fails (this is why the current implementation only blocks notifications, and does nothing more).

I think those two questions can only be answered by a human deciding to globally acknowledge the business rule or set it under downtime. This shouldn't be the service's responsibility.

@tomasz-kuzemko

> I see two different features from what you've described.

For me the first feature would be enough. The second feature would be confusing and of little practical value. As you said, anyone who wants to downtime/ack the whole bp_rule can do so manually.

The motivation here is to have a good overview of the state of a group of services. If I set a downtime on a service or acknowledge it, I would like the bp_rule to reflect this so that I can quickly see whether any other service is in a bad state.

@geektophe
Collaborator

I have a patch to finish, but I'll have a look at that after.

@fpeyre
Contributor

fpeyre commented Jul 14, 2016

I have made a pull request that implements the first feature (#1837).

Maybe the last thing to check or change is the name of the option business_rule_downtime_as_ack, which has become ambiguous: if people enable an option to treat acknowledged items as OK inside a BP rule together with business_rule_downtime_as_ack, they could assume that a service/host in downtime is treated like an acknowledged one and must therefore count as OK.

Or is this option used only at the step that determines whether a notification must be sent (in which case, with this parameter, a host/service in downtime behaves the same as if it were acknowledged)?

What is the right way to resolve this ambiguity? Open another issue and make another PR, or make the change inside this PR?
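
To make the two readings of business_rule_downtime_as_ack concrete, here is a minimal sketch. It does not reproduce Shinken's implementation: `Child`, `is_handled`, `state_ignoring_handled` and `should_notify` are hypothetical names, and the `downtime_as_ack` argument merely stands in for the option discussed above.

```python
from dataclasses import dataclass
from typing import List

OK = 0  # any other value is treated as a problem state


@dataclass
class Child:
    # Hypothetical stand-in for a host or service referenced by the rule.
    state: int
    acknowledged: bool = False
    in_downtime: bool = False


def is_handled(child: Child, downtime_as_ack: bool) -> bool:
    """A downtime counts as an acknowledgment only when the option is set."""
    return child.acknowledged or (downtime_as_ack and child.in_downtime)


def state_ignoring_handled(children: List[Child], downtime_as_ack: bool) -> int:
    """Reading 1: the option also changes the computed rule state."""
    states = [c.state for c in children if not is_handled(c, downtime_as_ack)]
    return max(states, default=OK)


def should_notify(children: List[Child], downtime_as_ack: bool = False) -> bool:
    """Reading 2: the option only affects whether a notification is sent."""
    problems = [c for c in children if c.state != OK]
    return any(not is_handled(c, downtime_as_ack) for c in problems)
```

Under reading 2 the rule state is left untouched and only notifications are suppressed, whereas under reading 1 a child in downtime would also stop contributing to the state. The ambiguity is about which of these two functions the option name implies.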
