-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"strict" health-checking (#427) #428
Conversation
Hmm, this seems a bit more complex than necessary. I think updating the |
Sure, go ahead! |
How about this: func passingServices(checks []*api.HealthCheck, status []string, strict bool) []*api.HealthCheck {
var p []*api.HealthCheck
for _, svc := range checks {
if !isServiceCheck(svc) {
continue
}
total, passing := countChecks(svc, checks, status)
if passing == 0 {
continue
}
if strict && total != passing {
continue
}
if isAgentCritical(svc, checks) {
continue
}
if isNodeInMaintenance(svc, checks) {
continue
}
if isServiceInMaintenance(svc, checks) {
continue
}
p = append(p, svc)
}
return p
}
// isServiceCheck returns true if the health check is a valid service check.
func isServiceCheck(c *api.HealthCheck) bool {
return c.ServiceID != "" &&
c.CheckID != "serfHealth" &&
c.CheckID != "_node_maintenance" &&
!strings.HasPrefix("_service_maintenance:", c.CheckID)
}
// isAgentCritical returns true if the agent on the node on which the service
// runs is critical.
func isAgentCritical(svc *api.HealthCheck, checks []*api.HealthCheck) bool {
for _, c := range checks {
if svc.Node == c.Node && c.CheckID == "serfHealth" && c.Status == "critical" {
log.Printf("[DEBUG] consul: Skipping service %q since agent on node %q is down: %s", c.ServiceID, c.Node, c.Output)
return true
}
}
return false
}
// isNodeInMaintenance returns true if the node on which the service runs is in
// maintenance mode.
func isNodeInMaintenance(svc *api.HealthCheck, checks []*api.HealthCheck) bool {
for _, c := range checks {
if svc.Node == c.Node && c.CheckID == "_node_maintenance" {
log.Printf("[DEBUG] consul: Skipping service %q since node %q is in maintenance mode: %s", c.ServiceID, c.Node, c.Output)
return true
}
}
return false
}
// isServiceInMaintenance returns true if the service instance is in
// maintenance mode.
func isServiceInMaintenance(svc *api.HealthCheck, checks []*api.HealthCheck) bool {
for _, c := range checks {
if svc.Node == c.Node && c.CheckID == "_service_maintenance:"+svc.ServiceID && c.Status == "critical" {
log.Printf("[DEBUG] consul: Skipping service %q since it is in maintenance mode: %s", svc.ServiceID, c.Output)
return true
}
}
return false
}
// countChecks counts the number of service checks exist for a given service
// and how many of them are passing.
func countChecks(svc *api.HealthCheck, checks []*api.HealthCheck, status []string) (total int, passing int) {
for _, c := range checks {
if svc.Node == c.Node && svc.ServiceID == c.ServiceID {
total++
if hasStatus(c, status) {
passing++
}
}
}
return
}
// hasStatus returns true if the health check status is one of the given
// values.
func hasStatus(c *api.HealthCheck, status []string) bool {
for _, s := range status {
if c.Status == s {
return true
}
}
return false
} |
Yes, that makes much more sense! So should I incorporate your code into this PR and also change the setting to a string-based one? |
That would be great. It would be good as well if you could extend the existing test for the passing function. It could use some love in terms of naming the cases. If you want I can provide an example of what I mean.
—
Frank Schröder
… On 31. Jan 2018, at 09:38, systemfreund ***@***.***> wrote:
Yes, that makes much more sense!
So should I incorporate your code into this PR and also change the setting to a string-based one?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
No problem, however I won't be able to do anything until the weekend. |
Any progress on this? |
@systemfreund are you still working on this? |
Yes, I do and I'm going to update the PR this evening. |
I've updated the PR, however some unrelated test seem to fail (at least it looks unrelated)
|
I think that is OK. Can you please rebase the branch on master and update |
done :) |
Thanks @systemfreund ! |
This PR introduces a "strict" health-checking mode (as discussed in #427).
Summary of the changes:
-registry.consul.strictHealthCheck
. The default value isfalse
in order to be backward-compatiblepassingServices
(passing.go
) to not only return the passingHealthCheck
instances. Instead, it is now returning aHealthCheckResult
which contains theHealthCheck
and its result (passed/failed)servicesConfig
in order to generate a "health-report" from the given list ofHealthCheckResults
, which takes the strict-option in to accountI'm hoping that I didn't make too many mistakes, as I usually don't write go-code :)
Fixes #427