Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"strict" health-checking (#427) #428

Merged
merged 1 commit into from
Feb 19, 2018
Merged

"strict" health-checking (#427) #428

merged 1 commit into from
Feb 19, 2018

Conversation

systemfreund
Copy link
Contributor

@systemfreund systemfreund commented Jan 28, 2018

This PR introduces a "strict" health-checking mode (as discussed in #427).

Summary of the changes:

  • add command-line option -registry.consul.strictHealthCheck. The default value is false in order to be backward-compatible
  • refactor passingServices (passing.go) to not only return the passing HealthCheck instances. Instead, it is now returning a HealthCheckResult which contains the HealthCheck and its result (passed/failed)
  • refactor servicesConfig in order to generate a "health-report" from the given list of HealthCheckResults, which takes the strict-option in to account
  • Added test to check the health-report logic

I'm hoping that I didn't make too many mistakes, as I usually don't write go-code :)

Fixes #427

@CLAassistant
Copy link

CLAassistant commented Jan 28, 2018

CLA assistant check
All committers have signed the CLA.

@magiconair
Copy link
Contributor

Hmm, this seems a bit more complex than necessary. I think updating the passingServices function should be sufficient. Also, instead of the boolean config flag I'd probably go for a string flag like registry.consul.checksRequired = (one|all). Do you mind if I give the passingServices function a shot?

@systemfreund
Copy link
Contributor Author

Sure, go ahead!

@magiconair
Copy link
Contributor

How about this:

func passingServices(checks []*api.HealthCheck, status []string, strict bool) []*api.HealthCheck {
	var p []*api.HealthCheck
	for _, svc := range checks {
		if !isServiceCheck(svc) {
			continue
		}
		total, passing := countChecks(svc, checks, status)
		if passing == 0 {
			continue
		}
		if strict && total != passing {
			continue
		}
		if isAgentCritical(svc, checks) {
			continue
		}
		if isNodeInMaintenance(svc, checks) {
			continue
		}
		if isServiceInMaintenance(svc, checks) {
			continue
		}
		p = append(p, svc)
	}

	return p
}

// isServiceCheck returns true if the health check is a valid service check.
func isServiceCheck(c *api.HealthCheck) bool {
	return c.ServiceID != "" &&
		c.CheckID != "serfHealth" &&
		c.CheckID != "_node_maintenance" &&
		!strings.HasPrefix("_service_maintenance:", c.CheckID)
}

// isAgentCritical returns true if the agent on the node on which the service
// runs is critical.
func isAgentCritical(svc *api.HealthCheck, checks []*api.HealthCheck) bool {
	for _, c := range checks {
		if svc.Node == c.Node && c.CheckID == "serfHealth" && c.Status == "critical" {
			log.Printf("[DEBUG] consul: Skipping service %q since agent on node %q is down: %s", c.ServiceID, c.Node, c.Output)
			return true
		}
	}
	return false
}

// isNodeInMaintenance returns true if the node on which the service runs is in
// maintenance mode.
func isNodeInMaintenance(svc *api.HealthCheck, checks []*api.HealthCheck) bool {
	for _, c := range checks {
		if svc.Node == c.Node && c.CheckID == "_node_maintenance" {
			log.Printf("[DEBUG] consul: Skipping service %q since node %q is in maintenance mode: %s", c.ServiceID, c.Node, c.Output)
			return true
		}
	}
	return false
}

// isServiceInMaintenance returns true if the service instance is in
// maintenance mode.
func isServiceInMaintenance(svc *api.HealthCheck, checks []*api.HealthCheck) bool {
	for _, c := range checks {
		if svc.Node == c.Node && c.CheckID == "_service_maintenance:"+svc.ServiceID && c.Status == "critical" {
			log.Printf("[DEBUG] consul: Skipping service %q since it is in maintenance mode: %s", svc.ServiceID, c.Output)
			return true
		}
	}
	return false
}

// countChecks counts the number of service checks exist for a given service
// and how many of them are passing.
func countChecks(svc *api.HealthCheck, checks []*api.HealthCheck, status []string) (total int, passing int) {
	for _, c := range checks {
		if svc.Node == c.Node && svc.ServiceID == c.ServiceID {
			total++
			if hasStatus(c, status) {
				passing++
			}
		}
	}
	return
}

// hasStatus returns true if the health check status is one of the given
// values.
func hasStatus(c *api.HealthCheck, status []string) bool {
	for _, s := range status {
		if c.Status == s {
			return true
		}
	}
	return false
}

@systemfreund
Copy link
Contributor Author

Yes, that makes much more sense!

So should I incorporate your code into this PR and also change the setting to a string-based one?

@magiconair
Copy link
Contributor

magiconair commented Jan 31, 2018 via email

@systemfreund
Copy link
Contributor Author

No problem, however I won't be able to do anything until the weekend.

@hamann
Copy link

hamann commented Feb 16, 2018

Any progress on this?

@magiconair
Copy link
Contributor

@systemfreund are you still working on this?

@systemfreund
Copy link
Contributor Author

Yes, I do and I'm going to update the PR this evening.

@systemfreund
Copy link
Contributor Author

I've updated the PR, however some unrelated test seem to fail (at least it looks unrelated)

FAIL github.com/fabiolb/fabio/cert 15.015s

@magiconair
Copy link
Contributor

I think that is OK. Can you please rebase the branch on master and update fabio.properties and the docs with the new option?

@systemfreund
Copy link
Contributor Author

done :)

@magiconair magiconair merged commit daa6826 into fabiolb:master Feb 19, 2018
@magiconair
Copy link
Contributor

Thanks @systemfreund !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants