Publish node counts grouped by service and health-check status #607

Closed
petemounce opened this issue Jan 16, 2015 · 9 comments
Labels
theme/api Relating to the HTTP API interface type/enhancement Proposed improvement or new feature

@petemounce

We use statsd, and were overjoyed to see that you support that OOTB.

I had a look and couldn't find whether Consul (0.4.1) publishes counts of connected agents. I'd love to be able to see:

  • a count of healthy/warning/critical instances per service
  • a count of present/failed/left instances per service

(where service has its name taken from the "name" element, and then sanitised to not contain any . characters)
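The sanitising step could be read as follows; this is only a minimal sketch. The function name `sanitise_service_name` and the choice of `_` as the replacement character are assumptions for illustration, not something Consul provides.

```python
def sanitise_service_name(name: str, replacement: str = "_") -> str:
    """Return `name` with every '.' replaced.

    statsd treats '.' as a namespace separator, so a service name that
    contains dots would otherwise be split into several path segments.
    The '_' default is an assumption, not a Consul or statsd convention.
    """
    return name.replace(".", replacement)
```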

We terminate and recycle instances in our auto scaling groups often enough that most instances don't last more than a day. Being able to see current instance count would be fantastic for us.

@armon
Member

armon commented Jan 20, 2015

@petemounce We don't have this information "easily" available. It would require doing an internal query and aggregation every interval to generate this. It may be simpler to write a small service that consumes the API to forward that information to statsd.

@armon armon closed this as completed Jan 20, 2015
@petemounce
Author

So maybe something like this?

  • a watch with type=service that reacts to join & leave
    • does this return the name of the service?
  • read /v1/catalog/service/<service> for the service name
  • netcat consul.services.<service>.nodes:<count>|g to statsd, to set the node count for that service as a gauge
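The last step of the plan above might look roughly like this in Python instead of netcat. It is a sketch under stated assumptions: the helper names `gauge_line` and `send_gauge` are hypothetical, statsd is assumed to listen for UDP on 127.0.0.1:8125, and dots in the service name are replaced with `_`. The count itself would come from elsewhere, for example the length of the JSON array returned by /v1/catalog/service/<service>.

```python
import socket


def gauge_line(service: str, count: int) -> str:
    """Format a statsd gauge in the shape proposed in the thread."""
    safe = service.replace(".", "_")  # assumption: '_' stands in for '.'
    return f"consul.services.{safe}.nodes:{count}|g"


def send_gauge(service: str, count: int,
               host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one gauge to statsd over UDP (fire and forget).

    host and port are assumptions; adjust them to your statsd daemon.
    """
    payload = gauge_line(service, count).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Keeping the formatting separate from the socket call makes the metric name easy to check without a running statsd.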

@armon armon reopened this Jan 22, 2015
@armon
Member

armon commented Jan 22, 2015

@petemounce You may be able to do it more easily with a blocking query against the /v1/health/state/any endpoint. That way you can aggregate all the healthy/warning/critical instances in a single API call.
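As a rough illustration of that aggregation, the sketch below groups a decoded /v1/health/state/any response by the `ServiceName` and `Status` fields of each check. The function name and the `_node` label for checks that carry no service name are assumptions. Note that it counts checks, not instances, so a service with several checks per instance would be over-counted.

```python
from collections import Counter, defaultdict


def count_by_service_and_status(checks):
    """Group health checks by service name, then count them per status.

    `checks` is the decoded JSON list from /v1/health/state/any.
    """
    counts = defaultdict(Counter)
    for check in checks:
        # Assumption: checks with an empty or missing ServiceName are
        # node-level checks, collected here under the label "_node".
        service = check.get("ServiceName") or "_node"
        counts[service][check["Status"]] += 1
    return {service: dict(statuses) for service, statuses in counts.items()}
```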

@petemounce
Author

By "with a blocking query", do you mean the "Blocking Queries" section on https://consul.io/docs/agent/http.html? Is there an example you could link me to showing how I might do this?

@ryanuber
Member

@petemounce Yes, a "blocking query" in Consul is an HTTP request that waits until the queried data changes before it returns. You can read more about how to perform a blocking query in the "Blocking Queries" section of the HTTP API docs (right up near the top).

@ryanuber
Member

@petemounce Here is an example.

First, make a query to the health endpoint and note the X-Consul-Index header:

» curl -i "localhost:8500/v1/health/state/any?pretty"
HTTP/1.1 200 OK
Content-Type: application/json
X-Consul-Index: 10
X-Consul-Knownleader: true
X-Consul-Lastcontact: 0
Date: Fri, 23 Jan 2015 18:04:41 GMT
Content-Length: 1281

[
...
    {
        "Node": "ryanuber-mbp.local",
        "CheckID": "service:redis:1",
        "Name": "Service 'redis' check",
        "Status": "passing",
        "Notes": "",
        "Output": "",
        "ServiceID": "redis",
        "ServiceName": "redis"
    },
...

Make the same query, passing in the index and an (optional) time to wait.

» curl -i "localhost:8500/v1/health/state/any?pretty&index=10&wait=1m"

The above query will "block" until the status of a health check changes, at which point a response very similar to the one above is received with the new list of services and a new X-Consul-Index, which you could use for further blocking queries.
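The two curl steps above can be chained into a loop. The following is a minimal sketch, not an official client: `http_fetch` and `watch` are hypothetical names, the agent address `http://localhost:8500` and `wait=1m` are taken from the example, and the `rounds` parameter exists only so the loop can be bounded.

```python
import json
import urllib.request


def http_fetch(index, base="http://localhost:8500", wait="1m"):
    """Run one query against /v1/health/state/any.

    With index=None this returns immediately; with an index it blocks
    until the index changes or the wait time expires.
    """
    url = f"{base}/v1/health/state/any"
    if index is not None:
        url += f"?index={index}&wait={wait}"
    with urllib.request.urlopen(url) as resp:
        new_index = int(resp.headers["X-Consul-Index"])
        return new_index, json.load(resp)


def watch(handle, fetch=http_fetch, rounds=None):
    """Call handle(checks) whenever the index moves.

    Loops forever when rounds is None, otherwise stops after that many
    fetches.
    """
    index = None
    done = 0
    while rounds is None or done < rounds:
        new_index, checks = fetch(index)
        if new_index != index:  # an unchanged index means the wait timed out
            handle(checks)
        index = new_index
        done += 1
```

Passing `fetch` as a parameter keeps the loop independent of the HTTP call, so it can be exercised with canned responses before pointing it at a real agent.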

Hope that helps!

@JesperTerkelsen

I guess a simpler overview API, or a simple command in the command-line tool, would be preferable, so that people can plug the aggregated status checks into Nagios or Datadog?

@doublerebel

Related: #356 /v1/health/nodes endpoint, #1164 Health service nodes filter by states

@slackpad slackpad added type/enhancement Proposed improvement or new feature post-0.9 labels May 2, 2017
@slackpad slackpad added the theme/api Relating to the HTTP API interface label May 25, 2017
@slackpad slackpad added this to the Unplanned milestone Jan 5, 2018
@slackpad slackpad removed the post-0.9 label Jan 5, 2018
@hanshasselberg
Member

Thank you for reporting and participating. Since there are ways to accomplish what you asked, and given that it is not simple to build into Consul, I am going to close this issue.

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
Previously, we were checking that the federation is successful
by only looking for the number of WAN consul members. However, sometimes
those members could be unhealthy/not alive, which will cause the test to fail.
This change improves federation verification and checks that all members are
healthy from the perspective of both servers. It also checks that the ACL
replication is running in case ACLs are used.
8 participants