Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add "health" command #299

Merged
merged 5 commits into from
Sep 25, 2023

Conversation

rebornplusplus
Copy link
Member

@rebornplusplus rebornplusplus commented Sep 7, 2023

This PR adds a new "health" command to Pebble. The health command queries the health of configured pebble checks. It outputs "healthy" and returns an exit code of 0 if all the specified checks are healthy. Otherwise, it outputs "unhealthy" and returns with an exit code of 1.

The "health" command accepts two one option with arguments:

  • The --level option takes an argument, either "alive" or "ready". It queries the health of only those checks with the same level. Note that, ready implies alive. Thus, if the ready checks are healthy but one of the alive checks is not, the overall output will be "unhealthy".

Usage:

pebble health [health-OPTIONS] [<check>...]

This PR includes a few tests too for the relevant changes.


This command does not add any new logic to Pebble, as it relies entirely on the already existing implementation of the Pebble health API. Nonetheless, the hereby proposed feature (and corresponding spec) identifies certain edge cases where the resulting health outcome isn’t the most intuitive or predictable. Examples:

  • pebble health --level=ready can be healthy even if

    1. there are no checks of that level
    2. checks with no level are down
  • pebble health --level=alive will be healthy even if ready checks are down


Major changes from the initial commit(s)

Commit Description
0186355 checks and health appear in a new category in pebble help.
aee03aa Dropped the --format option from health command due to recent feedbacks.

This commit adds a new "health" command to pebble. The health command
queries the health of configured pebble checks. It outputs "healthy" and
returns an exit code of 0 if all the specified checks are healthy.
Otherwise, it outputs "unhealthy" and returns with an exit code of 1.

The "health" command accepts two options with arguments:

  - The --level option takes an argument, either "alive" or "ready". It
    queries the health of only those checks with the same level.
    Note that, ready implies alive. Thus, if the ready checks are
    healthy but one of the alive checks are not, the overall output will
    be "unhealthy".

  - The --format option takes an argument, either "text" or "json". It
    controls how the output will look like. The default option is
    "text".

Usage:

  pebble health [health-OPTIONS] [<check>...]

This command does not add any new logic to pebble, as it relies entirely
on the already existing implementation of the pebble health API.

This commit includes a few tests too for the relevant changes.
@rebornplusplus
Copy link
Member Author

Hi @benhoyt, @cjdcordeiro. Will you please take a look at this PR when you are free? Cheers.

Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks! Left a bunch of comments, most of them quite minor.

Copy link
Contributor

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Approved from my point of view (left two suggestions for grammar tweaks in comments).

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks quite nice, thanks!

// Health fetches healthy status of specified checks.
func (client *Client) Health(opts *HealthOptions) (*HealthInfo, error) {
query := make(url.Values)
if opts.Level != UnsetLevel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a pre-existing bug here: UnsetLevel of what? Being a constant specific to checks, this needs to be named UnsetCheckLevel or similar. This is a client, so unfortunately we cannot make changes trivially without risking breaking other people's code. We might risk this based on how recent the feature is and hope that people are not using Pebble's API directly yet, but even that needs coordination, at least with Ben. Can you please check his opinion on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: this PR shoud not hold on this decision. It should be fixed independently, and fixing the one instance above can be done together with the PR that addresses the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm on board with changing that (but agreed -- in a separate PR). I think it's highly unlikely any external users are using this constant (Juju certainly isn't). And we would only have to change UnsetLevel -- AliveLevel and ReadyLevel we could consider changing too, but I think they're okay as they include the word "alive" and "ready", which gives them context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we should change it to be more specific. I will add a separate PR after this gets merged.

cs.rsp = `{
"type": "sync",
"status-code": 502,
"status": "Bad Gateway",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad Gateway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /v1/health endpoint returns a "Bad Gateway" status with 502 code when unhealthy.

{"type":"sync","status-code":502,"status":"Bad Gateway","result":{"healthy":false}}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we (I can't remember who exactly) debated which HTTP status code this should return at the time, and ended up with 502. It's nice a perfect fit, but we wanted it to be a 5xx code, and Pebble is kind of acting as a proxy/gateway here, so it seemed like a reasonable one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit awkward. Generally those errors mean "I can't handle your request", because of reason 5XX. The request here is "How's your health?", so in theory the response should be "Yep, the check you're asking for is not ready/alive.".

This seems analogous to asking to pull a file and getting a 404 to say the file doesn't exist. That's a bad practice as we're mixing the success of the request with the result of the operation requested. We would get a similar 404 if we were talking to a different server altogether, for example.

Maybe this was done to satisfy Kubernetes, when poking at health URLs using its internal probing?

Includes:
- Drop --format option to health command
- Drop client.HealthInfo type
- client.Health function signature changes
- Changes in comments
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants