Custom health checks API and "pebble checks" CLI command #3

Closed
wants to merge 25 commits into from


@benhoyt benhoyt commented Nov 25, 2021

Temporary PR to allow review of just the API and CLI parts of custom health checks. The real PR is canonical#86.

Conflicts:
- cmd/pebble/cmd_help.go
- internal/daemon/api.go
Conflicts resolved:
- internal/overlord/checkstate/manager.go
- internal/overlord/checkstate/manager_test.go
- internal/overlord/servstate/handlers.go
…ling (canonical#85)

This implements custom health checks and the `on-check-failure` logic
per the [spec](https://docs.google.com/document/d/1d6-h3UAt2VPUSvlkVF30l8iuDW8raRNkHbp5M6NUo1A/edit).

You specify the list of custom health checks in a new top-level
configuration object named `checks` (see the `README.md`
documentation). Each check can be an HTTP, TCP, or exec check. Then,
in a service's configuration, you use `on-check-failure` to specify
the action to perform when a given check fails.

Here's an example configuration with a single HTTP-based check, and a
single service that will be restarted if the check fails (twice in a
row):

services:
    web-server:
        command: python3 webapp.py --port=8080
        on-check-failure:
            server-up: restart

checks:
    server-up:
        period: 5s
        failures: 2
        http:
            url: http://localhost:8080/is-up

It's a good amount of code, though at least half of it is tests. The
CLI and APIs are coming in canonical#86.

I'm pretty happy with how it turned out, particularly the checkers and
check manager. However, the wiring up of the check manager and the
service manager is a little odd, because they are mutually dependent:
we have to tell the check manager about plan updates (from the service
manager), and we have to tell the service manager about check failures
(from the check manager). Open to suggestions on that or anything
else.
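
One way to picture the mutual dependency is with callbacks registered in both directions, so neither manager needs to hold a reference to the other. The Go sketch below is only an illustration under that assumption: every type and method name in it (`plan`, `checkManager`, `serviceManager`, `NotifyPlanChanged`, `NotifyCheckFailed`, and so on) is hypothetical and should not be read as the API in this PR.

```go
package main

import "fmt"

// plan is a stand-in for the configuration plan (hypothetical type).
type plan struct {
	checks []string // names of the configured checks
}

// checkManager runs checks and reports failures (hypothetical type).
type checkManager struct {
	checks   []string
	handlers []func(check string)
}

// PlanChanged is the callback the service manager invokes on plan updates.
func (m *checkManager) PlanChanged(p *plan) {
	m.checks = append([]string(nil), p.checks...)
}

// NotifyCheckFailed registers a callback to run when a check fails.
func (m *checkManager) NotifyCheckFailed(f func(check string)) {
	m.handlers = append(m.handlers, f)
}

// reportFailure fans a check failure out to every registered callback.
func (m *checkManager) reportFailure(check string) {
	for _, f := range m.handlers {
		f(check)
	}
}

// serviceManager owns the plan and reacts to failures (hypothetical type).
type serviceManager struct {
	handlers []func(p *plan)
	failed   []string // failures received, recorded here instead of restarting
}

// NotifyPlanChanged registers a callback to run when the plan changes.
func (m *serviceManager) NotifyPlanChanged(f func(p *plan)) {
	m.handlers = append(m.handlers, f)
}

// CheckFailed is the callback the check manager invokes on a failure.
func (m *serviceManager) CheckFailed(check string) {
	m.failed = append(m.failed, check)
}

// updatePlan fans a plan update out to every registered callback.
func (m *serviceManager) updatePlan(p *plan) {
	for _, f := range m.handlers {
		f(p)
	}
}

func main() {
	services := &serviceManager{}
	checks := &checkManager{}

	// Wire each manager to the other through callbacks.
	services.NotifyPlanChanged(checks.PlanChanged)
	checks.NotifyCheckFailed(services.CheckFailed)

	services.updatePlan(&plan{checks: []string{"server-up"}})
	checks.reportFailure("server-up")

	fmt.Println("checks known to check manager:", checks.checks)
	fmt.Println("failures seen by service manager:", services.failed)
}
```

The wiring happens in one place (here, `main`), which keeps the two types free of direct references to each other, at the cost of some indirection when reading the code.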

benhoyt commented Jan 28, 2022

Closing now that canonical#85 is merged. See canonical#86.

@benhoyt benhoyt closed this Jan 28, 2022