Docs: how to run services reliably and update service autorestart to service lifecycle. #541

Merged
4 changes: 2 additions & 2 deletions docs/how-to/index.md
@@ -14,15 +14,15 @@ Installation follows a similar pattern on all architectures. You can choose to i
Install Pebble <install-pebble>
```


## Service orchestration

As your needs grow, you may want to orchestrate multiple services.
As your needs grow, you may want to use advanced Pebble features to run services reliably and orchestrate multiple services.

```{toctree}
:titlesonly:
:maxdepth: 1

Run services reliably <run-services-reliably>
Manage service dependencies <service-dependencies>
```

74 changes: 74 additions & 0 deletions docs/how-to/run-services-reliably.md
@@ -0,0 +1,74 @@
# How to run services reliably

Microservice architectures offer flexibility, but they can introduce reliability challenges such as network interruptions, resource exhaustion, problems with dependent services, cascading failures, and deployment issues. Health checks help address these issues by monitoring resource usage, checking the availability of dependencies, catching problems with new deployments, and preventing downtime by letting you redirect traffic away from failing services.

To help you manage services more reliably, Pebble provides a comprehensive health check feature.

## Use health checks of the HTTP type

A health check of the HTTP type issues HTTP `GET` requests to the health check URL at a user-specified interval.

The health check is considered successful if the request returns an HTTP 200 response. After a certain number of consecutive failures, the check is considered "down" (or unhealthy).

### Configure HTTP-type health checks

For example, we can configure a health check of HTTP type named `svc1-up` that checks the endpoint `http://127.0.0.1:5000/health`:

```yaml
checks:
  svc1-up:
    override: replace
    period: 10s
    timeout: 3s
    threshold: 3
    http:
      url: http://127.0.0.1:5000/health
```

The configuration above contains three key options that we can tweak for each health check:

- `period`: How often to run the check (defaults to 10 seconds).
- `timeout`: If the check doesn't respond within this time (defaults to 3 seconds), it's considered an error.
- `threshold`: How many consecutive errors (defaults to 3) it takes before the check is considered "down".

Given the default values, a minimal check looks like the following:

```yaml
checks:
  svc1-up:
    override: replace
    http:
      url: http://127.0.0.1:5000/health
```

Besides the HTTP type, there are two more health check types in Pebble: `tcp`, which opens a connection to the given TCP port, and `exec`, which runs a user-specified command. For more information, see [Health checks](../reference/health-checks) and [Layer specification](../reference/layer-specification).
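As a rough sketch (the check names, port, and command below are hypothetical), `tcp` and `exec` checks are configured in a similar way:

```yaml
checks:
  # Hypothetical TCP check: passes if a connection to the port can be opened.
  svc1-port:
    override: replace
    tcp:
      port: 5000

  # Hypothetical exec check: passes if the command exits with code 0.
  svc1-cli:
    override: replace
    exec:
      command: /usr/local/bin/healthcheck.sh
```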

### Restart the service when the health check fails

To automatically restart services when a health check fails, use `on-check-failure` in the service configuration.

To restart `svc1` when the health check named `svc1-up` fails, use the following configuration:

```yaml
services:
  svc1:
    override: replace
    command: python3 /home/ubuntu/work/health-check-sample-service/main.py
    startup: enabled
    on-check-failure:
      svc1-up: restart
```
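
Putting the two pieces together, here's a sketch of a single layer that combines the examples above — the name under `on-check-failure` must match a check defined in the `checks` section:

```yaml
checks:
  svc1-up:
    override: replace
    http:
      url: http://127.0.0.1:5000/health

services:
  svc1:
    override: replace
    command: python3 /home/ubuntu/work/health-check-sample-service/main.py
    startup: enabled
    on-check-failure:
      # Restart svc1 whenever the svc1-up check goes down.
      svc1-up: restart
```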

## Limitations of health checks

Although health checks are useful, they are not a complete solution for reliability:

- Health checks can detect issues such as a failed database connection due to network issues, but they can't fix the network issue itself.
- Health checks also can't replace testing and monitoring.
- Health checks shouldn't be used for scheduling tasks like backups.

## See more

- [Health checks](../reference/health-checks)
- [Layer specification](../reference/layer-specification)
- [Service lifecycle](../reference/service-lifecycle)
4 changes: 2 additions & 2 deletions docs/reference/cli-commands.md
@@ -950,7 +950,7 @@ The "Current" column shows the current status of the service, and can be one of

* `active`: starting or running
* `inactive`: not yet started, being stopped, or stopped
* `backoff`: in a [backoff-restart loop](service-auto-restart.md)
* `backoff`: in a [backoff-restart loop](service-lifecycle.md)
* `error`: in an error state


@@ -996,7 +996,7 @@ any other services it depends on, in the correct order.
### How it works

- If the command is still running at the end of the 1 second window, the start is considered successful.
- If the command exits within the 1 second window, Pebble retries the command after a configurable backoff, using the restart logic described in [](service-auto-restart.md). If one of the started services exits within the 1 second window, `pebble start` prints an appropriate error message and exits with an error.
- If the command exits within the 1 second window, Pebble retries the command after a configurable backoff, using the restart logic described in [Service lifecycle](service-lifecycle.md). If one of the started services exits within the 1 second window, `pebble start` prints an appropriate error message and exits with an error.

### Examples

4 changes: 2 additions & 2 deletions docs/reference/index.md
@@ -20,7 +20,7 @@ Layers <layers>
Layer specification <layer-specification>
Log forwarding <log-forwarding>
Notices <notices>
Service auto-restart <service-auto-restart>
Service lifecycle <service-lifecycle>
```


@@ -46,7 +46,7 @@ The `pebble` command has several subcommands.

Pebble provides two ways to automatically restart services when they fail. Auto-restart is based on exit codes from services. Health checks are a more sophisticated way to test and report the availability of services.

* [Service auto-restart](service-auto-restart)
* [Service lifecycle](service-lifecycle)
* [Health checks](health-checks)


15 changes: 0 additions & 15 deletions docs/reference/service-auto-restart.md

This file was deleted.

58 changes: 58 additions & 0 deletions docs/reference/service-lifecycle.md
@@ -0,0 +1,58 @@
# Service lifecycle

Pebble manages the lifecycle of a service, including starting, stopping, and restarting it. Pebble also handles health checks, failures, and auto-restart with backoff. This is all achieved using a state machine with the following states:

- initial: The service's initial state.
- starting: The service is in the process of starting.
- running: The `okayDelay` (see below) period has passed, and the service runs normally.
- terminating: The service is being gracefully terminated.
- killing: The service is being forcibly killed.
- stopped: The service has stopped.
- backoff: The service exited and is waiting out the backoff delay before the next start attempt (applies when the service is configured to restart on exit).
- exited: The service has exited (and won't be automatically restarted).

## Service start

A service begins in an "initial" state. Pebble tries to start the service's underlying process and transitions the service to the "starting" state.

## Start confirmation

Pebble waits for a short period (`okayDelay`, defaults to one second) after starting the service. If the service runs without exiting after the `okayDelay` period, it's considered successfully started, and the service's state is transitioned into "running".

Whether the service is in the "starting" or "running" state, its status is shown as "active" when you query the service. See the [`pebble services`](#reference_pebble_services_command) command for details.

## Start failure

If the service exits quickly, an error, along with the service's last logs, is added to the task (see [Changes and tasks](/reference/changes-and-tasks.md)). This also ensures the logs remain accessible.

## Abort start

If the user interrupts the start process (e.g., with a SIGKILL), the service transitions to stopped, and a SIGKILL signal is sent to the underlying process.

## Auto-restart

By default, Pebble's service manager automatically restarts services that exit unexpectedly, whether the service is in the "starting" state (the `okayDelay` period has not yet passed) or in the "running" state (`okayDelay` has passed).

Pebble considers a service to have exited unexpectedly if the exit code is non-zero.

You can fine-tune the auto-restart behaviour using the `on-success` and `on-failure` fields in a configuration layer. The possible values for these fields are:

* `restart`: restart the service and enter a restart-backoff loop (the default behaviour).
* `shutdown`: shut down and exit the Pebble daemon (with exit code 0 if the service exits successfully, exit code 10 otherwise).
* `success-shutdown`: shut down with exit code 0 (valid only for `on-failure`).
* `failure-shutdown`: shut down with exit code 10 (valid only for `on-success`).
* `ignore`: ignore the service exiting and do nothing further.
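
For example, a service section might combine these fields as follows (a sketch — the service name and command are hypothetical):

```yaml
services:
  svc1:
    override: replace
    command: /usr/local/bin/svc1
    startup: enabled
    # Restart (with backoff) if svc1 exits with a non-zero exit code.
    on-failure: restart
    # Shut down and exit the Pebble daemon if svc1 exits cleanly.
    on-success: shutdown
```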

## Backoff

Pebble implements a backoff mechanism that increases the delay before restarting the service after each failed attempt. This prevents a failing service from consuming excessive resources.

The `backoff-delay` defaults to half a second, the `backoff-factor` defaults to 2.0 (doubling), and the increasing delay is capped at `backoff-limit`, which defaults to 30 seconds. All three settings can be customized; read more in [Layer specification](../reference/layer-specification).

With the default settings, in `restart` mode, the first time a service exits, Pebble waits half a second before restarting it. If the service exits again, Pebble multiplies the current delay by `backoff-factor`, giving a 1-second delay; the next delay is 2 seconds, then 4 seconds, and so on, capped at 30 seconds.

The `backoff-limit` value is also used as a "backoff reset" time. If the service stays running after a restart for `backoff-limit` seconds, the backoff process is reset and the delay reverts to `backoff-delay`.
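
As a sketch (the service name and command are hypothetical; the field names come from the layer specification), customized backoff settings look like this:

```yaml
services:
  svc1:
    override: replace
    command: /usr/local/bin/svc1
    startup: enabled
    on-failure: restart
    # Wait 1 second before the first restart, double the delay after each
    # subsequent exit, and cap the delay (and the reset window) at 60 seconds.
    backoff-delay: 1s
    backoff-factor: 2.0
    backoff-limit: 60s
```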

## Auto-restart on health check failures

Pebble can be configured to automatically restart services based on health checks. To do so, use `on-check-failure` in the service configuration. Read more in [Health checks](health-checks).