From 1b64d2f8b31ab47a4c5428639a326e708b3752d2 Mon Sep 17 00:00:00 2001 From: trujillo-adam Date: Fri, 27 Jan 2023 11:46:02 -0800 Subject: [PATCH] converted health checks everything page to Define Health Checks usage page --- .../content/docs/services/usage/checks.mdx | 819 ++++++------------ 1 file changed, 263 insertions(+), 556 deletions(-) diff --git a/website/content/docs/services/usage/checks.mdx b/website/content/docs/services/usage/checks.mdx index bf5fece42906..3e60d2a77b48 100644 --- a/website/content/docs/services/usage/checks.mdx +++ b/website/content/docs/services/usage/checks.mdx @@ -1,186 +1,195 @@ --- layout: docs -page_title: Configure Health Checks -description: >- - Agents can be configured to periodically perform custom checks on the health of a service instance or node. Learn about the types of health checks and how to define them in agent and service configuration files. +page_title: Define Health Checks +description: >- + Learn how to configure different types of health checks for services you register with Consul. --- -# Health Checks +# Define Health Checks +This topic describes how to create different types of health checks for your services. -One of the primary roles of the agent is management of system-level and application-level health -checks. A health check is considered to be application-level if it is associated with a -service. If not associated with a service, the check monitors the health of the entire node. -Review the [health checks tutorial](/consul/tutorials/developer-discovery/service-registration-health-checks) -to get a more complete example on how to leverage health check capabilities in Consul. +## Overview +Health checks are configurations that verify the health of a service or node. Health check configurations are nested in the `service` block. Refer to [Define Services](/consul/docs/discovery/usage/define-services) for information about specifying other service parameters.
-A check is defined in a configuration file or added at runtime over the HTTP interface. Checks -created via the HTTP interface persist with that node. +You can define individual health checks for your service in separate `check` blocks or define multiple checks in a `checks` block. Refer to [Define multiple checks](#define-multiple-checks) for additional information. -There are severeal types of checks: +You can create several different kinds of checks: -- [`Script + Interval`](#script-check) - These checks invoke an external application - that performs the health check. - -- [`HTTP + Interval`](#http-check) - These checks make an HTTP `GET` request to the specified URL - in the health check definition. - -- [`TCP + Interval`](#tcp-check) - These checks attempt a TCP connection to the specified - address and port in the health check definition. - -- [`UDP + Interval`](#udp-check) - These checks direct the client to periodically send UDP datagrams - to the specified address and port in the health check definition. - -- [`OSService + Interval`](#osservice-check) - These checks periodically direct the Consul agent to monitor - the health of a service running on the host operating system. - -- [`Time to Live (TTL)`](#time-to-live-ttl-check) - These checks attempt an HTTP connection after a given TTL elapses. - -- [`Docker + Interval`](#docker-check) - These checks invoke an external application that - is packaged within a Docker container. - -- [`gRPC + Interval`](#grpc-check) - These checks are intended for applications that support the standard - [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). - -- [`H2ping + Interval`](#h2ping-check) - These checks test an endpoint that uses HTTP/2 - by connecting to the endpoint and sending a ping frame. - -- [`Alias`](#alias-check) - These checks alias the health state of another registered - node or service. 
+- _Script_ checks invoke an external application that performs the health check, exits with an appropriate exit code, and potentially generates output. Script checks are one of the most common types of checks. +- _HTTP_ checks make an HTTP GET request to the specified URL and wait for the specified amount of time. HTTP checks are one of the most common types of checks. +- _TCP_ checks attempt to connect to an IP or hostname and port over TCP and wait for the specified amount of time. +- _UDP_ checks send UDP datagrams to the specified IP or hostname and port and wait for the specified amount of time. +- _Time-to-live (TTL)_ checks are passive checks that await updates from the service. If the check does not receive a status update before the specified duration, the health check enters a `critical` state. +- _Docker_ checks are dependent on external applications packaged with a Docker container that are triggered by calls to the Docker `exec` API endpoint. +- _gRPC_ checks probe applications that support the standard gRPC health checking protocol. +- _H2ping_ checks test an endpoint that uses HTTP/2. The check connects to the endpoint and sends a ping frame. +- _Alias_ checks represent the health state of another registered node or service. +If your network runs in a Kubernetes environment, you can sync service health information with Kubernetes health checks. Refer to [Configure Health Checks for Consul on Kubernetes](/consul/docs/k8s/connect/health) for details. -## Registering a health check +### Registration -There are three ways to register a service with health checks: +After defining health checks, you must register the service containing the checks with Consul. Refer to [Register Services and Health Checks]() for additional information. If the service is already registered, you can reload the service configuration file to implement your health check. Refer to [Reload]() -1.
Start or reload a Consul agent with a service definition file in the - [agent's configuration directory](/consul/docs/agent#configuring-consul-agents). -1. Call the - [`/agent/service/register`](/consul/api-docs/agent/service#register-service) - HTTP API endpoint to register the service. -1. Use the - [`consul services register`](/consul/commands/services/register) - CLI command to register the service. +## Define multiple checks -When a service is registered using the HTTP API endpoint or CLI command, -the checks persist in the Consul data folder across Consul agent restarts. +You can define multiple checks for a service in a single `checks` block. The `checks` block contains an array of objects. The objects contain the configuration for each health check you want to implement. The following example includes two script checks named `mem` and `cpu` and an HTTP check that calls the `/health` API endpoint. -## Types of checks + -This section describes the available types of health checks you can use to -automatically monitor the health of a service instance or node. - --> **To manually mark a service unhealthy:** Use the maintenance mode - [CLI command](/consul/commands/maint) or - [HTTP API endpoint](/consul/api-docs/agent#enable-maintenance-mode) - to temporarily remove one or all service instances on a node - from service discovery DNS and HTTP API query results. - -### Script check - -Script checks periodically invoke an external application that performs the health check, -exits with an appropriate exit code, and potentially generates some output. -The specified `interval` determines the time between check invocations. -The output of a script check is limited to 4KB. -Larger outputs are truncated. - -By default, script checks are configured with a timeout equal to 30 seconds. -To configure a custom script check timeout value, -specify the `timeout` field in the check definition. 
-After reaching the timeout on a Windows system, -Consul waits for any child processes spawned by the script to finish. -After reaching the timeout on other systems, -Consul attempts to force-kill the script and any child processes it spawned. - -Script checks are not enabled by default. -To enable a Consul agent to perform script checks, -use one of the following agent configuration options: +```hcl +checks = [ + { + id = "chk1" + name = "mem" + args = ["/bin/check_mem", "-limit", "256MB"] + interval = "5s" + }, + { + id = "chk2" + name = "/health" + http = "http://localhost:5000/health" + interval = "15s" + }, + { + id = "chk3" + name = "cpu" + args = ["/bin/check_cpu"] + interval = "10s" + }, + ... +] +``` -- [`enable_local_script_checks`](/consul/docs/agent/config/cli-flags#_enable_local_script_checks): - Enable script checks defined in local config files. - Script checks registered using the HTTP API are not allowed. -- [`enable_script_checks`](/consul/docs/agent/config/cli-flags#_enable_script_checks): - Enable script checks no matter how they are registered. +```json +{ + "checks": [ + { + "id": "chk1", + "name": "mem", + "args": ["/bin/check_mem", "-limit", "256MB"], + "interval": "5s" + }, + { + "id": "chk2", + "name": "/health", + "http": "http://localhost:5000/health", + "interval": "15s" + }, + { + "id": "chk3", + "name": "cpu", + "args": ["/bin/check_cpu"], + "interval": "10s" + }, + ... + ] +} +``` - ~> **Security Warning:** - Enabling non-local script checks in some configurations may introduce - a remote execution vulnerability known to be targeted by malware. - We strongly recommend `enable_local_script_checks` instead. - For more information, refer to - [this blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations). 
+ -The following service definition file snippet is an example -of a script check definition: +## Define initial health check status +When checks are registered against a Consul agent, they are assigned a `critical` status by default. This prevents services from registering as `passing` and entering the service pool before their health is verified. You can add the `status` parameter to the check definition to specify the initial state. In the following example, the check registers in a `passing` state: - + ```hcl check = { - id = "mem-util" - name = "Memory utilization" - args = ["/usr/local/bin/check_mem.py", "-limit", "256MB"] + id = "mem" + args = ["/bin/check_mem", "-limit", "256MB"] interval = "10s" - timeout = "1s" + status = "passing" } ``` ```json { - "check": { - "id": "mem-util", - "name": "Memory utilization", - "args": ["/usr/local/bin/check_mem.py", "-limit", "256MB"], - "interval": "10s", - "timeout": "1s" - } + "check": [ + { + "args": [ + "/bin/check_mem", + "-limit", + "256MB" + ], + "id": "mem", + "interval": "10s", + "status": "passing" + } + ] } ``` -#### Check script conventions - -A check script's exit code is used to determine the health check status: +## Script checks +Script checks invoke an external application that performs the health check, exits with an appropriate exit code, and potentially generates output data. The output of a script check is limited to 4KB. Outputs that exceed the limit are truncated. + +Script checks timeout after 30 seconds by default, but you can configure a custom script check timeout value by specifying the `timeout` field in the check definition. When the timeout is reached on Windows, Consul waits for any child processes spawned by the script to finish. For any other system, Consul attempts to force-kill the script and any child processes it has spawned once the timeout has passed. 
+ +### Script check configuration +To enable script checks, you must first enable the agent to send external requests, then configure the health check settings in the service definition: + +1. Add one of the following configurations to your agent configuration file to enable a script check: + - [`enable_local_script_checks`](/consul/docs/agent/config/cli-flags#_enable_local_script_checks): Enable script checks defined in local configuration files. Script checks registered using the HTTP API are not allowed. + - [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): Enable script checks no matter how they are registered. + + !> **Security warning:** Enabling non-local script checks in some configurations may introduce a known remote execution vulnerability targeted by malware. We strongly recommend `enable_local_script_checks` instead. + +1. Specify the script to run in the `args` of the `check` block in your service configuration file. In the following example, a check named `Memory utilization` invokes the `check_mem.py` script every 10 seconds and times out if a response takes longer than one second: + + + + ```hcl + service { + ## ... + check = { + id = "mem-util" + name = "Memory utilization" + args = ["/usr/local/bin/check_mem.py", "-limit", "256MB"] + interval = "10s" + timeout = "1s" + } + } + ``` + + ```json + { + "service": [ + { + "check": { + "id": "mem-util", + "name": "Memory utilization", + "args": ["/usr/local/bin/check_mem.py", "-limit", "256MB"], + "interval": "10s", + "timeout": "1s" + } + } ] + } + ``` + + +Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. 
+ +### Script check exit codes +The following exit codes returned by the script check determine the health check status: - Exit code 0 - Check is passing - Exit code 1 - Check is warning - Any other code - Check is failing -Any output of the script is captured and made available in the -`Output` field of checks included in HTTP API responses, -as in this example from the [local service health endpoint](/consul/api-docs/agent/service#by-name-json). - -### HTTP check - -HTTP checks periodically make an HTTP `GET` request to the specified URL, -waiting the specified `interval` amount of time between requests. -The status of the service depends on the HTTP response code: any `2xx` code is -considered passing, a `429 Too ManyRequests` is a warning, and anything else is -a failure. This type of check -should be preferred over a script that uses `curl` or another external process -to check a simple HTTP operation. By default, HTTP checks are `GET` requests -unless the `method` field specifies a different method. Additional request -headers can be set through the `header` field which is a map of lists of -strings, such as `{"x-foo": ["bar", "baz"]}`. - -By default, HTTP checks are configured with a request timeout equal to 10 seconds. -To configure a custom HTTP check timeout value, -specify the `timeout` field in the check definition. -The output of an HTTP check is limited to approximately 4KB. -Larger outputs are truncated. -HTTP checks also support TLS. By default, a valid TLS certificate is expected. -Certificate verification can be turned off by setting the `tls_skip_verify` -field to `true` in the check definition. When using TLS, the SNI is implicitly -determined from the URL if it uses a hostname instead of an IP address. -You can explicitly set the SNI value by setting `tls_server_name`. - -Consul follows HTTP redirects by default. -To disable redirects, set the `disable_redirects` field to `true`. 
- -The following service definition file snippet is an example -of an HTTP check definition: - - +Any output of the script is captured and made available in the `Output` field of checks included in HTTP API responses. Refer to the example described in the [local service health endpoint](/consul/api-docs/agent/service#by-name-json). + +## HTTP checks +_HTTP_ checks send an HTTP request to the specified URL and report the service health based on the [HTTP response code](#http-check-response-codes). We recommend using HTTP checks over [script checks](#script-checks) that use cURL or another external process to check an HTTP operation. + +### HTTP check configuration +Add an `http` field to the `check` block in your service definition file and specify the HTTP address, including port number, for the check to call. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. + +In the following example, an HTTP check named `HTTP API on port 5000` sends a `POST` request to the `health` endpoint every 10 seconds: + + ```hcl check = { @@ -216,28 +225,32 @@ check = { } } ``` - -### TCP check +HTTP checks send GET requests by default, but you can specify another request method in the `method` field. You can send additional headers in the `header` block. The `header` block contains a key and an array of strings, such as `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks timeout at 10 seconds, but you can specify a custom timeout value in the `timeout` field. + +HTTP checks expect a valid TLS certificate by default. You can disable certificate verification by setting the `tls_skip_verify` field to `true`. When using TLS and a host name is specified in the `http` field, the check automatically determines the SNI from the URL. 
If the `http` field is configured with an IP address or if you want to explicitly set the SNI, specify the name in the `tls_server_name` field. + +The check follows HTTP redirects configured in the network by default. Set the `disable_redirects` field to `true` to disable redirects. + +### HTTP check response codes +Responses larger than 4KB are truncated. The HTTP response determines the status of the service: + +- A `200`-`299` response code is healthy. +- A `429` response code indicating too many requests is a warning. +- All other response codes indicate a failure. -TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts. -If no hostname is specified, it defaults to "localhost". -The health check status is `success` if the target host accepts the connection attempt, -otherwise the status is `critical`. In the case of a hostname that -resolves to both IPv4 and IPv6 addresses, an attempt is made to both -addresses, and the first successful connection attempt results in a -successful check. This type of check should be preferred over a script that -uses `netcat` or another external process to check a simple socket operation. -By default, TCP checks are configured with a request timeout equal to 10 seconds. -To configure a custom TCP check timeout value, -specify the `timeout` field in the check definition. +## TCP checks +TCP checks establish connections to the specified IPs or hosts. If the check successfully establishes a connection, the service status is reported as `success`. If the IP or host does not accept the connection, the service status is reported as `critical`. We recommend TCP checks over [script checks](#script-checks) that use netcat or another external process to check a socket operation. 
-The following service definition file snippet is an example -of a TCP check definition: +### Configuration +Add a `tcp` field to the `check` block in your service definition file and specify the address, including port number, for the check to call. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/services/configuration/health-checks-configuration) for information about all health check configurations. + +In the following example, a TCP check named `SSH TCP on port 22` attempts to connect to `localhost:22` every 10 seconds: + + - ```hcl check = { @@ -263,23 +276,19 @@ check = { -### UDP check +If a hostname resolves to an IPv4 and an IPv6 address, Consul attempts to connect to both addresses. The first successful connection attempt results in a successful check. -UDP checks periodically direct the Consul agent to send UDP datagrams -to the specified IP/hostname and port, -waiting `interval` amount of time between attempts. -The check status is set to `success` if any response is received from the targeted UDP server. -Any other result sets the status to `critical`. +By default, TCP check requests timeout at 10 seconds, but you can specify a custom timeout in the `timeout` field. -By default, UDP checks are configured with a request timeout equal to 10 seconds. -To configure a custom UDP check timeout value, -specify the `timeout` field in the check definition. -If any timeout on read exists, the check is still considered healthy. +## UDP checks +UDP checks direct the Consul agent to send UDP datagrams to the specified IP or hostname and port. The check status is set to `success` if any response is received from the targeted UDP server. Any other result sets the status to `critical`. 
-The following service definition file snippet is an example -of a UDP check definition: +### UDP check configuration +Add a `udp` field to the `check` block in your service definition file and specify the address, including port number, for sending datagrams. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. - +In the following example, a UDP check named `DNS UDP on port 53` sends datagrams to `localhost:53` every 10 seconds: + + ```hcl check = { @@ -305,19 +314,17 @@ check = { -### OSService check +By default, UDP checks timeout at 10 seconds, but you can specify a custom timeout in the `timeout` field. If any timeout on read exists, the check is still considered healthy. + +## OSService check +OSService checks if an OS service is running on the host. OSService checks support Windows services on Windows hosts or SystemD services on Unix hosts. The check logs the service as `healthy` if it is running. If the service is not running, the status is logged as `critical`. All other results are logged with `warning`. A `warning` status indicates that the check is not reliable because an issue is preventing it from determining the health of the service. -OSService checks periodically direct the Consul agent to monitor the health of a service running on -the host operating system as either a Windows service (Windows) or a SystemD service (Unix). -The check is logged as `healthy` if the service is running. -If it is stopped or not running, the status is `critical`. All other results set -the status to `warning`, which indicates that the check is not reliable because -an issue is preventing the check from determining the health of the service. +### OSService check configurations +Add an `os_service` field to the `check` block in your service definition file and specify the name of the service to check. 
All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. -The following service definition file snippet is an example -of an OSService check definition: +In the following example, an OSService check named `svcname-001 Windows Service Health` verifies every 10 seconds that the `myco-svctype-svcname-001` service is running: - + ```hcl check = { @@ -343,35 +350,25 @@ check = { -### Time to live (TTL) check +## TTL checks +Time-to-live (TTL) checks wait for an external process to report the service's state to a Consul [`/agent/check` HTTP endpoint](/consul/api-docs/agent/check). If the check does not receive an update before the specified `ttl` duration, the check logs the service as `critical`. For example, if a healthy application is configured to periodically send a status update to the HTTP endpoint in a `PUT` request, then the health check logs a `critical` state if the application is unable to send the update before the TTL expires. The check uses the following endpoints to update health information: -TTL checks retain their last known state for the specified `ttl` duration. -The state of the check updates periodically over the HTTP interface. -If the `ttl` duration elapses before a new check update -is provided over the HTTP interface, -the check is set to `critical` state. +- [pass](/consul/api-docs/agent/check#ttl-check-pass) +- [warn](/consul/api-docs/agent/check#ttl-check-warn) +- [fail](/consul/api-docs/agent/check#ttl-check-fail) +- [update](/consul/api-docs/agent/check#ttl-check-update) -This mechanism relies on the application to directly report its health. -For example, a healthy app can periodically `PUT` a status update to the HTTP endpoint. -Then, if the app is disrupted and unable to perform this update -before the TTL expires, the health check enters the `critical` state.
-The endpoints used to update health information for a given check are: [pass](/consul/api-docs/agent/check#ttl-check-pass), -[warn](/consul/api-docs/agent/check#ttl-check-warn), [fail](/consul/api-docs/agent/check#ttl-check-fail), -and [update](/consul/api-docs/agent/check#ttl-check-update). TTL checks also persist their -last known status to disk. This persistence allows the Consul agent to restore the last known -status of the check across agent restarts. Persisted check status is valid through the -end of the TTL from the time of the last check. +TTL checks also persist their last known status to disk so that the Consul agent can restore the last known status of the check across restarts. Persisted check status is valid through the end of the TTL from the time of the last check. -To manually mark a service unhealthy, -it is far more convenient to use the maintenance mode -[CLI command](/consul/commands/maint) or -[HTTP API endpoint](/consul/api-docs/agent#enable-maintenance-mode) -rather than a TTL health check with arbitrarily high `ttl`. +You can manually mark a service as unhealthy using the [`consul maint` CLI command](/consul/commands/maint) or [`agent/maintenance` HTTP API endpoint](/consul/api-docs/agent#enable-maintenance-mode), rather than waiting for a TTL health check to expire when the `ttl` duration is long. -The following service definition file snippet is an example -of a TTL check definition: +### TTL check configuration +Add a `ttl` field to the `check` block in your service definition file and specify how long to wait for an update from the external process. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations.
+ +In the following example, a TTL check named `Web App Status` logs the application as `critical` if a status update is not received every 30 seconds: + + - ```hcl check = { @@ -395,39 +392,33 @@ check = { -### Docker check - -These checks depend on periodically invoking an external application that -is packaged within a Docker Container. The application is triggered within the running -container through the Docker Exec API. We expect that the Consul agent user has access -to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to -determine the Docker API endpoint. The application is expected to run, perform a health -check of the service running inside the container, and exit with an appropriate exit code. -The check should be paired with an invocation interval. The shell on which the check -has to be performed is configurable, making it possible to run containers which -have different shells on the same host. -The output of a Docker check is limited to 4KB. -Larger outputs are truncated. - -Docker checks are not enabled by default. -To enable a Consul agent to perform Docker checks, -use one of the following agent configuration options: - -- [`enable_local_script_checks`](/consul/docs/agent/config/cli-flags#_enable_local_script_checks): - Enable script checks defined in local config files. - Script checks registered using the HTTP API are not allowed. +## Docker checks +Docker checks invoke an application packaged within a Docker container. The application should perform a health check and exit with an appropriate exit code. + +The application is triggered within the running container through the Docker `exec` API. You should have access to either the Docker HTTP API or the Unix socket. Consul uses the `$DOCKER_HOST` environment variable to determine the Docker API endpoint. + +The output of a Docker check is limited to 4KB. Larger outputs are truncated. 
+ +### Docker check configuration +To enable Docker checks, you must first enable the agent to send external requests, then configure the health check settings in the service definition: + +1. Add one of the following configurations to your agent configuration file to enable a Docker check: + + - [`enable_local_script_checks`](/consul/docs/agent/config/cli-flags#_enable_local_script_checks): Enable script checks defined in local config files. Script checks registered using the HTTP API are not allowed. -- [`enable_script_checks`](/consul/docs/agent/config/cli-flags#_enable_script_checks): - Enable script checks no matter how they are registered. + - [`enable_script_checks`](/consul/docs/agent/config/cli-flags#_enable_script_checks): Enable script checks no matter how they are registered. + + !> **Security warning**: Enabling non-local script checks in some configurations may introduce a known remote execution vulnerability targeted by malware. We strongly recommend `enable_local_script_checks` instead. +1. Configure the following fields in the `check` block in your service definition file: + - `docker_container_id`: The `docker ps` command is a common way to get the ID. + - `shell`: Specifies the shell to use for performing the check. Different containers can run different shells on the same host. + - `args`: Specifies the external application to invoke. + - `interval`: Specifies the interval for running the check. - !> **Security Warning:** - We recommend using `enable_local_script_checks` instead of `enable_script_checks` in production - environments, as remote script checks are more vulnerable to malware attacks. Learn more about how [script checks can be exploited](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations#how-script-checks-can-be-exploited). 
+In the following example, a Docker check named `Memory utilization` invokes the `check_mem.py` application in container `f972c95ebf0e` every 10 seconds: -The following service definition file snippet is an example -of a Docker check definition: + - ```hcl check = { @@ -455,28 +446,15 @@ check = { -### gRPC check +## gRPC checks +gRPC checks send a request to the specified endpoint. These checks are intended for applications that support the standard [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). -gRPC checks are intended for applications that support the standard -[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). -The state of the check will be updated by periodically probing the configured endpoint, -waiting `interval` amount of time between attempts. +### gRPC check configuration +Add a `grpc` field to the `check` block in your service definition file and specify the endpoint, including port number, for sending requests. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. -By default, gRPC checks are configured with a timeout equal to 10 seconds. -To configure a custom Docker check timeout value, -specify the `timeout` field in the check definition. +In the following example, a gRPC check named `Service health status` probes the entire application by sending requests to `127.0.0.1:12345` every 10 seconds: -gRPC checks default to not using TLS. -To enable TLS, set `grpc_use_tls` in the check definition. -If TLS is enabled, then by default, a valid TLS certificate is expected. -Certificate verification can be turned off by setting the -`tls_skip_verify` field to `true` in the check definition. -To check on a specific service instead of the whole gRPC server, -add the service identifier after the `gRPC` check's endpoint.
- -The following example shows a gRPC check for a whole application: - - + ```hcl check = { @@ -502,9 +480,12 @@ check = { -The following example shows a gRPC check for the specific `my_service` service: +gRPC checks probe the entire gRPC server, but you can check on a specific service by adding the service identifier after the gRPC check's endpoint using the following format: `/:service_identifier`. + +In the following example, a gRPC check probes `my_service` in the application at `127.0.0.1:12345` every 10 seconds: + - + ```hcl check = { @@ -530,25 +511,20 @@ check = { -### H2ping check +TLS is disabled for gRPC checks by default. You can enable TLS by setting `grpc_use_tls` to `true`. If TLS is enabled, you must either provide a valid TLS certificate or disable certificate verification by setting the `tls_skip_verify` field to `true`. -H2ping checks test an endpoint that uses http2 by connecting to the endpoint -and sending a ping frame, waiting `interval` amount of time between attempts. -If the ping is successful within a specified timeout, -then the check status is set to `success`. +By default, gRPC checks timeout after 10 seconds, but you can specify a custom duration in the `timeout` field. -By default, h2ping checks are configured with a request timeout equal to 10 seconds. -To configure a custom h2ping check timeout value, -specify the `timeout` field in the check definition. +## H2ping checks +H2ping checks test an endpoint that uses HTTP2 by connecting to the endpoint and sending a ping frame. If the endpoint sends a response within the specified interval, the check status is set to `success`. -TLS is enabled by default. -To disable TLS and use h2c, set `h2ping_use_tls` to `false`. -If TLS is not disabled, a valid certificate is required unless `tls_skip_verify` is set to `true`. 
+### H2ping check configuration +Add an `h2ping` field to the `check` block in your service definition file and specify the HTTP2 endpoint, including port number, for the check to ping. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. -The following service definition file snippet is an example -of an h2ping check definition: +In the following example, an H2ping check named `h2ping` pings the endpoint at `localhost:22222` every 10 seconds: - + + ```hcl check = { @@ -574,312 +550,43 @@ check = { -### Alias check - -These checks alias the health state of another registered -node or service. The state of the check updates asynchronously, but is -nearly instant. For aliased services on the same agent, the local state is monitored -and no additional network resources are consumed. For other services and nodes, -the check maintains a blocking query over the agent's connection with a current -server and allows stale requests. If there are any errors in watching the aliased -node or service, the check state is set to `critical`. -For the blocking query, the check uses the ACL token set on the service or check definition. -If no ACL token is set in the service or check definition, -the blocking query uses the agent's default ACL token -([`acl.tokens.default`](/consul/docs/agent/config/config-files#acl_tokens_default)). - -~> **Configuration info**: The alias check configuration expects the alias to be -registered on the same agent as the one you are aliasing. If the service is -not registered with the same agent, `"alias_node": ""` must also be -specified. When using `alias_node`, if no service is specified, the check will -alias the health of the node. If a service is specified, the check will alias -the specified service on this particular node. 
- -The following service definition file snippet is an example -of an alias check for a local service: - - - -```hcl -check = { - id = "web-alias" - alias_service = "web" -} -``` - -```json -{ - "check": { - "id": "web-alias", - "alias_service": "web" - } -} -``` - - - -## Check definition +TLS is enabled by default, but you can disable TLS by setting `h2ping_use_tls` to `false`. When TLS is disabled, Consul sends pings over h2c. When TLS is enabled, a valid certificate is required unless `tls_skip_verify` is set to `true`. -This section covers some of the most common options for check definitions. -For a complete list of all check options, refer to the -[Register Check HTTP API endpoint documentation](/consul/api-docs/agent/check#json-request-body-schema). +By default, H2ping checks time out after 10 seconds, but you can specify a custom duration in the `timeout` field. --> **Casing for check options:** - The correct casing for an option depends on whether the check is defined in - a service definition file or an HTTP API JSON request body. - For example, the option `deregister_critical_service_after` in a service - definition file is instead named `DeregisterCriticalServiceAfter` in an - HTTP API JSON request body. -#### General options +## Alias checks +Alias checks continuously report the health state of another registered node or service. If the alias experiences errors while watching the actual node or service, the check reports a `critical` state. Consul updates the alias and actual node or service state asynchronously but nearly instantaneously. -- `name` `(string: )` - Specifies the name of the check. +For aliased services on the same agent, the check monitors the local state without consuming additional network resources. For services and nodes on different agents, the check maintains a blocking query over the agent's connection with a current server and allows stale requests. -- `id` `(string: "")` - Specifies a unique ID for this check on this node. 
+### ACLs +For the blocking query, the alias check presents the ACL token set on the actual service or the token configured in the check definition. If neither is available, the alias check falls back to the default ACL token set for the agent. Refer to [`acl.tokens.default`](/consul/docs/agent/config/config-files#acl_tokens_default) for additional information about the default ACL token. - If unspecified, Consul defines the check id by: - - If the check definition is embedded within a service definition file, - a unique check id is auto-generated. - - Otherwise, the `id` is set to the value of `name`. - If names might conflict, you must provide unique IDs to avoid - overwriting existing checks with the same id on this node. +### Configuration +Add an `alias_service` field to the `check` block in your service definition file and specify the name of the service or node to alias. All other fields are optional. Refer to [Health Checks Configuration Reference](/consul/docs/discovery/configuration/health-checks-configuration) for information about all health check configurations. -- `interval` `(string: )` - Specifies - the frequency at which to run this check. - Required for all check types except TTL and alias checks. +In the following example, an alias check with the ID `web-alias` reports the health state of the `web` service: - The value is parsed by Go's `time` package, and has the following - [formatting specification](https://golang.org/pkg/time/#ParseDuration): + - > A duration string is a possibly signed sequence of decimal numbers, each with - > optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". - > Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". - -- `service_id` `(string: )` - Specifies - the ID of a service instance to associate this check with. - That service instance must be on this node. - If not specified, this check is treated as a node-level check. 
- For more information, refer to the - [service-bound checks](#service-bound-checks) section. - -- `status` `(string: "")` - Specifies the initial status of the health check as - "critical" (default), "warning", or "passing". For more details, refer to - the [initial health check status](#initial-health-check-status) section. - - -> **Health defaults to critical:** If health status it not initially specified, - it defaults to "critical" to protect against including a service - in discovery results before it is ready. - -- `deregister_critical_service_after` `(string: "")` - If specified, - the associated service and all its checks are deregistered - after this check is in the critical state for more than the specified value. - The value has the same formatting specification as the [`interval`](#interval) field. - - The minimum timeout is 1 minute, - and the process that reaps critical services runs every 30 seconds, - so it may take slightly longer than the configured timeout to trigger the deregistration. - This field should generally be configured with a timeout that's significantly longer than - any expected recoverable outage for the given service. - -- `notes` `(string: "")` - Provides a human-readable description of the check. - This field is opaque to Consul and can be used however is useful to the user. - For example, it could be used to describe the current state of the check. - -- `token` `(string: "")` - Specifies an ACL token used for any interaction - with the catalog for the check, including - [anti-entropy syncs](/consul/docs/architecture/anti-entropy) and deregistration. - - For alias checks, this token is used if a remote blocking query is necessary to watch the state of the aliased node or service. 
- -#### Success/failures before passing/warning/critical - -To prevent flapping health checks and limit the load they cause on the cluster, -a health check may be configured to become passing/warning/critical only after a -specified number of consecutive checks return as passing/critical. -The status does not transition states until the configured threshold is reached. - -- `success_before_passing` - Number of consecutive successful results required - before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0. - -- `failures_before_warning` - Number of consecutive unsuccessful results required - before check status transitions to warning. Defaults to the same value as that of - `failures_before_critical` to maintain the expected behavior of not changing the - status of service checks to `warning` before `critical` unless configured to do so. - Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0. - -- `failures_before_critical` - Number of consecutive unsuccessful results required - before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0. - -This feature is available for all check types except TTL and alias checks. -By default, both passing and critical thresholds are set to 0 so the check -status always reflects the last check result. - - - -```hcl -checks = [ - { - name = "HTTP TCP on port 80" - tcp = "localhost:80" - interval = "10s" - timeout = "1s" - success_before_passing = 3 - failures_before_warning = 1 - failures_before_critical = 3 - } -] -``` - -```json -{ - "checks": [ - { - "name": "HTTP TCP on port 80", - "tcp": "localhost:80", - "interval": "10s", - "timeout": "1s", - "success_before_passing": 3, - "failures_before_warning": 1, - "failures_before_critical": 3 - } - ] -} -``` - - - -## Initial health check status - -By default, when checks are registered against a Consul agent, the state is set -immediately to "critical". 
This is useful to prevent services from being -registered as "passing" and entering the service pool before they are confirmed -to be healthy. In certain cases, it may be desirable to specify the initial -state of a health check. This can be done by specifying the `status` field in a -health check definition, like so: - - - -```hcl -check = { - id = "mem" - args = ["/bin/check_mem", "-limit", "256MB"] - interval = "10s" - status = "passing" -} -``` - -```json -{ - "check": { - "id": "mem", - "args": ["/bin/check_mem", "-limit", "256MB"], - "interval": "10s", - "status": "passing" - } -} -``` - - - -The above service definition would cause the new "mem" check to be -registered with its initial state set to "passing". - -## Service-bound checks - -Health checks may optionally be bound to a specific service. This ensures -that the status of the health check will only affect the health status of the -given service instead of the entire node. Service-bound health checks may be -provided by adding a `service_id` field to a check configuration: - - ```hcl check = { - id = "web-app" - name = "Web App Status" - service_id = "web-app" - ttl = "30s" + id = "web-alias" + alias_service = "web" } ``` ```json { "check": { - "id": "web-app", - "name": "Web App Status", - "service_id": "web-app", - "ttl": "30s" + "id": "web-alias", + "alias_service": "web" } } ``` -In the above configuration, if the web-app health check begins failing, it will -only affect the availability of the web-app service. All other services -provided by the node will remain unchanged. - -## Agent certificates for TLS checks - -The [enable_agent_tls_for_checks](/consul/docs/agent/config/config-files#enable_agent_tls_for_checks) -agent configuration option can be utilized to have HTTP or gRPC health checks -to use the agent's credentials when configured for TLS. - -## Multiple check definitions - -Multiple check definitions can be defined using the `checks` (plural) -key in your configuration file. 
- - - -```hcl -checks = [ - { - id = "chk1" - name = "mem" - args = ["/bin/check_mem", "-limit", "256MB"] - interval = "5s" - }, - { - id = "chk2" - name = "/health" - http = "http://localhost:5000/health" - interval = "15s" - }, - { - id = "chk3" - name = "cpu" - args = ["/bin/check_cpu"] - interval = "10s" - }, - ... -] -``` - -```json -{ - "checks": [ - { - "id": "chk1", - "name": "mem", - "args": ["/bin/check_mem", "-limit", "256MB"], - "interval": "5s" - }, - { - "id": "chk2", - "name": "/health", - "http": "http://localhost:5000/health", - "interval": "15s" - }, - { - "id": "chk3", - "name": "cpu", - "args": ["/bin/check_cpu"], - "interval": "10s" - }, - ... - ] -} -``` - - +By default, the alias must be registered with the same Consul agent as the alias check. If the service is not registered with the same agent, you must specify `"alias_node": ""` in the `check` configuration. If no service is specified and the `alias_node` field is enabled, the check aliases the health of the node. If a service is specified, the check will alias the specified service on this particular node. \ No newline at end of file