In this lab you will learn how to monitor application health using OpenShift health probes and how to view container resource consumption using metrics.
When building microservices, monitoring is essential to make sure all services are running at all times and, when they are not, that automatic actions are triggered to rectify the issues.
OpenShift uses Kubernetes health probes to monitor application health and to heal faulty containers automatically by restarting them. A restart resolves issues such as a deadlock in the application, so restarting a container in such a state can make the application more available despite bugs.
There is also a category of issues that can't be resolved by restarting the container. In those scenarios, OpenShift removes the faulty container from the built-in load balancer and sends traffic only to the healthy containers that remain.
There are two types of health probes available in OpenShift: liveness probes and readiness probes. Liveness probes determine when to restart a container, and readiness probes determine when a container is ready to start accepting traffic.
Health probes also provide crucial benefits when automating deployments with practices like rolling updates, which avoid downtime during deployments. A readiness probe signals OpenShift when to switch traffic from the old version of the container to the new version so that users are not affected during deployments.
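The different consequences of the two probe types can be illustrated with a small model. The sketch below is a simplified illustration, not how OpenShift is implemented; the `Container` class, the `plan_actions` function and the pod names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    live: bool    # result of the latest liveness probe
    ready: bool   # result of the latest readiness probe


def plan_actions(containers):
    """Return (containers to restart, containers that receive traffic)."""
    # A failed liveness probe leads to a restart of the container.
    to_restart = [c.name for c in containers if not c.live]
    # Only containers that are alive and ready stay behind the load balancer.
    in_rotation = [c.name for c in containers if c.live and c.ready]
    return to_restart, in_rotation


pods = [
    Container("catalog-1", live=True, ready=True),
    Container("catalog-2", live=True, ready=False),   # e.g. still warming up
    Container("catalog-3", live=False, ready=False),  # e.g. deadlocked
]
restart, rotation = plan_actions(pods)
print("restart:", restart)
print("traffic:", rotation)
```

In this model the container that is alive but not ready is left running and simply receives no traffic, which matches the second category of issues described above.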
There are three ways to define a health probe for a container:
- HTTP Checks: the health of the container is determined by the response code of an HTTP endpoint. Anything between 200 and 399 is considered success. An HTTP check is ideal for applications that return HTTP status codes when completely initialized.
- Container Execution Checks: a specified command is executed inside the container and the health is determined by the return value (0 is success).
- TCP Socket Checks: a socket is opened on a specified port to the container and the container is considered healthy only if the check can establish a connection. A TCP socket check is ideal for applications that do not start listening until initialization is complete.
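The success criteria of the three check types can be sketched in a few lines of Python. This is only an illustration of the rules listed above, run from an ordinary Python process rather than by OpenShift; the function names are hypothetical.

```python
import socket
import subprocess
import sys


def http_check_passes(status_code):
    """HTTP check: any status code from 200 through 399 counts as success."""
    return 200 <= status_code <= 399


def exec_check_passes(command):
    """Container execution check: success means the command exits with 0."""
    return subprocess.run(command).returncode == 0


def tcp_check_passes(host, port, timeout=1.0):
    """TCP socket check: success means a connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(http_check_passes(200), http_check_passes(302), http_check_passes(503))
# The current Python interpreter stands in for a command inside the container.
print(exec_check_passes([sys.executable, "-c", "raise SystemExit(0)"]))
print(exec_check_passes([sys.executable, "-c", "raise SystemExit(1)"]))
```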
Let's add health probes to the microservices deployed so far.
Spring Boot, WildFly Swarm and Vert.x all provide out-of-the-box support for creating RESTful endpoints that provide details on the health of the application. By default these endpoints provide basic data about the service; however, they all provide a way to customize the health data and add more meaningful information (e.g. database connection health, backoffice system availability, etc.).
Spring Boot Actuator is a sub-project of Spring Boot which adds health and management HTTP endpoints to the application. Spring Boot Actuator is enabled by adding the `org.springframework.boot:spring-boot-starter-actuator` dependency to the Maven project dependencies, which is already done for the Catalog service.
Verify that the health endpoint works for the Catalog service using `curl`, replacing `{{CATALOG_ROUTE_HOST}}` with the Catalog route URL:
Remember how to find out the route URLs? Try `oc get route catalog`.
$ curl http://{{CATALOG_ROUTE_HOST}}/health
{"status":"UP","diskSpace":{"status":"UP","total":3209691136,"free":2667175936,"threshold":10485760},"db":{"status":"UP","database":"H2","hello":1}}
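The response above nests one status per health indicator under the overall status. As a small sketch, the following Python parses that exact response and lists the nested statuses; the helper name `component_statuses` is hypothetical, and the code assumes the response keeps the shape shown in this lab.

```python
import json

# Response copied from the curl output in this lab.
body = (
    '{"status":"UP","diskSpace":{"status":"UP","total":3209691136,'
    '"free":2667175936,"threshold":10485760},'
    '"db":{"status":"UP","database":"H2","hello":1}}'
)


def component_statuses(health):
    """Map each nested health indicator (e.g. diskSpace, db) to its status."""
    return {
        name: detail["status"]
        for name, detail in health.items()
        if isinstance(detail, dict) and "status" in detail
    }


health = json.loads(body)
statuses = component_statuses(health)
print(health["status"])  # overall status of the service
print(statuses)          # status per health indicator
```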
WildFly Swarm health endpoints function in a similar fashion and are enabled by adding `org.wildfly.swarm:monitor` to the Maven project dependencies. This is also already done for the Inventory service.
Verify that the health endpoint works for the Inventory service using `curl`, replacing `{{INVENTORY_ROUTE_HOST}}` with the Inventory route URL:
You know this by now! Use `oc get route inventory` to get the Inventory route URL.
$ curl http://{{INVENTORY_ROUTE_HOST}}/node
{
"name" : "localhost",
"server-state" : "running",
"suspend-state" : "RUNNING",
"running-mode" : "NORMAL",
"uuid" : "79b3ffc5-d98c-4b8e-ae5c-9756ed13944a",
"swarm-version" : "2017.8.1"
}
As you might expect, Eclipse Vert.x also provides a health check module, which is enabled by adding `io.vertx:vertx-health-check` as a dependency to the Maven project.
Verify that the health endpoint works for the API Gateway service using `curl`, replacing `{{API_GATEWAY_ROUTE_HOST}}` with the API Gateway route URL:
Yup! You can use `oc get route gateway` to get the API Gateway route URL.
$ curl http://{{API_GATEWAY_ROUTE_HOST}}/health
{"status":"UP"}
Last but not least, although you can build more sophisticated health endpoints for the Web UI as well, you can use the root context ("/") of the Web UI in this lab to verify it's up and running.
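The manual `curl` checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `endpoint_is_healthy` is hypothetical. Note that `urlopen` follows redirects and raises an error for 4xx and 5xx responses, so a successful call means the final response was a success status.

```python
import urllib.error
import urllib.request


def endpoint_is_healthy(url, timeout=1.0):
    """Return True if a GET request to the URL ends in a success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status <= 399
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return False
    except OSError:
        # Connection refused, DNS failure, timeout and similar problems.
        return False
```

With the route hosts from this lab substituted in, a call such as `endpoint_is_healthy("http://<catalog route host>/health")` would return `True` for a healthy Catalog service, and the same function can be pointed at `/node` for Inventory or `/` for the Web UI.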
Health probes are defined on the deployment config for each pod and can be added using OpenShift Web Console or OpenShift CLI. You will try both in this lab.
As mentioned, health probes are defined on a deployment config for each pod. Review the available deployment configs in the project.
$ oc get dc
NAME REVISION DESIRED CURRENT TRIGGERED BY
catalog 1 1 1 config,image(catalog:latest)
gateway 1 1 1 config,image(gateway:latest)
inventory 1 1 1 config,image(inventory:latest)
web 1 1 1 config,image(web:latest)
`dc` stands for deployment config.
Add a liveness probe on the catalog deployment config using `oc set probe`:
$ oc set probe dc/catalog --liveness --initial-delay-seconds=30 --failure-threshold=3 --get-url=http://:8080/health
OpenShift automates deployments using deployment triggers that react to changes to the container image or configuration. Therefore, as soon as you define the probe, OpenShift automatically redeploys the Catalog pod using the new configuration including the liveness probe.
The `--get-url` option defines the HTTP endpoint to use for checking the liveness of the container. The `http://:8080` syntax is a convenient way to define the endpoint without having to worry about the hostname of the running container.
It is possible to customize the probes even further using, for example, `--initial-delay-seconds` to specify how long to wait after the container starts before beginning to check the probes. Run `oc set probe --help` to get a list of all available options.
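To see how the initial delay and the failure threshold interact, the sketch below computes when a container would be restarted for a given sequence of liveness results. It is a simplified model with assumptions: the first probe runs exactly when the initial delay ends, probes repeat at a fixed period, and the 10-second period is the value that appears in the `oc describe` output in this lab. The function name `restart_time` is hypothetical.

```python
def restart_time(results, initial_delay=30, period=10, failure_threshold=3):
    """Return the second at which consecutive failures reach the threshold.

    `results` is the sequence of probe outcomes (True means success).
    Returns None if the threshold is never reached.
    """
    consecutive_failures = 0
    for attempt, ok in enumerate(results):
        # A single success resets the failure counter.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return initial_delay + attempt * period
    return None


print(restart_time([True, False, False, False]))  # 60
print(restart_time([False, True, False, False]))  # None
```

In the first call the third consecutive failure happens on the fourth probe, 30 seconds after the first one, so the model reports second 60. In the second call a success interrupts the failures and the threshold is never reached.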
Add a readiness probe on the catalog deployment config using the same `/health` endpoint that you used for the liveness probe.
It's recommended to have separate endpoints for readiness and liveness so that OpenShift can tell when to restart the container and when to leave it running but remove it from the load balancer, giving an administrator the chance to investigate the issue manually.
$ oc set probe dc/catalog --readiness --initial-delay-seconds=30 --failure-threshold=3 --get-url=http://:8080/health
Voilà! OpenShift automatically restarts the Catalog pod and, as soon as the health probes succeed, it is ready to receive traffic.
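The recommendation to use separate endpoints for liveness and readiness can be sketched as follows. This is an illustrative example only and not part of the lab services, which all use a single endpoint here; the `/live` and `/ready` paths, the state flags and the function names are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state; a real service would update these flags.
state = {"deadlocked": False, "database_connected": True}


def probe_status(path, state):
    """Return the HTTP status code to answer for a probe path."""
    if path == "/live":
        # Failing liveness asks the platform to restart the container.
        return 503 if state["deadlocked"] else 200
    if path == "/ready":
        # Failing readiness only removes the container from the load balancer.
        return 200 if state["database_connected"] else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path, state))
        self.end_headers()


def serve(port=8080):
    """Serve the probe endpoints until interrupted (not called here)."""
    HTTPServer(("", port), ProbeHandler).serve_forever()


# A lost database connection fails readiness but not liveness.
degraded = {"deadlocked": False, "database_connected": False}
print(probe_status("/live", degraded), probe_status("/ready", degraded))
```

With this split, a lost database connection takes the container out of rotation without restarting it, while a deadlock triggers a restart.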
Fabric8 Maven Plugin can also be configured to automatically set the health probes when running the `fabric8:deploy` goal. Read more in the Fabric8 docs under Spring Boot, WildFly Swarm and Eclipse Vert.x.
Adding liveness and readiness probes can be done at the same time if you want to define the same health endpoint and parameters for both liveness and readiness probes.
Add liveness and readiness probes to the Inventory service:
$ oc set probe dc/inventory --liveness --readiness --initial-delay-seconds=30 --failure-threshold=3 --get-url=http://:8080/node
OpenShift automatically restarts the Inventory pod and as soon as the health probes succeed, it is ready to receive traffic.
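When the same probe settings are applied to several deployment configs, the command line can be assembled programmatically. The sketch below only builds and prints the command using the flags already shown in this lab; it does not run `oc`. The function name `probe_command` and its parameters are hypothetical.

```python
import shlex


def probe_command(dc, path, initial_delay, liveness=True, readiness=True,
                  failure_threshold=3, port=8080):
    """Build the argument list for an `oc set probe` invocation."""
    args = ["oc", "set", "probe", f"dc/{dc}"]
    if liveness:
        args.append("--liveness")
    if readiness:
        args.append("--readiness")
    args += [
        f"--initial-delay-seconds={initial_delay}",
        f"--failure-threshold={failure_threshold}",
        f"--get-url=http://:{port}{path}",
    ]
    return args


cmd = probe_command("inventory", "/node", initial_delay=30)
print(shlex.join(cmd))
```

The printed line matches the Inventory command above. On a machine that is logged in to the cluster, the list could be passed to `subprocess.run(cmd, check=True)` to apply it.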
Using the `oc describe` command, you can get a detailed look into the deployment config and verify that the health probes are in fact configured as you wanted:
$ oc describe dc/inventory
Name: inventory
Namespace: {{COOLSTORE_PROJECT}}
...
Containers:
wildfly-swarm:
...
Liveness: http-get http://:8080/node delay=180s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8080/node delay=10s timeout=1s period=10s #success=1 #failure=3
...
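If you want to verify probe settings in a script rather than by eye, the probe lines of this output can be parsed. The sketch below assumes the line format shown in the sample output above and may need adjusting for other versions of `oc`; the function name `parse_probe` is hypothetical.

```python
# Line copied from the sample `oc describe` output in this lab.
line = ("Liveness: http-get http://:8080/node delay=180s timeout=1s "
        "period=10s #success=1 #failure=3")


def parse_probe(line):
    """Split a probe line into its kind, check type, target and settings."""
    kind, rest = line.split(":", 1)
    fields = rest.split()
    probe = {"kind": kind.strip(), "type": fields[0], "target": fields[1]}
    for field in fields[2:]:
        # Settings look like delay=180s or #failure=3.
        key, value = field.lstrip("#").split("=")
        probe[key] = int(value.rstrip("s"))
    return probe


probe = parse_probe(line)
print(probe)
```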
You are an expert in health probes by now! Add liveness and readiness probes to the API Gateway service:
$ oc set probe dc/gateway --liveness --readiness --initial-delay-seconds=15 --failure-threshold=3 --get-url=http://:8080/health
OpenShift automatically restarts the API Gateway pod and, as soon as the health probes succeed, it is ready to receive traffic.
Although you can add the liveness and readiness probes to the Web UI using the CLI, let's give the OpenShift Web Console a try this time.
Go to the OpenShift Web Console in your browser and open the {{COOLSTORE_PROJECT}} project. Click on Applications >> Deployments on the left-side bar. Click on web and then the Configuration tab. You will see the warning about health checks, with a link to click in order to add them. Click Add health checks now.
Instead of using the Configuration tab, you can click the Actions button on the top right and then Edit Health Checks.
![Health Probes]({% image_path health-web-details.png %}){:width="900px"}
You will want to click both Add Readiness Probe and Add Liveness Probe and then fill them out as follows:
Readiness Probe
- Path: `/`
- Initial Delay: `10`
- Timeout: `1`
Liveness Probe
- Path: `/`
- Initial Delay: `180`
- Timeout: `1`
![Readiness Probe]({% image_path health-readiness.png %}){:width="700px"}
![Liveness Probe]({% image_path health-liveness.png %}){:width="700px"}
Click Save and then click the Overview button in the left navigation. You will notice that the Web UI pod is being restarted and stays light blue for a while. Light blue indicates that the pod has not yet passed its readiness checks; it turns blue when it is ready!
![Web Redeploy]({% image_path health-web-redeploy.png %}){:width="740px"}
Metrics are another important aspect of monitoring applications. They are required to gain visibility into how the application behaves and, in particular, to identify issues.
OpenShift provides container metrics out-of-the-box and displays how much memory, CPU and network each container has been consuming over time. In the project overview, you can see three charts near each pod that show the resource consumption of that pod.
![Container Metrics]({% image_path health-metrics-brief.png %}){:width="740px"}
Click on any of the pods (blue circle) to open the pod details. Click on the Metrics tab to see a more detailed view of the metrics charts.
![Container Metrics]({% image_path health-metrics-detailed.png %}){:width="900px"}
Well done! You are ready to move on to the next lab.