Elasticsearch Helm Chart review #53426

jmlrt · 2020-03-11T18:09:57Z

Product teams tasks

We’d like to have chart reviews from product teams to validate that the charts follow the product-recommended configuration for Kubernetes:

Elasticsearch

Review Elasticsearch Helm Chart @jasontedor

jmlrt · 2020-03-11T18:11:54Z

Hi @jasontedor
Thank you for working on this issue.

Elasticsearch chart code is in helm-charts/elasticsearch.

This chart will create the following K8S resources:

StatefulSet which will manage Elasticsearch pods, including a few init and sidecar containers to manage virtual memory tuning, keystore and graceful termination
2 Services which will expose the pods as network services internal to the K8S cluster and will be used for node discovery (see Clustering and Node Discovery for more details)
(Optional) ConfigMap which can contain elasticsearch.yml and log4j2.properties content for example (default values in values.yaml)
(Optional) Ingress which manages external access to the service
(Optional) ServiceAccount, Role and RoleBinding to manage K8S RBAC
(Optional) PodDisruptionBudget to ensure HA during updates and K8S operations
(Optional) PodSecurityPolicy to enforce rules controlling what a pod can do

What we would like for this review

Validate that a chart deployment with default values provides a well-configured Elasticsearch cluster
Validate that we don't need additional K8S resources for specific Elasticsearch use cases (for example an additional initContainer, additional volumes for persistence, ...)
Validate the default resources requests / limits and ES_JAVA_OPTS
Validate the readinessProbe to check that Elasticsearch is ready and help us define a livenessProbe (quoting @Crazybus livenessProbe = I'm dead, restart me / readinessProbe = I'm not ready to receive traffic, wait for me)
Validate that we can use RollingUpdate strategy for StatefulSet during updates
Any other thing that could be relevant
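For the resources and ES_JAVA_OPTS item, the kind of check a reviewer might run can be sketched in a few lines of Python. This is only an illustration: it assumes the general Elasticsearch guidance that -Xms and -Xmx should be equal and that the heap should take no more than half of the memory available to the container. The function names and the values in the usage note are hypothetical and are not taken from the chart.

```python
import re

# Subset of Kubernetes memory suffixes (binary and decimal).
_K8S_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "k": 10**3, "M": 10**6, "G": 10**9}
# Suffixes accepted by the JVM -Xms/-Xmx flags (powers of 1024).
_JVM_UNITS = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_k8s_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '2Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|k|M|G)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, unit = match.groups()
    return int(number) * _K8S_UNITS.get(unit, 1)


def parse_heap_flags(java_opts: str) -> dict:
    """Extract -Xms and -Xmx sizes (in bytes) from a JVM options string."""
    sizes = {}
    for flag, number, unit in re.findall(r"-(Xms|Xmx)(\d+)([kKmMgG]?)", java_opts):
        sizes[flag] = int(number) * _JVM_UNITS.get(unit.lower(), 1)
    return sizes


def check_heap(java_opts: str, memory_limit: str, max_ratio: float = 0.5) -> list:
    """Return a list of problems found; an empty list means the check passed."""
    heap = parse_heap_flags(java_opts)
    if "Xms" not in heap or "Xmx" not in heap:
        return ["both -Xms and -Xmx should be set"]
    problems = []
    if heap["Xms"] != heap["Xmx"]:
        problems.append("-Xms and -Xmx differ")
    # Assumed rule of thumb: heap at most max_ratio of the container limit.
    if heap["Xmx"] > max_ratio * parse_k8s_memory(memory_limit):
        problems.append("heap exceeds the allowed share of the memory limit")
    return problems
```

With illustrative values, check_heap("-Xmx1g -Xms1g", "2Gi") reports no problems, while raising the heap to 2g against the same limit is flagged.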

Please ping me if you have any questions

Crazybus · 2020-03-12T10:04:35Z

Validate the readinessProbe to check that Elasticsearch is ready and help us define a livenessProbe (quoting @Crazybus livenessProbe = I'm dead, restart me / readinessProbe = I'm not ready to receive traffic, wait for me)

This quote was originally for non-clustered applications like Metricbeat/Filebeat. For stateful clustered applications like Elasticsearch there is another really important use for the readinessProbe.

Under normal operating conditions the readinessProbe needs to be "node aware". If this is passing, please send requests to me. If it is failing remove me from the service.

During restarts (rolling upgrades, Kubernetes node cluster maintenance or pod rescheduling) this readinessProbe also needs to become "cluster aware". This is because Kubernetes will wait for the first pod to be ready before restarting the next one.

Without any cluster level health checking here it would mean that Kubernetes would restart each pod in sequence as soon as it started responding to a basic "I am ready" check. For Elasticsearch this would mean that nothing was waiting for the cluster to be "green" in between restarts of each Elasticsearch pod.

The current logic in the readinessProbe will make sure that on initial startup the probe will wait for the cluster to become green. Once this condition is met the probe switches over to the node level check.

This makes sure that cluster health is green before upgrading the next pod, and that already running pods are not considered unhealthy if the cluster health is non-green.
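The switch from a cluster-level check to a node-level check can be modelled as a small state machine. The Python sketch below is not the chart's actual probe script (see the link further down for that); the class name and the two injected callables are hypothetical stand-ins for "is the cluster green?" and "does this node respond?".

```python
from typing import Callable


class TwoPhaseReadiness:
    """Cluster-aware until the first success, node-aware afterwards."""

    def __init__(
        self,
        cluster_is_green: Callable[[], bool],
        node_responds: Callable[[], bool],
    ) -> None:
        self._cluster_is_green = cluster_is_green
        self._node_responds = node_responds
        # Flips to True once the cluster has been seen green after startup.
        self._started = False

    def check(self) -> bool:
        """Return True when the pod should be reported as ready."""
        if self._started:
            # Steady state: only this node's own health matters, so a
            # non-green cluster does not mark running pods as unready.
            return self._node_responds()
        if self._cluster_is_green():
            # First green observation ends the startup phase.
            self._started = True
            return True
        # Still starting: hold back the rollout until the cluster is green.
        return False
```

Because readiness stays false until the first green observation, a rolling restart cannot move on to the next pod early. After that point, a yellow or red cluster no longer removes healthy pods from the service.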

In an ideal world Kubernetes StatefulSets would have separate readinessProbes for pod-level health and for cluster-level health during rolling upgrades, but this is unfortunately not the case.

The readinessProbe:
https://github.com/elastic/helm-charts/blob/a0e8d77c6a292636ef771e78f24b04b6ea1158bb/elasticsearch/templates/statefulset.yaml#L215-L227

elasticmachine · 2020-03-18T14:18:38Z

Pinging @elastic/es-core-infra (:Core/Infra/Core)

pugnascotia · 2020-03-26T16:57:51Z

Feedback from an Elasticsearch POV:

I'm running the latest Docker for Mac and helm (via Homebrew). I found that I couldn't run the examples as-is because the Helm CLI has changed. For example, specifying a timeout now requires a unit, and --purge isn't recognised by helm del.
I ran the docker-for-mac and security examples to check that everything worked (I had to change the latter to run on my laptop, as per docker-for-mac).
We should revisit the readinessProbe - see "Elasticsearch readiness probe might fail if a single node is stuck" (cloud-on-k8s#2248) for what ECK does and why. The TL;DR version is just to call /, but note that there are some subtleties with HTTP response codes and the ES version.
I'm not sure what we could define for a livenessProbe - I checked with the ECK developers, and they don't define a livenessProbe at all because they don't want Kubernetes "to randomly restart ES nodes when they get unhealthy".

It may be useful to get an ECK developer to take a look as well.

jmlrt · 2020-03-27T15:09:04Z

Hi @pugnascotia,
Thank you for this feedback

I'm running the latest Docker for Mac and helm (via Homebrew). I found that I couldn't run the examples as-is because the Helm CLI has changed. For example, specifying a timeout now requires a unit, and --purge isn't recognised by helm del.

Yeah, our charts don't officially support Helm 3 yet, as stated in elastic/helm-charts/elasticsearch#requirements. I should have specified it in this issue 🤦‍♂.
However, despite a few differences in the command line and other small differences, the Elasticsearch chart is mostly compatible with Helm 3, so I'm glad you were still able to test it.
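The two command-line differences reported above (a timeout now needs a unit, and --purge is no longer recognised) can be illustrated with a small argument rewriter. This is a hypothetical helper written for this thread, not part of Helm or of the charts. It assumes that a bare integer timeout means seconds, as in Helm 2.

```python
def to_helm3_args(args: list) -> list:
    """Rewrite a Helm 2 style argument list so that Helm 3 accepts it."""
    converted = []
    expect_timeout = False
    for arg in args:
        if expect_timeout:
            # Helm 2 took a bare number of seconds; Helm 3 wants a duration.
            converted.append(arg + "s" if arg.isdigit() else arg)
            expect_timeout = False
        elif arg == "--purge":
            # Not recognised by Helm 3, so the flag is dropped.
            continue
        elif arg == "--timeout":
            converted.append(arg)
            expect_timeout = True
        elif arg.startswith("--timeout="):
            value = arg.split("=", 1)[1]
            suffix = "s" if value.isdigit() else ""
            converted.append("--timeout=" + value + suffix)
        else:
            converted.append(arg)
    return converted
```

For example, ["helm", "del", "--purge", "elasticsearch"] becomes ["helm", "del", "elasticsearch"], and "--timeout", "600" becomes "--timeout", "600s". Values that already carry a unit are left untouched.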

I ran the docker-for-mac and security examples to check that everything worked (I had to change the latter to run on my laptop, as per docker-for-mac).

👍

jmlrt · 2020-04-02T14:57:26Z

@pugnascotia
I created elastic/helm-charts#553 to revisit the readinessProbe, and we may get some ECK feedback when we work on it.

Do you have other feedback to add for the Elasticsearch chart, or can we close this issue?

pugnascotia · 2020-04-02T15:29:50Z

I don't have any more feedback.

jmlrt · 2020-04-02T15:38:55Z

Great! So I'll close this issue. Thank you for your review 👍

jmlrt · 2020-04-21T12:07:58Z

We should revisit the readinessProbe - see elastic/cloud-on-k8s#2248 for what ECK does and why. The TL;DR version is just to call /, but note that there are some subtleties with HTTP response codes and the ES version.

@pugnascotia FYI, we merged elastic/helm-charts#586 to update the readiness probe.

pugnascotia · 2020-04-21T12:40:40Z

@jmlrt looks good. Tell me - what Docker image will those curl commands run inside?

jmlrt · 2020-04-21T12:59:32Z

Tell me - what Docker image will those curl commands run inside?

Commands for the readiness probe are run inside the elasticsearch container.

jmlrt assigned jasontedor Mar 11, 2020

nik9000 added the :Core/Infra/Core label (Core issues without another label) Mar 18, 2020

jasontedor assigned pugnascotia and unassigned jasontedor Mar 23, 2020

jmlrt mentioned this issue Apr 2, 2020

[elasticsearch] Revisit readinessProbe elastic/helm-charts#553

Closed

jmlrt closed this as completed Apr 2, 2020
