Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch Helm Chart review #53426

Closed
1 task
jmlrt opened this issue Mar 11, 2020 · 11 comments
Closed
1 task

Elasticsearch Helm Chart review #53426

jmlrt opened this issue Mar 11, 2020 · 11 comments
Assignees
Labels
:Core/Infra/Core Core issues without another label

Comments

@jmlrt
Copy link
Member

jmlrt commented Mar 11, 2020

Product teams tasks

We’d like to have chart reviews from product teams to validate that they are following product recommended configuration for Kubernetes:

Elasticsearch

@jmlrt
Copy link
Member Author

jmlrt commented Mar 11, 2020

Hi @jasontedor
Thank you for working on this issue.

Elasticsearch chart code is in helm-charts/elasticsearch.

This chart will create the following K8S resources:

What we would like for this review

  • Validate that a chart deployment with default values provides a well-configured Elasticsearch clusters
  • Validate that we don't need additional K8S resources for specific Elasticsearch use cases (examples additional initContainer, additional volumes for persistency, ...)
  • Validate the default resources requests / limits and ES_JAVA_OPTS
  • Validate the readinessProbe to check that Elasticsearch is ready and help us define a livenessProbe (quoting @Crazybus livenessProbe = I'm dead, restart me / readinessProbe = I'm not ready to receive traffic, wait for me)
  • Validate that we can use RollingUpdate strategy for StatefulSet during updates
  • Any other thing that could be relevant

Please ping me if you have any questions

@Crazybus
Copy link
Contributor

Validate the readinessProbe to check that Elasticsearch is ready and help us define a livenessProbe (quoting @Crazybus livenessProbe = I'm dead, restart me / readinessProbe = I'm not ready to receive traffic, wait for me)

This quote was originally for non-cluster applications like Metricbeat/Filebeat. For stateful clusters applications like Elasticsearch there is also another really important usage for the readinessProbe.

Under normal operating conditions the readinessProbe needs to be "node aware". If this is passing, please send requests to me. If it is failing remove me from the service.

During restarts (rolling upgrades, Kubernetes node cluster maintenance or pod rescheduling) this readinessProbe also needs to become "cluster aware". This is because Kubernetes will wait for the first pod to be ready before restarting the next one.

Without any cluster level health checking here it would mean that Kubernetes would restart each pod in sequence as soon as it started responding to a basic "I am ready" check. For Elasticsearch this would mean that nothing was waiting for the cluster to be "green" in between restarts of each Elasticsearch pod.

The current logic in the readinessProbe will make sure that on initial startup the probe will wait for the cluster to become green. Once this condition is met the probe switches over to the node level check.

This makes sure that cluster health is green before upgrading the next pod, and that already running pods are not considered unhealthy if the cluster health is non-green.

In an ideal world Kubernetes statefulsets would have different readinessProbes to signify pod level health, and cluster level health for rolling upgrade but this is unfortunately not the case.

The readinessProbe:
https://github.com/elastic/helm-charts/blob/a0e8d77c6a292636ef771e78f24b04b6ea1158bb/elasticsearch/templates/statefulset.yaml#L215-L227

@nik9000 nik9000 added the :Core/Infra/Core Core issues without another label label Mar 18, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@jasontedor jasontedor assigned pugnascotia and unassigned jasontedor Mar 23, 2020
@pugnascotia
Copy link
Contributor

Feedback from an Elasticsearch POV:

  • I'm running the latest Docker for Mac and helm (via Homebrew). I found that I couldn't run the examples as-is because the Helm CLI has changed. For example, specifying a timeout now requires a unit, and --purge isn't recognised by helm del.
  • I ran the docker-for-mac and security examples to check that everything worked (I had to change latter to run on my laptop, as per docker-for-mac).
  • We should revisit the readinessProbe - see Elasticsearch readiness probe might fail if a single node is stuck cloud-on-k8s#2248 for what ECK does and why. The TL;DR version is just to call /, but note that there are some subtleties with HTTP response codes and the ES version.
  • I'm not sure what we could define for a livenessProbe - I checked with the ECK developers, they don't define a livenessProbe at all because they don't want Kubernetes "to randomly restart ES nodes when the get unhealthy"

It may be useful to get an ECK developer to take a look as well.

@jmlrt
Copy link
Member Author

jmlrt commented Mar 27, 2020

Hi @pugnascotia,
Thank you for this feedback

  • I'm running the latest Docker for Mac and helm (via Homebrew). I found that I couldn't run the examples as-is because the Helm CLI has changed. For example, specifying a timeout now requires a unit, and --purge isn't recognised by helm del.

Yeah, our charts aren't officially supporting Helm 3 yet as stated in (elastic/helm-charts/elasticsearch#requirements). I should have specified it in this issue 🤦‍♂ .
However, despite a few differences in command lines and other small difference, Elasticsearch is mostly compatible with Helm 3, so I'm glad you were still able to test it.

  • I ran the docker-for-mac and security examples to check that everything worked (I had to change latter to run on my laptop, as per docker-for-mac).

👍

@jmlrt
Copy link
Member Author

jmlrt commented Apr 2, 2020

@pugnascotia
I created elastic/helm-charts#553 to revisit the readinessProbe and we may get some ECK feedback when we'll work on it.

Do you have other feedback to add for Elasticsearch chart or can we close this issue?

@pugnascotia
Copy link
Contributor

I don't have any more feedback.

@jmlrt
Copy link
Member Author

jmlrt commented Apr 2, 2020

Great! So I'll close this issue. Thank you for your review 👍

@jmlrt jmlrt closed this as completed Apr 2, 2020
@jmlrt
Copy link
Member Author

jmlrt commented Apr 21, 2020

@pugnascotia fyi we merged elastic/helm-charts#586 to update readiness probe

@pugnascotia
Copy link
Contributor

@jmlrt looks good. Tell me - what Docker image will those curl commands run inside?

@jmlrt
Copy link
Member Author

jmlrt commented Apr 21, 2020

Tell me - what Docker image will those curl commands run inside?

Commands for the readiness probe are run inside the elasticsearch container

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label
Projects
None yet
Development

No branches or pull requests

6 participants