-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
embed: Add insecure-health-endpoint
flag to enable Kubernetes HTTP probes
#12370
Conversation
Codecov Report
@@ Coverage Diff @@
## master #12370 +/- ##
==========================================
- Coverage 63.28% 63.03% -0.26%
==========================================
Files 448 448
Lines 62342 62352 +10
==========================================
- Hits 39454 39302 -152
- Misses 18329 18479 +150
- Partials 4559 4571 +12
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an integration test for this functionality.
Implementation lgtm
@ptabor Sure. Do you mean an e2e test? If you really do mean integration, could you point me to a good place to introduce one? The integration tests don't appear to work with the If you did mean e2e, same question — any guidance on where this test would fix amongst existing tests? I couldn't find one that exercises things like the default metrics listener but I'm going to keep looking and playing around. |
a4ded77
to
003da10
Compare
@ptabor I added an e2e test covering the new endpoint and documented the field. Thanks for the feedback! |
003da10
to
92a8037
Compare
@ironcladlou Thank you, looks awesome. LGTM - as a recommendation to etcd maintainers. |
The e2e fails in CI:
I'll have to figure that out. Works locally, so something about what I'm doing here is not portable. Perhaps an environmental idiosyncrasy with localhost. I also need to add short timeouts to the client call. |
I think majority of tests (but mostly integrational) are using unix://sockets for communication. BTW: Please name your commit: to mitigate:
|
I was actually able to reproduce the e2e failure natively on darwin, so I'm proceeding with debugging... |
92a8037
to
79e8d82
Compare
insecure-health-endpoint
flag to enable Kubernetes HTTP probesinsecure-health-endpoint
flag to enable Kubernetes HTTP probes
…probes Before this patch, Kubernetes users deploying etcd who configure mTLS authentication for client and metrics endpoints are unable to leverage etcd's `/health` endpoint for liveness or readiness probes because Kubernetes HTTP probes don't support mTLS. Users must work around this limitation with strategies like weak health checks using custom `exec` probes. Bypassing the `/health` endpoint for this purpose can lead to false positive probe results and quorum loss if a member is falsely determined to be ready during a rollout. This patch introduces an `--insecure-health-endpoint` flag which, if specified, enables an insecure HTTP `/health` endpoint which is functionally equivalent to other `/health` endpoints. This enables etcd pods to specify reliable HTTP GET probes for readiness and liveness checking. This approach stands as an alternative to introducing a parallel TLS configuration and listener just for the `/health` endpoint. The typical use case is probably a non-exposed port used only by a container orchestration component (e.g. the Kubelet). Fixes etcd-io#11993.
79e8d82
to
0a4e9f2
Compare
@ptabor fixed the test and the commit message. The e2e problem was that v2 was sometimes being enabled in the test etcd server due to various tests mutating shared mutable fixtures (in this case, For stability and as a matter of general good practice, I think all the tests should be audited and changed as necessary to work with copies of fixtures, but that's outside the scope of this PR. |
LGTM. Thank you for root causing that. Could you, please point me (or better: open an issue) with a link to exemplar configTLS modification in tests harness ? |
@ptabor Lines 271 to 275 in 3f449a8
etcd/tests/e2e/ctl_v3_migrate_test.go Lines 31 to 32 in 96cce20
Since this is fresh on my brain, I'll go ahead and do a refactor to fix everything and open a new PR. |
The e2e tests can be flaky depending on due to various tests mutating shared mutable fixtures, causing non-deterministic behavior depending on the test set, order, etc. For example, `configTLS` is mutated in at least two tests in such a way that the config is potentially invalidated for any subsequent test running in the same process (e.g. by setting the `enableV2` field`). This particular example caused a substantial amount of confusion diagnosing the new test introduced for etcd-io#12370. Independent tests should not share mutable state unless deliberately. This patch refactors the e2e test config fixtures to safeguard against these problems by replacing the package variables (which cannot be made immutable) with functions that return new instances.
Here's a PR which starts to fix the test state issues: #12392 |
The e2e tests can be flaky due to various tests mutating shared mutable fixtures, causing non-deterministic behavior depending on the test set, order, etc. For example, `configTLS` is mutated in at least two tests in such a way that the config is potentially invalidated for any subsequent test running in the same process (e.g. by setting the `enableV2` field). This particular example caused a substantial amount of confusion diagnosing the new test introduced for etcd-io#12370. Independent tests should not share mutable state unless deliberately. This patch refactors the e2e test config fixtures to safeguard against these problems by replacing the package variables (which cannot easily be made immutable) with functions that return new instances.
Somehow in all this we didn't realize that additional metrics endpoints (which would also serve health) can be specified without TLS info: https://github.com/openshift/cluster-etcd-operator/blob/e1dae76a52f2ae0aa31857c5d4249e78ad47c6e0/bindata/etcd/pod.yaml#L168 Assuming the user is okay with exposing the insecure |
Looks like #12370 (comment) was the answer all along. Thanks to everyone! /close |
I'm reopening this per #11993 (comment). |
@ptabor @gyuho please see #11993 (comment) — I think we still need this, otherwise it won't be possible to expose the health endpoint over HTTP without also exposing metrics insecurely, which is a constraint for us. If the approach is still agreeable I can rebase. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
@ironcladlou Could you rebase? We'd like this to be in our 3.5 release. /cc @ptabor |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
@ironcladlou Could you rebase? We need this feature,thanks |
Before this patch, Kubernetes users deploying etcd who configure mTLS
authentication for client and metrics endpoints are unable to leverage etcd's
/health
endpoint for liveness or readiness probes because Kubernetes HTTPprobes don't support mTLS. Users must work around this limitation with
strategies like weak health checks using custom
exec
probes. Bypassing the/health
endpoint for this purpose can lead to false positive probe results andquorum loss if a member is falsely determined to be ready during a rollout.
This patch introduces an
--insecure-health-endpoint
flag which, if specified,enables an insecure HTTP
/health
endpoint which is functionally equivalent toother
/health
endpoints. This enables etcd pods to specify reliable HTTP GETprobes for readiness and liveness checking.
This approach stands as an alternative to introducing a parallel TLS
configuration and listener just for the
/health
endpoint. The typical use caseis probably a non-exposed port used only by a container orchestration component
(e.g. the Kubelet).
Fixes #11993.