controller indicates ready only after verifying proxy configured work #1720

rainest · 2021-08-17T20:42:12Z

What this PR does / why we need it:
Only mark the controller ready after it can assume that the proxy has configuration available, either immediately if the proxy has a database or after the first config push if the proxy has no database.

Replaces the no-op Ping function with a dynamically-scoped ConfigReady() function. ConfigReady() returns a nil error if the previously instantiated Proxy's IsReady() function returns true or returns a "proxy not yet configured" error if IsReady() returns false

IsReady() returns true if:

proxy.dbmode == "off" and len(proxy.lastConfigSHA) > 0 (i.e. we have applied config at least once).
proxy.dbmode is any other value (we assume config is probably already available in the DB). This doesn't handle the initial start edge case where the database is empty, but is usually good enough. Handling the edge case requires some complex nonsense with figuring out if the cluster controller leader has applied configuration (since non-leaders never apply config themselves).

It returns false otherwise.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #1504

Special notes for your reviewer:
I haven't the foggiest how you test this consistently and meaningfully.

Unit tests seem not particularly useful since we'd need to stuff a fake value into lastConfigSHA rather than allowing it to populate naturally. Effectively all we confirm is that string comparisons and slice length checks work as expected.

Integration tests are complicated by many factors:

Empty configuration is valid and will populate lastConfigSHA. The controller will sync even before you've created any Ingresses and such. We can only make it not sync by either not starting the proxy (more or less what we had before, but not easily doable in our test environment because we don't get to let Kubernetes restart the container if the initial connection attempt fails) or by creating resources that result in invalid configuration before anything else applies (not sure how to do this with the test parallelization), confirm not ready, rectify the error, and confirm ready.
We don't have any way to observe the proxy internals even if we can control the expected config state with initially intentionally broken config. I think we either scrape logs and check for the success message (not sure if we can do this), look at the config diagnostic failed/successful outputs (reasonable enough, I guess), or look at the proxy admin API (can't distinguish between successful empty config and unsuccessful config).

A manual test of the image does at least confirm that the controller becomes ready, but even in manual runs I don't know how I can confirm its behavior that it doesn't become ready when it shouldn't.

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

Only mark the controller ready after it can assume that the proxy has configuration available, either immediately if the proxy has a database or after the first config push if the proxy has no database.

github-actions · 2021-08-17T20:43:46Z

Licenses differ between commit 2609005350500696c825a5647805f0f482b39a57 and base:

+++ pr_licenses.csv	2021-08-17 20:43:46.326653223 +0000
@@ -1,5 +1,4 @@
-cloud.google.com/go/compute/metadata,Unknown,Apache-2.0
-cloud.google.com/go/container/apiv1,Unknown,Apache-2.0
+cloud.google.com/go,Unknown,Apache-2.0
 github.com/Masterminds/goutils,https://github.com/Masterminds/goutils/blob/master/LICENSE.txt,Apache-2.0
 github.com/Masterminds/semver,https://github.com/Masterminds/semver/blob/master/LICENSE.txt,MIT
 github.com/Masterminds/sprig,https://github.com/Masterminds/sprig/blob/master/LICENSE.txt,MIT```

internal/proxy/clientgo_cached_proxy_resolver.go

internal/manager/run.go

CHANGELOG.md

ccfishk

function comments

mflendrich

Here is my idea for testing this bit of functionality:

Implement a unit test with a mock proxy, and then:
- after initialization, ensure that ready == false
- report readiness in the mock
- ensure that ready == true
modify the existing integration test harness to block until the proxy reports readiness. This will check only the "positive" half of the readiness check, though, but the negative case should be sufficiently covered by the unit test.

CHANGELOG.md

internal/manager/run.go

internal/proxy/clientgo_cached_proxy_resolver.go

Several tags were incorrect in the changelog footer links. Align tags with their actual names and the appropriate comparison versions.

rainest · 2021-08-19T18:40:24Z

Tests more or less follow #1720 (review). It doesn't actually block testing on readiness, but does confirm that readiness is eventually good, which is roughly equivalent. Most of the other tests imply readiness because they fail if config didn't post, so it doesn't make much sense to block them.

I'm still a bit iffy on that strategy since IMO the negative case in integration is what matters for verifying the feature. That seems like it'd require some weird refactoring where we pull the go proxy.startProxyUpdateServer() out of NewCacheBasedProxyWithStagger() into its own Proxy.Start() function, call that after mgr.Start(ctx) in manager.Run() and also somehow pause between those calls when running the negative test ☹️

internal/proxy/clientgo_cached_proxy_resolver_test.go

internal/proxy/proxy.go

ccfishk

If I understand correctly, just typo.

mflendrich

Continuing the discussion from https://github.com/Kong/kubernetes-ingress-controller/pull/1720/files#discussion_r691494074 here because apparently GH doesn't allow responding to that thread anymore.

What I meant is that we could just define the func ConfigReady(_ *http.Request) error function as a plain old top-level function in the global scope of the file, and then use ConfigReady as an argument to mgr.AddReadyzCheck. ConfigReady won't work directly as a value that will be accepted as an argument thereto (because of a type mismatch), so it'd need a type cast: healthz.Checker(ConfigReady).

internal/proxy/clientgo_cached_proxy_resolver.go

internal/proxy/clientgo_cached_proxy_resolver_test.go

- Add a dedicated value for having applied config at least once - Add a lock for the config applied flag - Use dedicate True/False test assertions

internal/proxy/clientgo_cached_proxy_resolver.go

ccfishk

minor semantics

Co-authored-by: Jimin <[email protected]>

rainest · 2021-08-20T18:33:03Z

#1720 (comment) might work to go to the inline discussion? That one still shows replies available to me, but yeah, the GH UI has gotten confused.

As I read the last comment though, it looks like that suggestion was about using the var to make it a Checker, not about doing it to make the dynamic-scoped function, so without further comment on the new inline version, I'm assuming that's resolved and handles both concerns (using the proxy instance and not allowing variable substitution) and have marked the thread such.

ccfishk

👍 💯

internal/proxy/clientgo_cached_proxy_resolver.go

Co-authored-by: Michał Flendrich <[email protected]>

ccfishk

👍

feat(*) implement config-based readiness

0c99ad5

Only mark the controller ready after it can assume that the proxy has configuration available, either immediately if the proxy has a database or after the first config push if the proxy has no database.

github-actions bot added the ci/license/changed label Aug 17, 2021

docs(*) update changelog

0b30746

rainest mentioned this pull request Aug 17, 2021

Make pod ready only after the first successful sync to Kong #1504

Closed

rainest temporarily deployed to Configure ci August 17, 2021 20:54 Inactive

ccfishk reviewed Aug 17, 2021

View reviewed changes

internal/proxy/clientgo_cached_proxy_resolver.go Show resolved Hide resolved

ccfishk reviewed Aug 17, 2021

View reviewed changes

internal/manager/run.go Outdated Show resolved Hide resolved

ccfishk reviewed Aug 17, 2021

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

ccfishk suggested changes Aug 17, 2021

View reviewed changes

mflendrich reviewed Aug 18, 2021

View reviewed changes

fix(doc) correct tags in changelog

3daeda7

Several tags were incorrect in the changelog footer links. Align tags with their actual names and the appropriate comparison versions.

rainest temporarily deployed to Configure ci August 18, 2021 18:26 Inactive

Travis Raines added 4 commits August 18, 2021 15:13

refactor(mgr) inline ready function

70d5256

pr(mgr) invert if

b92ecd3

test(proxy) test IsReady

0c25e45

Merge remote-tracking branch 'origin/main' into feat/ready-config

a691fc2

rainest temporarily deployed to Configure ci August 19, 2021 18:34 Inactive

rainest marked this pull request as ready for review August 19, 2021 18:40

rainest requested a review from a team as a code owner August 19, 2021 18:40

rainest requested review from mflendrich and ccfishk August 19, 2021 18:40

ccfishk reviewed Aug 19, 2021

View reviewed changes

internal/proxy/clientgo_cached_proxy_resolver_test.go Outdated Show resolved Hide resolved

ccfishk reviewed Aug 19, 2021

View reviewed changes

internal/proxy/proxy.go Show resolved Hide resolved

ccfishk reviewed Aug 19, 2021

View reviewed changes

ccfishk changed the title ~~Only become ready after proxy configured~~ controller indicates ready only after verifying proxy configured work Aug 19, 2021

mflendrich reviewed Aug 20, 2021

View reviewed changes

internal/proxy/clientgo_cached_proxy_resolver.go Outdated Show resolved Hide resolved

internal/proxy/clientgo_cached_proxy_resolver_test.go Outdated Show resolved Hide resolved

pr(proxy) address review comments

93387e8

- Add a dedicated value for having applied config at least once - Add a lock for the config applied flag - Use dedicate True/False test assertions

ccfishk reviewed Aug 20, 2021

View reviewed changes

internal/proxy/clientgo_cached_proxy_resolver.go Outdated Show resolved Hide resolved

ccfishk suggested changes Aug 20, 2021

View reviewed changes

Travis Raines and others added 2 commits August 20, 2021 11:29

Update internal/proxy/clientgo_cached_proxy_resolver.go

3654112

Co-authored-by: Jimin <[email protected]>

Merge branch 'main' into feat/ready-config

416c34a

Travis Raines added 2 commits August 20, 2021 11:33

fix(proxy) correct defer/return order

b5f7d2b

fix(test) update test with markConfigApplied

3658d78

rainest temporarily deployed to Configure ci August 20, 2021 18:55 Inactive

rainest requested review from mflendrich and ccfishk August 20, 2021 19:13

ccfishk previously approved these changes Aug 20, 2021

View reviewed changes

mflendrich previously approved these changes Aug 26, 2021

View reviewed changes

internal/proxy/clientgo_cached_proxy_resolver.go Outdated Show resolved Hide resolved

rainest dismissed stale reviews from mflendrich and ccfishk via ea82563 August 26, 2021 18:40

Travis Raines and others added 2 commits August 26, 2021 11:40

Update internal/proxy/clientgo_cached_proxy_resolver.go

ea82563

Co-authored-by: Michał Flendrich <[email protected]>

Merge branch 'main' into feat/ready-config

29bc586

rainest enabled auto-merge (squash) August 26, 2021 18:42

ccfishk self-requested a review August 26, 2021 18:45

ccfishk approved these changes Aug 26, 2021

View reviewed changes

rainest merged commit b0ff612 into main Aug 26, 2021

rainest deleted the feat/ready-config branch August 26, 2021 18:49

rainest temporarily deployed to Configure ci August 26, 2021 18:49 Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

controller indicates ready only after verifying proxy configured work #1720

controller indicates ready only after verifying proxy configured work #1720

rainest commented Aug 17, 2021

github-actions bot commented Aug 17, 2021

ccfishk left a comment

mflendrich left a comment

rainest commented Aug 19, 2021

ccfishk left a comment

mflendrich left a comment

ccfishk left a comment

rainest commented Aug 20, 2021

ccfishk left a comment

ccfishk left a comment

controller indicates ready only after verifying proxy configured work #1720

controller indicates ready only after verifying proxy configured work #1720

Conversation

rainest commented Aug 17, 2021

github-actions bot commented Aug 17, 2021

ccfishk left a comment

Choose a reason for hiding this comment

mflendrich left a comment

Choose a reason for hiding this comment

rainest commented Aug 19, 2021

ccfishk left a comment

Choose a reason for hiding this comment

mflendrich left a comment

Choose a reason for hiding this comment

ccfishk left a comment

Choose a reason for hiding this comment

rainest commented Aug 20, 2021

ccfishk left a comment

Choose a reason for hiding this comment

ccfishk left a comment

Choose a reason for hiding this comment