This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
Add fix for re-registration when Consul K8s rolling restarts agent nodes #227
Changes proposed in this PR:
Currently, K8s makes agent nodes ephemeral: when an agent node is upgraded, it persists no local state because we don't mount any persistent storage to the agent node containers. This means that we need to properly re-register our services when the agents restart. Our registration code already has some re-registration logic for when it detects that a service is non-existent, but it looks like our error casting code was broken by an extra pointer.
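For readers unfamiliar with this class of bug, here is a minimal sketch of how an extra pointer can break error casting in Go. It is not the code from this PR: `statusError`, `isNotFoundBroken`, and `isNotFound` are hypothetical names, and the sketch assumes the error is returned as a value (not a pointer) and unwrapped with `errors.As`.

```go
package main

import (
	"errors"
	"fmt"
)

// statusError is a hypothetical stand-in for an HTTP status error that a
// client returns by value. It is not a type from this repository.
type statusError struct {
	Code int
}

func (e statusError) Error() string {
	return fmt.Sprintf("unexpected response code: %d", e.Code)
}

// isNotFoundBroken declares the target as *statusError, so errors.As looks
// for a *statusError in the chain. The chain only holds a statusError value,
// so the match silently fails and re-registration would never trigger.
func isNotFoundBroken(err error) bool {
	var target *statusError // one pointer too many
	return errors.As(err, &target) && target.Code == 404
}

// isNotFound declares the target as a statusError value, which matches the
// concrete type stored in the error chain.
func isNotFound(err error) bool {
	var target statusError
	return errors.As(err, &target) && target.Code == 404
}

func main() {
	// Simulate an agent that has lost its state and no longer knows the service.
	err := fmt.Errorf("updating health check: %w", statusError{Code: 404})

	fmt.Println(isNotFoundBroken(err), isNotFound(err)) // prints "false true"
}
```

Both variants compile and pass `go vet`, which is part of why a mistake like this can go unnoticed until the not-found path is actually exercised.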
Unfortunately, this code path is quite tricky to hit without standing up an additional Consul cluster, and actually testing the reproduction case would require orchestrating an agent node flapping while dropping all of its state. So I figure manual testing might be better, since we already have a mock Consul server that can unit test all the other code paths.
How I've tested this PR:
I ensured that the deployments get re-registered when a rolling restart happens by performing one manually. This was validated via the learn tutorial with a custom image build and by triggering a restart of the Consul agent nodes. Here are the logs from the deployment (notice the `successfully registered agent service` message third from the bottom):

Checklist: