Backport of CSI: make gRPC client creation more robust into release/1.1.x #12636
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport
This PR is auto-generated from #12057 to be assessed for backporting due to the inclusion of the label backport/1.1.x.
The below text is copied from the body of the original PR.
Related to #11784. It doesn't fix it but makes debugging it more legible.
I've broken this up into mostly bite-sized commits for review.
Nomad communicates with CSI plugin tasks via gRPC. The plugin
supervisor hook uses this to ping the plugin for health checks which
it emits as task events. After the first successful health check the
plugin supervisor registers the plugin in the client's dynamic plugin
registry, which in turn creates a CSI plugin manager instance that has
its own gRPC client for fingerprinting the plugin and sending mount
requests.
If the plugin manager instance fails to connect to the plugin on its
first attempt, it exits. The plugin supervisor hook is unaware that
connection failed so long as its own pings continue to work. A
transient failure during plugin startup may mislead the plugin
supervisor hook into thinking the plugin is up (so there's no need to
restart the allocation) but no fingerprinter is started.
plugin manager instance the ability to retry the gRPC client
connection until success.
forever waiting for a plugin that will never come back up.
Minor improvements:
and then throws it away. Instead, reuse the client as we do for the
plugin manager.
timeout applies to the connection and not the rest of the client
lifetime.