-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csi: use a blocking initial connection with timeout #7965
Conversation
The plugin supervisor lazily connects to plugins, but this means we only get "Unavailable" back from the gRPC call in cases where the plugin can never be reached (for example, if the Nomad client has the wrong permissions for the socket). This changeset improves the operator experience by switching to a blocking `DialWithContext`. It eagerly connects so that we can validate the connection is real and get a "failed to open" error in case where Nomad can't establish the initial connection.
95daea0
to
eb2f77a
Compare
if err != nil { | ||
return false, fmt.Errorf("failed to create csi client: %v", err) | ||
} | ||
defer client.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a panic that can't happen in 0.11.2 or earlier because we never actually connect in this block, so the error condition never happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, I thought about it a little and talked myself out of wanting a way to change the timeout. It might be interesting to have a couple of coarse grained tuning parameters eventually where we'd have or derive an out-of-dc-connection-timeout
. Being open source is a type of allowing tuning of something most folks won't need to tune, though 😀
Yeah, the timeout only impacts the initial connection and we retry it on the next fingerprint anyways, so I figured it'd be safe to have it be short. |
Cherry-picked to 0.11.3 branch as d31bb8b |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
For #7424 (1 of probably many) and specifically the case described in #7931 (comment)
The plugin supervisor lazily connects to plugins, but this means we only get "Unavailable" back from the gRPC call in cases where the plugin can never be reached (for example, if the Nomad client has the wrong permissions for the socket).
This changeset improves the operator experience by switching to a blocking
DialWithContext
. It eagerly connects so that we can validate the connection is real and get a "failed to open" error in case where Nomad can't establish the initial connection.The new error messages in a scenario where Nomad has the wrong permissions will look like the following (repeated each fingerprint attempt):