-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monitor ssh connection live-ness #22037
Conversation
An ssh server should always send a reply packet to the keepalive request. If we miss those replies for over 2min, consider the connection dead and abort, rather than block the provisioner indefinitely.
e9c863b
to
2206512
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, and cool with it, but any notes on updating the crypto dependencies?
@@ -100,16 +100,19 @@ func newMockLineServer(t *testing.T, signer ssh.Signer, pubKey string) string { | |||
|
|||
go func(in <-chan *ssh.Request) { | |||
for req := range in { | |||
// since this channel's requests are serviced serially, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice comment, and nicely done
The |
I agree that keeping these dependencies up-to-date is good, but I'd say let's also review the changelog for the library to see if there are any changes that would effect emergent behavior for our end-users. The SSH features in particular have been a source of minor churn in the past, I seem to recall, because the support for various SSH features in the upstream library has shifted over time. In this case, just from skimming the diff it looks like e.g. the library may now have GSSAPI (Kerberos) support, which would be great to declare in the changelog if we can verify that is so. (Getting a Kerberos-enabled sshd running to test it is likely bothersome, though. 😖 ) |
There weren't many changes:
The diffie-hellman-group-exchange-sha256 kex was primarily what I wanted to keep compatibility as broad as possible. The gssapi support might be nice to have in the future, but not something we want to try and implement right now. |
If upgrading doesn't automatically introduce any support for GSSAPI without other changes on our end then that's fine. I just worry about new features we don't expect being introduced and becoming compatibility constraints even though we didn't get a chance to design how they interact with Terraform. The intent of my previous comment was just to ask that we make sure we know about any new features this might introduce automatically, not to make any special effort to opt into new features explicitly. |
Gotcha. Yes, the only new code automatically used will be the new key exchange. The auth methods need to be explicitly configured. |
Great! In that case, it sounds like just mentioning that we now support that key exchange method in the changelog will do the job here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting an approval on this, because the comment about CHANGELOG should be done after a merge
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
An unresponsive server or dead tcp connection can hang a provisioner indefinitely.
Monitor the ssh connection status, and abort if the server becomes unresponsive for more than 120s.
Unfortunately the connection
timeout
parameter is already taken to mean "retry timeout", and is often set to an inappropriate value to use for monitoring the connection status. Rather than providing another similarly named option for users, we'll set a conservative timeout of 120s (somewhat arbitrarily chosen to match the arbitrarily chosen default TCP MSL value).Fixes #12139