-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gracefully shutdown of the websocket client #213
Gracefully shutdown of the websocket client #213
Conversation
46b92c2
to
b5666be
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #213 +/- ##
==========================================
- Coverage 73.80% 73.78% -0.03%
==========================================
Files 25 25
Lines 1550 1579 +29
==========================================
+ Hits 1144 1165 +21
- Misses 296 301 +5
- Partials 110 113 +3 ☔ View full report in Codecov by Sentry. |
Could you please take a look @evan-bradley @srikanthccv @tigrannajaryan |
I have watched the meeting record and I think I can add some explanation about this PR. The solution here is making When the client is shutting down, the If the server wants to close the connection, it should send the close message to initiate the close handshake. When a close message is received, the connection read methods will handle it by calling the close handler (the default handler will send a close message back to the peer. I think this is what we want.) and return a CloseError. The receiver loop will stop on any error. |
client/wsclient.go
Outdated
case <-r.IsStopped(): | ||
c.common.Logger.Debugf("shutdown handshake complete.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding websocket.IsCloseError(r.Err(), websocket.CloseNormalClosure)
as this may not always be handshake complete. Less likely but there is a chance b/w some other error could occur before the close message is received from the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the receiver loop has handled it and printed a log for it. However, the log message here is not very accurate. The "complete" here actually means that the receiver loop has also exited. 🤔
client/wsclient.go
Outdated
select { | ||
case <-c.sender.IsStopped(): | ||
// sender will send close message to initiate the close handshake | ||
if err := c.sender.Err(); err != nil && err != websocket.ErrCloseSent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this err != websocket.ErrCloseSent
to handle the case when the server initiates the close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The sender and receiver run concurrently. There is a possibility that the server and client initiate the closing handshake at the same time. In this case, the close message may be sent by CloseHandler, which is called in the receiver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change sender.Err()
to return nil if the error was websocket.ErrCloseSent
? Let's consider it an internal implementation detail of the sender and avoid leaking it here. That we only need to check err
for nil here. Sender seeing a websocket.ErrCloseSent
response is a success, we don't need to consider it an error.
client/internal/wssender.go
Outdated
@@ -56,13 +66,22 @@ out: | |||
s.sendNextMessage() | |||
|
|||
case <-ctx.Done(): | |||
s.err = s.sendCloseMessage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means stopping the sender also closes the connection which might be unexpected because run should simply run until the context is cancelled but it also closes the connection. I think sending Close message should be part of client Stop
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is by design. Stopping the sender would always try to send the close message for the close handshake and runOneCycle
is responsible for the subsequent steps of connection closing. I updated the comment of the runOneCycle()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, One main difference from this PR #205 is that the sender initiates the close handshake which makes the implementation a little simpler but also the connection management is now done b/w wsclient and wssender. @evan-bradley what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, overall it looks good to me. I think this approach is an improvement over #205. There may additional improvements we can make in the future, but for now this will resolve the errors we are seeing and will cleanly close the connection.
client/wsclient.go
Outdated
select { | ||
case <-c.sender.IsStopped(): | ||
// sender will send close message to initiate the close handshake | ||
if err := c.sender.Err(); err != nil && err != websocket.ErrCloseSent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change sender.Err()
to return nil if the error was websocket.ErrCloseSent
? Let's consider it an internal implementation detail of the sender and avoid leaking it here. That we only need to check err
for nil here. Sender seeing a websocket.ErrCloseSent
response is a success, we don't need to consider it an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience @haoqixu
Please bear with me while I am trying to understand all the concurrent flows.
// sender will send close message to initiate the close handshake | ||
if err := c.sender.Err(); err != nil { | ||
c.common.Logger.Debugf("Error stopping the sender: %v", err) | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we break
here? If the sender stopping failed shouldn't we still try to stop the receiver and close the connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this break
was introduced during the updates of the PR. The initial code does not wait for the receiver to stop, leaving the closing of the connection to the defer. I update the branch to close the conneciton explicitly and wait for the receiver to stop before break.
Thanks for updates @haoqixu |
I have updated the PR with 5fa47a3. Another option is to remove the |
@haoqixu this looks good to me. Can you please resolve the conflicts? |
165735a
to
fcbb3c7
Compare
By design, we need to canceling the context and closing the connection to stop the
#240 puts the reading of connections into a separate goroutine, which is unnecessary and may cause goroutine leaks. And the update here actually reverts #240 |
please take a look. @tigrannajaryan |
@srikanthccv please take another look at this PR since it reverts your earlier PR #240. |
We don't make it clear anywhere that We will need to send a final message(s) such as agent disconnect (not implemented yet) before stopping the client. I'd prefer to see |
I recall we had discussed this already once. The implementation can always evolve to satisfy the needs but We should explicitly call out what we consider as internal implementation details. |
Co-authored-by: Evan Bradley <[email protected]>
Co-authored-by: Srikanth Chekuri <[email protected]>
Co-authored-by: Tigran Najaryan <[email protected]>
Co-authored-by: Tigran Najaryan <[email protected]>
Co-authored-by: Tigran Najaryan <[email protected]>
Co-authored-by: Tigran Najaryan <[email protected]>
Co-authored-by: Tigran Najaryan <[email protected]>
Here is test run that shows it doesn't stop without fix https://github.com/open-telemetry/opamp-go/actions/runs/7512286887/job/20452817400 Fixes open-telemetry#239
316a4d4
to
d5c6e3a
Compare
I have updated the code. Please take another look. @srikanthccv @tigrannajaryan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, LGTM.
@tigrannajaryan are we good to merge this? I just ran into an instance of this in the opampextension |
Thank you @haoqixu |
Resolves #163