-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MM-53994] Fix potential stuck session due to ws client leak #492
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## main #492 +/- ##
========================================
- Coverage 5.62% 5.60% -0.03%
========================================
Files 21 21
Lines 4178 4193 +15
========================================
Hits 235 235
- Misses 3926 3941 +15
Partials 17 17
☔ View full report in Codecov by Sentry. |
// If we can't find one, we resort to looping through all the sessions to | ||
// check against the originalConnID field. This would be necessary only if | ||
// the session reconnected throughout the call with a new ws connection ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense. Good comments.
this.removeAllListeners('message'); | ||
} else { | ||
this.emit('close', code); | ||
close() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we would we have the same problem on mobile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check that, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent mattermost/mattermost-mobile#7513, great catch :)
Summary
This was a fun one. A session was stuck in the call state after the client explicitly left the call and the RTC connection was closed, only to reconnect on the ws channel every now and then hours after the fact.
They key issue here was that we were improperly using the state of the WebSocket object field (
callsClient.ws.ws
) as a way to differentiate between an explicit disconnect (e.g. user leaving call) vs an event triggered disconnect (e.g. network issue). Unfortunately the logic didn't account for the potential edge case of an explicit disconnect happening right after a network disconnect.The sequence of events was roughly the following:
a.
this.ws
is set tonull
in thews.onclose
handler.b.
close()
gets called and initializes the reconnect handler.a.
callsClient.disconnect()
gets called.b.
callsClient.ws.close()
gets called.c.
callsClients.ws.ws
isnull
due to step 1a so we try to reconnect again instead of terminating the ws connection.callsClient
object is cleared from window but essentially leaked given its ws client was never properly closed and will endlessly try to reconnect on any future disconnect.Note: I am aware the above sequence was likely more useful to me, happy to explain anything unclear.
To fix the above we move the reconnect logic into its own method and solely rely on
callsClient.ws.closed
to decide whether we should reconnect or not.In addition to the above, we also fix a related issue on the server side which would have prevented the stuck session (not the client side leak though). The problem here was that a reconnected session may have have changed its websocket connection ID which would then differ from the original session ID used to track the RTC session.
Big thanks to @wiggin77 for providing client logs which turned out to be key into figure out what was going on.
Ticket Link
https://mattermost.atlassian.net/browse/MM-53994