Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-55415] Fix race condition on stop #124

Merged
merged 2 commits into from
Nov 17, 2023
Merged

[MM-55415] Fix race condition on stop #124

merged 2 commits into from
Nov 17, 2023

Conversation

streamer45
Copy link
Contributor

Summary

Fixing this on our side for expediency. This race could cause a panic when stopping the server as closing the peer connection didn't guarantee for the OnICECandidate handler to be done so we'd go ahead and close the receiveCh which could still be written into, leading to a crash.

I believe the race was introduced after the recent dependency upgrade, more specifically by pion/ice@898746c

I'll still see if I can send a patch upstream as I believe this sort of cases should be dealt with by the underlying API, essentially guaranteeing that after closing the ice.Agent no goroutines created by it would still be running.

Ticket Link

https://mattermost.atlassian.net/browse/MM-55415

@streamer45 streamer45 added the 2: Dev Review Requires review by a core committer label Nov 17, 2023
@streamer45 streamer45 requested a review from cpoile November 17, 2023 21:09
@streamer45 streamer45 self-assigned this Nov 17, 2023
@@ -252,6 +254,14 @@ func (s *Server) InitSession(cfg SessionConfig, closeCb func() error) error {
s.log.Error("failed to create ICE message", mlog.Err(err), mlog.String("sessionID", cfg.SessionID))
return
}

select {
case <-us.closeCh:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're just checking if the session has been closed before writing to its receiveCh? (just restating for my own understanding)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we do it under lock (us.mut), that's the key essentially. So we are guaranteeing that either closeCh has been closed already when we are inside this block and we exit early otherwise s.receiveCh must be still open and we proceed. Unfortunately I couldn't merge the two selects because the order of selection is not deterministic so we could still trigger a race.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's random on purpose. Gotcha thanks!

@streamer45
Copy link
Contributor Author

@cpoile There's another small race (see test failure), this time totally our fault (mine I should say) . Will send a fix on top of this in a minute.

@streamer45 streamer45 requested a review from cpoile November 17, 2023 21:32
Copy link
Member

@cpoile cpoile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that Go found the race.

@streamer45 streamer45 added 3: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer labels Nov 17, 2023
@streamer45 streamer45 merged commit 6c7d5b3 into MM-53457 Nov 17, 2023
@streamer45 streamer45 deleted the MM-55415 branch November 17, 2023 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3: Reviews Complete All reviewers have approved the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants