Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-54885] Improve resiliency of browser initialization logic #50

Merged
merged 2 commits into from
Oct 12, 2023

Conversation

streamer45
Copy link
Contributor

Summary

PR implements an additional layer of re-try logic in case of browser failures during initialization. This should cover both a failure navigating to the URL (e.g. network failure) as well as any subsequent failure to initialize the calls client (e.g. plugin JS assets fail to get downloaded).

I also ported a couple of improvements to the start/stop flow from #44 so that we can get this running without having to make breaking changes.

Ticket Link

https://mattermost.atlassian.net/browse/MM-54885

@streamer45 streamer45 added the 2: Dev Review Requires review by a core committer label Oct 11, 2023
@streamer45 streamer45 requested a review from cpoile October 11, 2023 21:58
@streamer45 streamer45 self-assigned this Oct 11, 2023
@streamer45 streamer45 added this to the v0.5.2 milestone Oct 11, 2023
Copy link
Member

@cpoile cpoile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this is nice. Complicated stuff, but pretty clear once you go through it carefully. Great work!


slog.Debug("connected to call")
close(rec.readyCh)
return fmt.Errorf("stop signal received while initializing client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering when this might happen? (I mean practically)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's needed because if the browser continuously fails to start (line 169) we still want to exit properly when the global readyTimeout expires. Otherwise we'd be stuck there indefinitely.

@@ -46,3 +50,35 @@ func checkOSRequirements() error {

return nil
}

func pollBrowserEvaluateExpr(ctx context.Context, expr string, interval, timeout time.Duration, stopCh chan struct{}) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice

Comment on lines +51 to +53
if err := recorder.Stop(); err != nil {
slog.Error("failed to stop recorder", slog.String("err", err.Error()))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should surface these to the sysadmin somehow, or else they might get lost/ignored. I understand simply logging most of what's in the recorder (that's info you would go find when things go wrong), but this seems like something that might get missed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job failed to start, that's a major problem. If stopping fails it's more of a minor issue at that point I am thinking.

@streamer45 streamer45 added 3: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer labels Oct 12, 2023
@streamer45 streamer45 merged commit 49fb809 into master Oct 12, 2023
@streamer45 streamer45 deleted the MM-54885 branch October 12, 2023 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3: Reviews Complete All reviewers have approved the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants