Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix race condition causing unreliable websocket upgrade #76

Closed
wants to merge 1 commit into from

Conversation

bennofs
Copy link

@bennofs bennofs commented Aug 31, 2018

Sending a single noop is not enough in all cases, since the client may start a
new polling request right away before doing the websocket upgrade tasks.

Before this fix the socketio javascript client could in some cases block for up
to ping_timeout seconds (not sending or receiving any messages) during upgrade
to websockets. This can be observed in the debug browser console logs from engine.io-client:

15:17:42.456 engine.io-client:polling we are currently polling - waiting to pause +1ms browser.js:138
... nothing for 60 seconds ...
15:18:42.247 The connection to ws://localhost:5000/ was interrupted while the page was loading. websocket.js:117

The fix is to periodically send noops if the queue is empty so that no polling
request blocks for a long time during upgrade.

Sending a single noop is not enough in all cases, since the client may start a
new polling request right away before doing the websocket upgrade tasks.

Before this fix the socketio javascript client could in some cases block for up
to ping_timeout seconds (not sending or receiving any messages) during upgrade
to websockets. This can be observed in the debug browser console logs from engine.io-client:

```
15:17:42.456 engine.io-client:polling we are currently polling - waiting to pause +1ms browser.js:138
... nothing for 60 seconds ...
15:18:42.247 The connection to ws://localhost:5000/ was interrupted while the page was loading. websocket.js:117
```

The fix is to periodically send noops if the queue is empty so that no polling
request blocks for a long time during upgrade.
Copy link

@yordadev yordadev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gotten a bunch of issue questions this week alone on slack that were caused by this.

Good fix.

@bennofs
Copy link
Author

bennofs commented Sep 28, 2018

Any feedback on this?

@miguelgrinberg
Copy link
Owner

Hi, sorry for the delay. The idea of sending several NOOP packets is fine, that makes sense to me. Unfortunately this is a change that is very difficult to test, so I wanted to spend more time testing it manually with several clients in different languages and with the different backend frameworks.

I honestly have never seen a problem with sending a single NOOP, so the first issue I have is how to reproduce this so that I can evaluate your fix.

@bennofs
Copy link
Author

bennofs commented Sep 29, 2018

Thanks for the response, makes sense! I don't have much time right now to help you with this, but in 2 to 3 weeks I will have more time. I can try to work out a detailed reproducer then.

A few notes on how I discovered this:

  • use javascript socketio client (with the transport not explictly set, so it starts with polling and then upgrades)
  • I used firefox and pressed Ctrl+r until the bug appears

In my case, the application loaded some data from the server and if the bug happens, you could observe that the initial load took ~1m (because it had to wait until the ping timeout before connection resumes). I guess you could also trigger this more easily by adding sleeps to the code base: add a sleep after we send the initial noop, and then initiate a poll on the client side in this time frame.

@miguelgrinberg
Copy link
Owner

A similar but simpler fix was applied in f2cce2b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants