fix race condition causing unreliable websocket upgrade #76

bennofs · 2018-08-31T16:49:50Z

Sending a single noop is not enough in all cases, since the client may start a
new polling request right away before doing the websocket upgrade tasks.

Before this fix the socketio javascript client could in some cases block for up
to ping_timeout seconds (not sending or receiving any messages) during upgrade
to websockets. This can be observed in the debug browser console logs from engine.io-client:

15:17:42.456 engine.io-client:polling we are currently polling - waiting to pause +1ms browser.js:138
... nothing for 60 seconds ...
15:18:42.247 The connection to ws://localhost:5000/ was interrupted while the page was loading. websocket.js:117

The fix is to periodically send noops if the queue is empty so that no polling
request blocks for a long time during upgrade.
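
A minimal sketch of the idea, not the actual patch: the polling handler waits on the packet queue in short slices and, if the queue is still empty while an upgrade is in progress, answers with a NOOP instead of continuing to block. The names `poll`, `upgrading`, and `noop_interval` are hypothetical and chosen for illustration; only the NOOP packet type (`6` in the Engine.IO protocol) is taken from the protocol itself.

```python
import queue
import threading
import time

NOOP = "6"  # Engine.IO packet type for "noop"


def poll(q, upgrading, noop_interval=0.1, ping_timeout=60.0):
    """Serve one long-polling request (hypothetical helper).

    Waits on the queue in slices of `noop_interval` seconds. If a slice
    ends with an empty queue while `upgrading` is set, a NOOP is returned
    so the client's polling request is released promptly.
    """
    deadline = time.monotonic() + ping_timeout
    while time.monotonic() < deadline:
        try:
            # A real packet always wins over a NOOP.
            return [q.get(timeout=noop_interval)]
        except queue.Empty:
            if upgrading.is_set():
                return [NOOP]
    # No packet and no upgrade within ping_timeout: empty response.
    return []
```

Because the flag is rechecked after every slice, a poll that started before the upgrade began is still released within roughly `noop_interval` seconds of the upgrade starting, rather than after `ping_timeout`.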

yordadev

I've gotten a bunch of issue questions this week alone on slack that were caused by this.

Good fix.

bennofs · 2018-09-28T14:59:33Z

Any feedback on this?

miguelgrinberg · 2018-09-29T10:53:57Z

Hi, sorry for the delay. The idea of sending several NOOP packets is fine, that makes sense to me. Unfortunately this is a change that is very difficult to test, so I wanted to spend more time testing it manually with several clients in different languages and with the different backend frameworks.

I honestly have never seen a problem with sending a single NOOP, so the first issue I have is how to reproduce this so that I can evaluate your fix.

bennofs · 2018-09-29T20:16:43Z

Thanks for the response, makes sense! I don't have much time right now to help you with this, but in 2 to 3 weeks I will have more time. I can try to work out a detailed reproducer then.

A few notes on how I discovered this:

- Use the JavaScript socketio client with the transport not explicitly set, so it starts with polling and then upgrades.
- I used Firefox and pressed Ctrl+R until the bug appeared.

In my case, the application loaded some data from the server, and when the bug happened you could observe that the initial load took about one minute (because it had to wait for the ping timeout before the connection resumed). I guess you could also trigger this more easily by adding sleeps to the code base: add a sleep after the server sends the initial noop, and then initiate a poll on the client side within that time frame.
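
The stall can also be imitated without a browser. The following is a toy model under stated assumptions, not code from the library: `blocking_poll` and `simulate_single_noop` are hypothetical names, the queue stands in for the server's outgoing packet queue, and `ping_timeout` is shortened so the run finishes quickly.

```python
import queue
import time

NOOP = "6"  # Engine.IO packet type for "noop"


def blocking_poll(q, timeout):
    """Hypothetical long poll: block until a packet arrives or time runs out."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


def simulate_single_noop(ping_timeout=0.3):
    """Model the race: one NOOP is sent, then the client polls again."""
    q = queue.Queue()
    q.put(NOOP)  # the server sends its single NOOP
    first = blocking_poll(q, ping_timeout)  # the in-flight poll is released

    # The client starts a new poll before finishing the upgrade tasks.
    start = time.monotonic()
    second = blocking_poll(q, ping_timeout)  # nothing left to release it
    stalled_for = time.monotonic() - start
    return first, second, stalled_for


if __name__ == "__main__":
    print(simulate_single_noop())
```

In this model the first poll returns the NOOP immediately, while the second one only comes back once the shortened `ping_timeout` has elapsed, which mirrors the gap seen in the browser log.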

miguelgrinberg · 2020-03-10T23:40:18Z

A similar but simpler fix was applied in f2cce2b.

yordadev approved these changes Aug 31, 2018

miguelgrinberg self-assigned this Aug 31, 2018

miguelgrinberg added the investigate label Aug 31, 2018

miguelgrinberg force-pushed the master branch 2 times, most recently from 4576351 to b7cc97d Compare November 24, 2018 16:26

miguelgrinberg force-pushed the master branch 5 times, most recently from efbe3b0 to 5c84bbb Compare December 8, 2018 00:07

miguelgrinberg force-pushed the master branch from bb46a3d to 00b4411 Compare December 9, 2018 18:18

miguelgrinberg force-pushed the master branch from 926131c to d9c278f Compare January 5, 2019 20:10

miguelgrinberg force-pushed the master branch from cc84ac5 to d6c5e0b Compare February 15, 2019 23:55

miguelgrinberg force-pushed the master branch 5 times, most recently from 00e90f1 to b27cafb Compare May 30, 2019 19:07

salimaboubacar mentioned this pull request Jan 22, 2020

Do not hang in polling while websocket upgrade is ongoing #161

Closed

miguelgrinberg closed this Mar 10, 2020
