Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent long delay during connection upgrade #393

Closed
james-tisato-kortical opened this issue Dec 9, 2019 · 5 comments
Closed

Intermittent long delay during connection upgrade #393

james-tisato-kortical opened this issue Dec 9, 2019 · 5 comments
Assignees

Comments

@james-tisato-kortical
Copy link

Summary
We are using python-socketio to push out regular service status updates to browser clients, which are using the socket.io JS client library. Most of the time it works fine but we are intermittently seeing a situation where there is a ~30 second delay part-way through the connection upgrade process, and no messages are received during this time. Eventually the connection is closed (although the process looks a bit messy) and a reconnect occurs, after which everything works fine again.

We can reproduce this fairly easily with a unit test that repeatedly navigates to the relevant page and away again, forcing the socket to be recreated and a connection re-established each time. We reliably run into the issue after a few iterations of this.

Versions
python-socketio: 4.3.1 (have also tried 4.4.0)
python-engineio: 3.10.0 (have also tried 3.11.0)
socket.io JS client: 2.3.0 (have also tried 2.2.0)

Code
This is a simplified version of our server code, which is running inside a Docker container within a Kubernetes pod:

from gevent import monkey, pywsgi
monkey.patch_all()

from geventwebsocket.handler import WebSocketHandler
import logging
import socketio
import time


logging.getLogger('socketio.server').setLevel(logging.INFO)
logging.getLogger('engineio.server').setLevel(logging.INFO)

sio = socketio.Server(
    logger=True,
    engineio_logger=True,
    cors_allowed_origins="*",
    async_mode='gevent'
    )
app = socketio.WSGIApp(sio)


@sio.on('connect')
def connect(sid, environ):
    logger.info(f'Client connected with session id: {sid}')
    logger.info(f'Environment is: {environ}')


@sio.on('disconnect')
def disconnect(sid):
    logger.info(f'Client disconnected from session id: {sid}')


@sio.on('join')
def join(sid, room):
    sio.enter_room(sid, room)
    logger.info(f'Client joining room {room} in session {sid}')


def generate_update_message():
    # Do some work here to generate the right status message
    # ...


def update_loop():
    while True:
        # Generate and emit an update every second
        update = generate_update_message()
        sio.emit('update', update, room='admin')
        sio.sleep(0.1)
        time.sleep(1.0)
        

def main():
    sio.start_background_task(update_loop)
    pywsgi.WSGIServer(('', 8080), app, handler_class=WebSocketHandler).serve_forever()

The relevant bit of our client code is:

function admin_vm() {
    const self = this;

    self.socket = io.connect({
        path: window.config.project_url + 'status/system/socket.io'
    });

    self.socket.on('connect', function() {
        console.log('Socket connected, joining admin room');
        self.socket.emit('join', 'admin');
    });

    self.socket.on('update', function (update) {
        update = JSON.parse(update);
        const u = update;
        console.log(
            'Update for system status left server [' + u.timestamp +
            '] arrived here [' + new Date().toString() + '] update count [' + update_count++ + ']'
            );

        // Apply the update to the UI here...
    });
}

Logs
I've captured some detailed logs of both client and server both when the connection works and when it doesn't (see attached).

bad-client.log
bad-server.log
good-client.log
good-server.log

When the issue occurs, the client log clearly shows the delay while it's probing for the availability of the websocket transport:

12:58:46.220 socket.io.js 391:131 "engine.io-client:socket probe transport \"%s\" pong +7ms"   "websocket"
12:58:46.220 socket.io.js 391:131 "engine.io-client:socket pausing current transport \"%s\" +1ms"   "polling"
12:58:46.220 socket.io.js 391:131 "engine.io-client:polling we are currently polling - waiting to pause +8ms"
12:59:11.356 socket.io.js 391:131 "engine.io-client:socket writing ping packet - expecting pong within %sms +25s"   60000
12:59:17.224 socket.io.js 391:131 "engine.io-client:polling polling got data %s +31s"   ArrayBuffer(0)
12:59:17.224 socket.io.js 391:131 "engine.io-client:polling pre-pause polling complete +2ms"
12:59:17.224 socket.io.js 391:131 "engine.io-client:polling paused +1ms"
12:59:17.224 socket.io.js 391:131 "engine.io-client:socket changing transport and sending upgrade packet +6s"
12:59:17.224 socket.io.js 391:131 "engine.io-client:socket setting transport %s +1ms"   "websocket"
12:59:17.224 socket.io.js 391:131 "engine.io-client:socket clearing existing transport %s +0ms"   "polling"
12:59:17.224 socket.io.js 391:131 "engine.io-client:polling ignoring poll - transport state \"%s\" +4ms"   "paused"
12:59:17.224 socket.io.js 391:131 "engine.io-client:socket flushing %d packets in socket +3ms"   1

On the server side, we see part of the upgrade process and then it seems to stop (while still emitting update messages)...and eventually the server gives up and closes the socket:

12:58:46,184 1 MainProcess DEBUG Attempting to upgrade connection - geventwebsocket.handler
12:58:46,185 1 MainProcess DEBUG WebSocket request accepted, switching protocols - geventwebsocket.handler
12:58:46,185 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9: Received request to upgrade to websocket - engineio.server
12:58:46,186 1 MainProcess DEBUG Initializing WebSocket - geventwebsocket.handler
12:58:46,187 1 MainProcess DEBUG Validating WebSocket request - geventwebsocket.handler
12:58:46,187 1 MainProcess DEBUG Can only upgrade connection if using GET method. - geventwebsocket.handler
12:58:46,187 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9: Received packet MESSAGE data 2["join","admin"] - engineio.server
12:58:46,187 1 MainProcess INFO received event "join" from 6294b24868144273b4a9bceaf0e439f9 [/] - socketio.server
12:58:46,188 1 MainProcess INFO ::ffff:10.0.9.139 - - [12:58:46] "POST /socket.io/?EIO=3&transport=polling&t=MxglQ0a&sid=6294b24868144273b4a9bceaf0e439f9 HTTP/1.1" 200 208 0.001325 - geventwebsocket.handler
12:58:46,188 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9 is entering room admin [/] - socketio.server
12:58:46,188 1 MainProcess INFO Client joining room admin in session 6294b24868144273b4a9bceaf0e439f9 - admin.status
12:58:46,190 1 MainProcess DEBUG Initializing WebSocket - geventwebsocket.handler
12:58:46,190 1 MainProcess DEBUG Validating WebSocket request - geventwebsocket.handler
12:58:46,199 1 MainProcess INFO ::ffff:10.0.9.139 - - [12:58:46] "GET /socket.io/?EIO=3&transport=polling&t=MxglQ0c&sid=6294b24868144273b4a9bceaf0e439f9 HTTP/1.1" 200 159 0.009237 - geventwebsocket.handler
12:58:46,215 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
12:58:46,217 1 MainProcess DEBUG Initializing WebSocket - geventwebsocket.handler
12:58:46,217 1 MainProcess DEBUG Validating WebSocket request - geventwebsocket.handler
12:58:47,400 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
12:58:48,576 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
...
12:59:14,619 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
12:59:15,841 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
12:59:17,045 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
12:59:17,046 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9: Client is gone, closing socket - engineio.server
12:59:17,046 1 MainProcess INFO Client disconnected from session id: 6294b24868144273b4a9bceaf0e439f9 - admin.status
12:59:17,046 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9: Client is gone, closing socket - engineio.server
12:59:17,047 1 MainProcess INFO ::ffff:10.0.9.139 - - [12:59:17] "GET /socket.io/?EIO=3&transport=polling&t=MxglQ14&sid=6294b24868144273b4a9bceaf0e439f9 HTTP/1.1" 200 155 30.829922 - geventwebsocket.handler
12:59:17,064 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9: Upgrade to websocket successful - engineio.server
12:59:17,065 1 MainProcess INFO 6294b24868144273b4a9bceaf0e439f9: Received packet PING data None - engineio.server
12:59:17,065 1 MainProcess INFO Receive error -- socket is closed - engineio.server
12:59:17,068 1 MainProcess DEBUG Closed WebSocket - geventwebsocket.handler
12:59:17,069 1 MainProcess DEBUG Failed to write closing frame -> closing socket - geventwebsocket.handler
12:59:17,069 1 MainProcess DEBUG Closed WebSocket - geventwebsocket.handler

As you can see in the full log, a re-connection follows and it works, but we really want to eliminate this 30 second delay as it leads to a bad user experience.

Workaround
As a test, we tried using the websocket transport directly instead of starting with long polling (as described at https://socket.io/docs/client-api/#With-websocket-transport-only):

    self.socket = io.connect({
        path: window.config.project_url + 'status/system/socket.io',
        transports: ['websocket']       // Default to websocket transport first, only falling back to long-polling on connection failure
    });

    self.socket.on('reconnect_attempt', () => {
        // On reconnection, reset the transports option, as the Websocket connection may
        // have failed (caused by proxy, firewall, browser, ...)
        self.socket.io.opts.transports = ['polling', 'websocket'];
    });

This seems to solve the problem - our test case can go through hundreds of iterations without any problem.

@miguelgrinberg
Copy link
Owner

Out of curiosity, can you retest this after adding a short delay before you send the join event? I'm wondering if there is some sort of race condition there where the event that you send right away in the connection handler interferes with the upgrade. Maybe try something like this:

    setTimeout(function() { self.socket.emit('join', 'admin'); }, 3000);

@james-tisato-kortical
Copy link
Author

Thanks for the very fast response!

I have tried your suggestion and unfortunately the problem still occurs. I've attached new client and server logs with this in place - see below.

bad-with-join-delay-client.log
bad-with-join-delay-server.log

Strangely, the server doesn't even seem to acknowledge the room join request sent at 16:03:47:

Client

16:03:44.239 socket.io.2.3.0.js 391:131 "socket.io-client:socket transport is open - connecting +0ms"
16:03:44.240 socket.io.2.3.0.js 391:131 "engine.io-client:socket starting upgrade probes +3ms"
16:03:44.240 socket.io.2.3.0.js 391:131 "engine.io-client:socket probing transport \"%s\" +0ms"   "websocket"
16:03:44.240 socket.io.2.3.0.js 391:131 "engine.io-client:socket creating transport \"%s\" +1ms"   "websocket"
16:03:44.240 socket.io.2.3.0.js 391:131 "engine.io-client:socket socket receive: type \"%s\", data \"%s\" +1ms"   "message" "0"
16:03:44.241 admin.js 50:16 "Socket connected, joining admin room in 3 seconds"
16:03:44.241 socket.io.2.3.0.js 391:131 "engine.io-client:polling polling +8ms"
16:03:44.241 socket.io.2.3.0.js 391:131 "engine.io-client:polling-xhr xhr poll +36ms"
16:03:44.242 socket.io.2.3.0.js 391:131 "engine.io-client:polling-xhr xhr open %s: %s +0ms"   "GET" "https://.../.../socket.io/?EIO=3&transport=polling&t=MxhPlVm&sid=120fec90214a4c31aee27d2d7f45fe56"
16:03:44.243 socket.io.2.3.0.js 391:131 "engine.io-client:polling-xhr xhr data %s +2ms"   null
16:03:44.253 socket.io.2.3.0.js 391:131 "engine.io-client:socket probe transport \"%s\" opened +9ms"   "websocket"
16:03:44.255 socket.io.2.3.0.js 391:131 "engine.io-client:polling polling got data %s +14ms"   ArrayBuffer(4)
16:03:44.257 socket.io.2.3.0.js 391:131 "engine.io-client:socket socket receive: type \"%s\", data \"%s\" +8ms"   "noop" undefined
16:03:44.257 socket.io.2.3.0.js 391:131 "engine.io-client:polling polling +3ms"
16:03:44.258 socket.io.2.3.0.js 391:131 "engine.io-client:polling-xhr xhr poll +16ms"
16:03:44.259 socket.io.2.3.0.js 391:131 "engine.io-client:polling-xhr xhr open %s: %s +0ms"   "GET" "https://.../.../socket.io/?EIO=3&transport=polling&t=MxhPlW2&sid=120fec90214a4c31aee27d2d7f45fe56"
16:03:44.260 socket.io.2.3.0.js 391:131 "engine.io-client:polling-xhr xhr data %s +1ms"   null
16:03:44.263 socket.io.2.3.0.js 391:131 "engine.io-client:socket probe transport \"%s\" pong +6ms"   "websocket"
16:03:44.264 socket.io.2.3.0.js 391:131 "engine.io-client:socket pausing current transport \"%s\" +1ms"   "polling"
16:03:44.265 socket.io.2.3.0.js 391:131 "engine.io-client:polling we are currently polling - waiting to pause +7ms"
16:03:47.395 admin.js 52:20 "Joining admin room"
16:03:47.395 socket.io.2.3.0.js 391:131 "socket.io-client:manager writing packet {\"type\":2,\"data\":[\"join\",\"admin\"],\"options\":{\"compress\":true},\"nsp\":\"/\"} +3s"
16:03:47.395 socket.io.2.3.0.js 1646:132 "socket.io-parser encoding packet {\"type\":2,\"data\":[\"join\",\"admin\"],\"options\":{\"compress\":true},\"nsp\":\"/\"} +0ms"
16:03:47.395 socket.io.2.3.0.js 1646:132 "socket.io-parser encoded {\"type\":2,\"data\":[\"join\",\"admin\"],\"options\":{\"compress\":true},\"nsp\":\"/\"} as %s +1ms"   "2[\"join\",\"admin\"]"
16:04:09.331 socket.io.2.3.0.js 391:131 "engine.io-client:socket writing ping packet - expecting pong within %sms +25s"   60000
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:polling polling got data %s +55s"   ArrayBuffer(0)
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:polling pre-pause polling complete +1ms"
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:polling paused +1ms"
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:socket changing transport and sending upgrade packet +30s"
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:socket setting transport %s +1ms"   "websocket"
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:socket clearing existing transport %s +1ms"   "polling"
16:04:38.962 socket.io.2.3.0.js 391:131 "engine.io-client:polling ignoring poll - transport state \"%s\" +4ms"   "paused"
16:04:38.963 socket.io.2.3.0.js 391:131 "engine.io-client:socket flushing %d packets in socket +2ms"   2
16:04:38.963 socket.io.2.3.0.js 391:131 "engine.io-client:socket socket close with reason: \"%s\" +6ms"   "transport close"
16:04:38.963 socket.io.2.3.0.js 391:131 "socket.io-client:manager onclose +52s"
16:04:38.963 socket.io.2.3.0.js 391:131 "socket.io-client:manager cleanup +1ms"
16:04:38.963 socket.io.2.3.0.js 391:131 "socket.io-client:socket close (%s) +55s"   "transport close"
16:04:38.963 admin.js 58:16 "Socket disconnected because: transport close"

Server

16:03:44,244 1 MainProcess DEBUG Attempting to upgrade connection - geventwebsocket.handler
16:03:44,244 1 MainProcess DEBUG WebSocket request accepted, switching protocols - geventwebsocket.handler
16:03:44,245 1 MainProcess INFO 120fec90214a4c31aee27d2d7f45fe56: Received request to upgrade to websocket - engineio.server
16:03:44,246 1 MainProcess DEBUG Initializing WebSocket - geventwebsocket.handler
16:03:44,246 1 MainProcess DEBUG Validating WebSocket request - geventwebsocket.handler
16:03:44,250 1 MainProcess INFO ::ffff:10.0.9.139 - - [16:03:44] "GET /socket.io/?EIO=3&transport=polling&t=MxhPlVm&sid=120fec90214a4c31aee27d2d7f45fe56 HTTP/1.1" 200 159 0.004213 - geventwebsocket.handler
16:03:44,264 1 MainProcess DEBUG Initializing WebSocket - geventwebsocket.handler
16:03:44,264 1 MainProcess DEBUG Validating WebSocket request - geventwebsocket.handler
16:03:44,325 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:45,503 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:46,682 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:47,909 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:49,100 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:50,282 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:51,465 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:52,657 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:03:53,845 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server

Furthermore, after we've managed to recover, the server repeatedly logs that it can't send future messages to the session that was already closed:

16:04:40,287 1 MainProcess DEBUG Attempting to upgrade connection - geventwebsocket.handler
16:04:40,287 1 MainProcess DEBUG WebSocket request accepted, switching protocols - geventwebsocket.handler
16:04:40,287 1 MainProcess INFO 0b457d5d9082446295756362a450ce45: Received request to upgrade to websocket - engineio.server
16:04:40,297 1 MainProcess INFO ::ffff:10.0.9.139 - - [16:04:40] "GET /socket.io/?EIO=3&transport=polling&t=MxhPzBI&sid=0b457d5d9082446295756362a450ce45 HTTP/1.1" 200 159 0.016742 - geventwebsocket.handler
16:04:40,344 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:04:40,345 1 MainProcess WARNING Cannot send to sid 120fec90214a4c31aee27d2d7f45fe56 - engineio.server
16:04:40,350 1 MainProcess INFO 0b457d5d9082446295756362a450ce45: Upgrade to websocket successful - engineio.server
16:04:41,527 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:04:41,527 1 MainProcess WARNING Cannot send to sid 120fec90214a4c31aee27d2d7f45fe56 - engineio.server
16:04:42,708 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:04:42,708 1 MainProcess WARNING Cannot send to sid 120fec90214a4c31aee27d2d7f45fe56 - engineio.server
16:04:43,280 1 MainProcess INFO 0b457d5d9082446295756362a450ce45: Received packet MESSAGE data 2["join","admin"] - engineio.server
16:04:43,281 1 MainProcess INFO received event "join" from 0b457d5d9082446295756362a450ce45 [/] - socketio.server
16:04:43,281 1 MainProcess INFO 0b457d5d9082446295756362a450ce45 is entering room admin [/] - socketio.server
16:04:43,281 1 MainProcess INFO Client joining room admin in session 0b457d5d9082446295756362a450ce45 - admin.status
16:04:43,891 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:04:43,891 1 MainProcess WARNING Cannot send to sid 120fec90214a4c31aee27d2d7f45fe56 - engineio.server
16:04:45,072 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:04:45,072 1 MainProcess WARNING Cannot send to sid 120fec90214a4c31aee27d2d7f45fe56 - engineio.server
16:04:46,243 1 MainProcess INFO emitting event "update" to admin [/] - socketio.server
16:04:46,243 1 MainProcess WARNING Cannot send to sid 120fec90214a4c31aee27d2d7f45fe56 - engineio.server

@miguelgrinberg
Copy link
Owner

Unfortunately the logs do not give me any clues. For some reason the client appears to be waiting for something during those 30 seconds, and I don't know what that is. If you think you can build a small example app that can reproduce this issue that would be useful. I'm going to try to do this myself too, based on the code you posted.

@salimaboubacar
Copy link
Contributor

Seems related to this => miguelgrinberg/python-engineio#160 (comment) (I think this logic is missing in python-engineio from the javascript version: https://github.com/socketio/engine.io/blob/cb0ac6fddcad12c454651bf0e1a312a154e228a4/lib/transports/polling.js#L112-L117)

@miguelgrinberg
Copy link
Owner

miguelgrinberg commented Mar 10, 2020

I believe this is addressed by miguelgrinberg/python-engineio@f2cce2b.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants