Socketio client isn't notified of disconnect due to _reset() call in client.py #254

jgillula · 2021-10-15T19:54:10Z

I'm running a python-socketio server in the cloud using Google App Engine, but whenever I restart my instance, my python-socketio client never realizes that it's been disconnected even though it receives a 400 error from the server. Instead, the python-socketio client just sits there (or exits if it's currently blocking on a call to wait()).

The expected behavior is that if reconnection is set to true, the client should detect a 400 error and attempt to reconnect.

I've tracked down the bug to line 694 in client.py in the engine.io library--specifically the call to self._reset() when the server returns an HTTP error code. Here's how it works:

Server returns a HTTP error code in _write_loop() when using polling mode, and then calls _reset() on line 694.
In the call to _reset(), the engineio client attribute state is set to "disconnected"
The _write_loop() breaks, and shortly thereafter, the _read_loop() also gets a 400 error and breaks.
Since the call to _reset() already set the state to disconnected, the _read_loop() code starting at line 597 to call the "disconnect" callback is never run before the _read_loop() function returns
Since the "disconnect" callback is never run, _handle_eio_disconnect() in socketio client.py is never called, so the reconnection logic never gets triggered.

Below are the smallest examples of a client and server I could write that reproduce the error:

client.py:

#!/usr/bin/env python3
import socketio, logging

logging.basicConfig(format="%(name)s:%(filename)s[%(lineno)d]: %(message)s", level=logging.DEBUG)
client = socketio.Client(logger=logging.getLogger("socketio.server"), engineio_logger=logging.getLogger("engineio.server"))

client.connect("http://127.0.0.1:5000")
client.wait()

server.py:

#!/usr/bin/env python3
import socketio, logging, threading, time
from flask import Flask    

app = Flask(__name__)
logging.basicConfig(format="%(name)s:%(filename)s[%(lineno)d]: %(message)s", level=logging.DEBUG)
sio = socketio.Server(async_mode="threading", logger=logging.getLogger("socketio.server"), engineio_logger=logging.getLogger("engineio.server"))

@sio.event
def connect(sid, environ, auth):
    threading.Thread(target=delete_sid).start()
    logging.debug("Connection established, simulating drop in 15 seconds")
    return True

# This simulates a scenario where the TCP connection persists, but for some reason a new instance of the server is responding to requests    
def delete_sid():
    time.sleep(15)
    logging.debug("Simulating cloud instance restart by deleting all sids")
    keys = list(sio.eio.sockets.keys())
    for key in keys:
        del sio.eio.sockets[key]

app.wsgi_app = socketio.WSGIApp(sio, app.wsgi_app)
app.run(port=5000)

And the logs:
client.py:

> ./client.py 
engineio.server:client.py[278]: Attempting polling connection to http://127.0.0.1:5000/socket.io/?transport=polling&EIO=4
urllib3.connectionpool:connectionpool.py[231]: Starting new HTTP connection (1): 127.0.0.1:5000
urllib3.connectionpool:connectionpool.py[461]: http://127.0.0.1:5000 "GET /socket.io/?transport=polling&EIO=4&t=1634327435.0804608 HTTP/1.1" 200 None
engineio.server:client.py[305]: Polling connection accepted with {'sid': 'ynXtoVOOcbfmGxzIAAAA', 'upgrades': [], 'pingTimeout': 20000, 'pingInterval': 25000}
socketio.server:client.py[665]: Engine.IO connection established
engineio.server:client.py[513]: Sending packet MESSAGE data 0
engineio.server:client.py[571]: Sending polling GET request to http://127.0.0.1:5000/socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA
urllib3.connectionpool:connectionpool.py[272]: Resetting dropped connection: 127.0.0.1
urllib3.connectionpool:connectionpool.py[231]: Starting new HTTP connection (2): 127.0.0.1:5000
urllib3.connectionpool:connectionpool.py[461]: http://127.0.0.1:5000 "GET /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA&t=1634327435.0996375 HTTP/1.1" 200 None
urllib3.connectionpool:connectionpool.py[461]: http://127.0.0.1:5000 "POST /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA HTTP/1.1" 200 None
engineio.server:client.py[492]: Received packet MESSAGE data 0{"sid":"IKqf6bqKqckrsNirAAAB"}
socketio.server:client.py[547]: Namespace / is connected
engineio.server:client.py[571]: Sending polling GET request to http://127.0.0.1:5000/socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA
urllib3.connectionpool:connectionpool.py[272]: Resetting dropped connection: 127.0.0.1
urllib3.connectionpool:connectionpool.py[461]: http://127.0.0.1:5000 "GET /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA&t=1634327435.1163623 HTTP/1.1" 200 None
engineio.server:client.py[492]: Received packet PING data 
engineio.server:client.py[513]: Sending packet PONG data 
engineio.server:client.py[571]: Sending polling GET request to http://127.0.0.1:5000/socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA
urllib3.connectionpool:connectionpool.py[272]: Resetting dropped connection: 127.0.0.1
urllib3.connectionpool:connectionpool.py[272]: Resetting dropped connection: 127.0.0.1
urllib3.connectionpool:connectionpool.py[461]: http://127.0.0.1:5000 "POST /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA HTTP/1.1" 400 None
engineio.server:client.py[693]: Unexpected status code 400 in server response, aborting
engineio.server:client.py[711]: Exiting write loop task
urllib3.connectionpool:connectionpool.py[461]: http://127.0.0.1:5000 "GET /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA&t=1634327460.1164758 HTTP/1.1" 400 None
engineio.server:client.py[582]: Unexpected status code 400 in server response, aborting
engineio.server:client.py[595]: Waiting for write loop task to end
engineio.server:client.py[604]: Exiting read loop task

server.py

> ./server.py 
engineio.client:server.py[155]: Server initialized for threading.
 * Serving Flask app 'server' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
werkzeug:_internal.py[225]:  * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
engineio.client:server.py[719]: The WebSocket transport is not available, you must install a WebSocket server that is compatible with your async mode to enable it. See the documentation for details. (further occurrences of this error will be logged with level INFO)
engineio.client:socket.py[92]: ynXtoVOOcbfmGxzIAAAA: Sending packet OPEN data {'sid': 'ynXtoVOOcbfmGxzIAAAA', 'upgrades': [], 'pingTimeout': 20000, 'pingInterval': 25000}
werkzeug:_internal.py[225]: 127.0.0.1 - - [15/Oct/2021 12:50:35] "GET /socket.io/?transport=polling&EIO=4&t=1634327435.0804608 HTTP/1.1" 200 -
engineio.client:socket.py[55]: ynXtoVOOcbfmGxzIAAAA: Received packet MESSAGE data 0
root:server.py[12]: Connection established, simulating drop in 15 seconds
engineio.client:socket.py[92]: ynXtoVOOcbfmGxzIAAAA: Sending packet MESSAGE data 0{"sid":"IKqf6bqKqckrsNirAAAB"}
werkzeug:_internal.py[225]: 127.0.0.1 - - [15/Oct/2021 12:50:35] "GET /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA&t=1634327435.0996375 HTTP/1.1" 200 -
werkzeug:_internal.py[225]: 127.0.0.1 - - [15/Oct/2021 12:50:35] "POST /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA HTTP/1.1" 200 -
root:server.py[17]: Simulating cloud instance restart by deleting all sids
engineio.client:socket.py[92]: ynXtoVOOcbfmGxzIAAAA: Sending packet PING data None
werkzeug:_internal.py[225]: 127.0.0.1 - - [15/Oct/2021 12:51:00] "GET /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA&t=1634327435.1163623 HTTP/1.1" 200 -
engineio.client:server.py[719]: Invalid session ynXtoVOOcbfmGxzIAAAA (further occurrences of this error will be logged with level INFO)
werkzeug:_internal.py[225]: 127.0.0.1 - - [15/Oct/2021 12:51:00] "POST /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA HTTP/1.1" 400 -
engineio.client:server.py[723]: Invalid session ynXtoVOOcbfmGxzIAAAA
werkzeug:_internal.py[225]: 127.0.0.1 - - [15/Oct/2021 12:51:00] "GET /socket.io/?transport=polling&EIO=4&sid=ynXtoVOOcbfmGxzIAAAA&t=1634327460.1164758 HTTP/1.1" 400 -

The solution seems pretty straightforward: remove the call to _reset() on line 694 of client.py.

Also, I haven't tested to see if the same bug happens with an asynioc_client.py, but it looks like it has the same weird call to _reset(), so it probably does.

If you'd like, I'd be happy to submit a PR removing those spurious calls to _reset().

Thanks so much!

The text was updated successfully, but these errors were encountered:

miguelgrinberg · 2021-10-15T21:49:33Z

This is in part related to miguelgrinberg/python-socketio#485. The part that is new is the fact that the disconnect handler is not being invoked when the server returns an error. That is an omission.

The part that this issue has in common with the other one is the lack of reconnect attempts when the server returns an error response. I have to think about how to handle this, because a 400 error as you suggest as an example is unlikely to succeed on a retry. 5xx errors make sense to retry though.

Your suggestion of just skipping the reset is incorrect, I believe. All that does is ignore the request that ended in error and keep going, without disconnecting. The correct behavior is to disconnect and then reconnect. If you don't disconnect, then some events that were exchanged in the request that ended up in error will be silently discarded.

jgillula · 2021-10-15T22:51:33Z

Yep, I saw miguelgrinberg/python-socketio#485 and thought it might be related. The solution you gave there didn't work in my case though (since the socketio client never even realized it was disconnected, even though the engineio client was disconnected)--so I figured it made sense to post a separate bug.

As for 400 errors, it looks like the server generates them in a wide variety of circumstances including when a session ID is invalid. It seems like that's at least one situation where a retry is likely to succeed, because it doesn't mean there's any underlying problem--just that the server forgot about the client somehow. (And in my case an attempt to retry does indeed succeed, because the client tries to use a new SID on reconnect.)

As for skipping the reset, I tried removing it and everything reconnected as expected:

The write loop still breaks and the function returns, because there's still a break on line 695 in case of error.
Shortly thereafter the read loop (polling mode) also gets a 400 response and breaks.
Since the state is still connected, the if statement on line 597 is true, so the disconnect handler is run, which means the socketio reconnection code gets run.

So when you omit the _reset() you do disconnect and then reconnect. It's true that whatever packet caused the error in the first place would be dropped in the process--but that's true now, even with the _reset() call there.

miguelgrinberg · 2021-10-16T09:40:54Z

Hmm. Okay, I'll test removing the resets. I didn't think it would do what you describe, but I'll figure it out. Thanks.

miguelgrinberg · 2021-10-23T22:44:30Z

Your solution does not work, as I expected. When a request comes back with a non 2xx status code, the read and write loop tasks must exit. Calling _reset() in the write loop task is what causes the read loop to also exit at the same time. The only way this solution works is if both tasks receive errors at the same time, which may or may not happen, depending on the error. If one of the tasks receives an error and the other doesn't, then the client will continue running in a broken state.

So I disagree that the _reset() calls are spurious, they do have a purpose.

jgillula · 2021-10-23T23:21:09Z

Yep, you're right--apologies for not catching that myself. I've edited the title of the bug to reflect that the calls to _reset() aren't spurious.

Given that, I can think of two possible solutions and I'm not sure which is better:

Introduce another state, 'error', in addition to 'connected', 'disconnected', and 'disconnecting'. Then, instead of having the write loop call _reset(), create the following method and have it call that:

def _error(self):
  self.state = 'error'
  self.sid = None

And then, replace if self.state == 'connected': on line 597 with if self.state != 'disconnected':. This way the read loop will break but the 'disconnect' handler will still be run.

Leave the call to _reset() in, but add the same logic to trigger the 'disconnect' handler to the write loop as exists in the read loop. I don't think this solution is as good, since it kind of breaks the current structure where the read loop is somehow the "primary" loop (e.g. the read loop waits for the write loop to end, the read loop currently triggers the event handler, etc.). But possibly there's a clean way to do it?

There are probably other solutions--WDYT?

miguelgrinberg self-assigned this Oct 15, 2021

miguelgrinberg added the bug label Oct 15, 2021

jgillula changed the title ~~Socketio client isn't notified of disconnect due to spurious _reset() call in client.py~~ Socketio client isn't notified of disconnect due to _reset() call in client.py Oct 23, 2021

miguelgrinberg mentioned this issue Oct 27, 2021

query the state of a connection miguelgrinberg/python-socketio#799

Closed

miguelgrinberg closed this as completed in b70695a Feb 5, 2024

miguelgrinberg added a commit that referenced this issue Feb 5, 2024

More robust handling of polling disconnects (Fixes #254)

e0bb263

miguelgrinberg mentioned this issue Feb 5, 2024

Reconnect on HTTP 503 miguelgrinberg/python-socketio#485

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Socketio client isn't notified of disconnect due to _reset() call in client.py #254

Socketio client isn't notified of disconnect due to _reset() call in client.py #254

jgillula commented Oct 15, 2021

miguelgrinberg commented Oct 15, 2021

jgillula commented Oct 15, 2021

miguelgrinberg commented Oct 16, 2021

miguelgrinberg commented Oct 23, 2021

jgillula commented Oct 23, 2021

Socketio client isn't notified of disconnect due to _reset() call in client.py #254

Socketio client isn't notified of disconnect due to _reset() call in client.py #254

Comments

jgillula commented Oct 15, 2021

miguelgrinberg commented Oct 15, 2021

jgillula commented Oct 15, 2021

miguelgrinberg commented Oct 16, 2021

miguelgrinberg commented Oct 23, 2021

jgillula commented Oct 23, 2021