Flask-SocketIO behind nginx randomly thinks client has disconnected #327
Comments
The client-side log that you have indicates that the server is responding with a 400 error here. I do not see the backend side of that interaction. Do you have it?
@miguelgrinberg Thanks for the response. The 400 coming from the server (I believe) is the traceback here. The 400 is emitted by the server after the client tries to receive the PONG for 60 seconds and then disconnects. I believe this is the same as the theoretically benign issue reported here. I can even have this 400 error by manually disconnecting from the client side:
Now when I manually disconnect, I don't get the line in the log file that appears just before that stack trace:
Any movement on this?
Well, I have been unable to reproduce this problem. When the client requests the upgrade to websocket, for some reason the server thinks it has been inactive for too long, and decides to kick that client out. Unfortunately these logs don't have timestamps in them, so I can't really see how much time passed between the different stages of the connection.
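For anyone collecting logs for this, here is a minimal sketch of adding timestamps using Python's standard `logging` module. The logger names `engineio` and `socketio` are assumptions on my part (check which names your installed versions actually log under), and `configure_timestamped_logging` is a hypothetical helper, not part of Flask-SocketIO.

```python
import logging

# asctime renders as "YYYY-MM-DD HH:MM:SS,mmm" by default, which is enough
# resolution to see how long each stage of the connection takes.
LOG_FORMAT = "%(asctime)s %(name)s %(levelname)s %(message)s"


def configure_timestamped_logging(stream=None, names=("engineio", "socketio")):
    """Attach one timestamped handler to each named logger and return it.

    The default logger names are an assumption; pass your own if they differ.
    A stream of None means the handler writes to sys.stderr.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    for name in names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(handler)
        # Avoid duplicate lines if the root logger also has a handler.
        logger.propagate = False
    return handler


if __name__ == "__main__":
    configure_timestamped_logging()
    logging.getLogger("engineio").info("example message")
```

Child loggers (for example a logger named `engineio.server`, if that is what the library uses) would inherit this handler through the normal logger hierarchy.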
@andrewSC Are you having similar issues? I've added some logging code to my application and am capturing log output when the connection fails to upgrade, to get a little more information on what is happening. I hope to post more information here once I get the more detailed log output (with timestamps).
@suever Sorry, I was on vacation. I'm having the same issue, but I think it's due to an improperly configured uwsgi instance. I'm going to try gunicorn and see how far that gets me.
@andrewSC Hi. If you find anything, please share it with us. Thanks!
@qwexvf I honestly switched over to gunicorn from uwsgi, and within about 10 minutes I was able to get flask-socketio working. The js client starts with long polling and then asks for the websocket upgrade, which comes back with a 101, which is all entirely expected. I know it's not quite as simple for some people to switch to gunicorn, as their project might be deeply tied to uwsgi, but that's how I solved my issue. I'm also assuming I didn't have uwsgi configured correctly; it could just be that I wasn't specifying gevent correctly in the config or in other potential places. Sorry I couldn't be of more help!
This is affecting me as well. Running nginx as a proxy (passing socketio traffic using proxy_pass and other python traffic with uwsgi_pass), uwsgi to handle a flask app. Frontend is using the socketio js lib.
Both backend and frontend running on a single Ubuntu machine, using Chrome.
I finally got some logs capturing the event with debugging turned on for gunicorn, nginx, and flask-socketio. The only piece I don't have in this interaction is the client-side log, but hopefully from these logs it's discernible where in the client->nginx->gunicorn->flask-socketio chain the issue is. This is currently affecting 13% of all sessions on my site.
@suever Can you explain what you consider a problem in these log sessions? If your user refreshed the page, it is expected that there will be connection lost and then a reconnect. Am I looking at the wrong thing?
Sorry, let me elaborate a little bit. The first time the client connects to the page, an upgrade from polling to a websocket is sent by the client and received by flask-socketio (here). However, the upgrade to the websocket never actually succeeds, and the app doesn't work: the client thinks that the connection has been upgraded and has therefore already terminated the polling, but flask-socketio doesn't think that the connection is upgraded yet because it's still waiting on the UPGRADE packet. The inability to use the page causes the user to refresh it, causing the disconnect (everything there is as expected). However, when the page is reloaded, the request to upgrade to a websocket is again emitted by the client (here), and this time the upgrade is successful (here).
It seems that in the first case, the UPGRADE packet is somehow lost between nginx, gunicorn, or flask-socketio. I don't understand enough about how that communication occurs to be able to tell from the logs whether 1) the upgrade packet is indeed received by nginx and 2) the upgrade packet is properly forwarded to gunicorn.
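To make the handshake described above easier to follow when reading the logs, here is a rough sketch of the Engine.IO packet types involved. The numeric codes follow the Engine.IO protocol as I understand it; `describe_packet` and `UPGRADE_HANDSHAKE` are hypothetical names for illustration, not part of any library.

```python
# Engine.IO packet type codes: the first character of each encoded text packet.
PACKET_TYPES = {
    "0": "OPEN",
    "1": "CLOSE",
    "2": "PING",
    "3": "PONG",
    "4": "MESSAGE",
    "5": "UPGRADE",
    "6": "NOOP",
}


def describe_packet(encoded):
    """Split an encoded text packet into (type name, payload)."""
    if not encoded:
        raise ValueError("empty packet")
    name = PACKET_TYPES.get(encoded[0])
    if name is None:
        raise ValueError("unknown packet type: %r" % encoded[0])
    return name, encoded[1:]


# Exchange expected over the new websocket while upgrading from polling:
# the client probes with a PING, the server echoes a PONG, and the client
# then sends UPGRADE to complete the switch.
UPGRADE_HANDSHAKE = [
    ("client", "2probe"),
    ("server", "3probe"),
    ("client", "5"),
]

if __name__ == "__main__":
    for sender, raw in UPGRADE_HANDSHAKE:
        print(sender, describe_packet(raw))
```

In the failing case, the last step (the client's UPGRADE packet) is the one the server never appears to see.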
Okay, sorry, I understand now. I think the gunicorn error is unrelated, but if you want to definitely rule that out as a potential problem, you can install the master branch, which has the issue addressed. I believe a new official release of gunicorn with this fix is coming any day now.
The Flask-SocketIO log shows that nothing happens in those 14 seconds between the attempted websocket connection and the time the user refreshed the page. That is odd, because the HTTP connection remains active until the websocket handshake is complete, and only then is it retired. It could be that 14 seconds isn't enough time to catch any traffic, but considering that the page was refreshed, am I correct in assuming that the client appeared blocked during those 14 seconds? If it is easy to repeat the test, it would be better to let the connection stay for a minute or two, to see if the HTTP long-polling mechanism is working.
Unfortunately gunicorn doesn't tell us anything useful, except for that 400 error, which I believe happens after the websocket connection is closed, so it is downstream of the problem. The timestamp on the stack trace seems to agree with this, as the error came at the end of that 14 second period, and the problem we are interested in is at the beginning.
I looked at the nginx log as well, and compared the bad and the good connection cases, and I really don't see any smoking gun there either. It appears that in the bad case the connection was established, and there was an initial exchange of data, but then the client did not send the expected response in the middle of the handshake.
Out of curiosity, are you able to characterize the clients that tend to fail more in some way? Is it a specific browser version, or some other attribute that makes it more likely to hit this problem?
I have the same issue in production. Everything is OK locally. I am using node.js as the app server, nginx as the reverse proxy, and socket.io for ws.
I have the exact same problem. Locally (where nginx is not used) everything works fine. However, on production I have the above mentioned behavior. The engineio output is as follows:
Here are the images of the recurring bad requests and the details of one of them (client side):
This is the log from nginx:
Could it be something with nginx not being properly configured for long polling or more generally for socket.io?
@Schnodderbalken the nginx log shows things that appear to be unrelated to HTTP/WebSocket. Any idea what those are? It seems clients for other protocols are connecting to nginx.
I'm going to close this, as I have no indication this issue persists after other fixes related to leaking disconnected sockets.
@miguelgrinberg Thanks! I'm running the latest versions and it seems like the error has gone away 🤞
I've been using Flask-SocketIO to provide real-time interaction between a web-based frontend and background celery-based processes. The site can be found at https://matl.suever.net. The source code is available at https://github.com/suever/MATL-Online. Here's an example link which should print the numbers `1` through `12` in the output window when you hit "Run".
The basic layout of the app is that I have the javascript SocketIO library on the client side send information to the flask application (behind nginx), which schedules a celery task, which then emits information back to the client using the `message_queue` functionality of Flask-SocketIO.
Seemingly randomly, the javascript client will successfully establish a connection, but when the user attempts to submit code to the server (via an emit), nothing happens, as the client seems to be waiting for a PONG event which never occurs.
I have enabled debugging on the SocketIO client side as well as the engineIO backend, and here are the server log entries when this occurs.
Backend Log
What seems to happen is that the client attempts to upgrade to websocket transport from HTTP polling and during that process, the server side seems to think that the client has disconnected so the PONG event is never sent back to the client which results in a timeout on the client side after 60 seconds, which then causes the connection to be re-established.
I'm at a bit of a loss of where to look next to debug this thing, particularly since the issue happens randomly. I'm open to any suggestions. I figured this was the most relevant place to ask this since there is a good bit of discussion on Flask-SocketIO and nginx here already.
Here is my nginx configuration
nginx.conf
matl.conf
And the installed packages within the virtualenv