
Unreasonable logic for ping timeout #134

Closed
wants to merge 2 commits into from

Conversation

@fxdmhtt commented Aug 8, 2019

In the source code, the client-side logic for sending ping requests to the server lives mainly in the _ping_loop method of client.py (line 493) and asyncio_client.py (line 405).

The core of this method is: after sending a ping request, wait asynchronously for ping_interval seconds, during which the pong returned by the server should arrive. If no pong has been received by then, the websocket is disconnected; otherwise the loop continues. In other words, consecutive ping requests are roughly ping_interval seconds apart.
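That loop can be sketched as follows (a simplified model of the behaviour described above, not the library's actual code; send_ping, wait and pong_received are placeholder callables):

```python
def ping_loop(ping_interval, send_ping, wait, pong_received):
    """Simplified model of the client-side _ping_loop described above.

    send_ping():           send a ping frame to the server (placeholder)
    wait(seconds):         sleep roughly that many seconds (placeholder)
    pong_received():       whether a pong arrived since the last ping
    """
    while True:
        send_ping()
        wait(ping_interval)      # sleep ping_interval seconds
        if not pong_received():  # no pong within the window: give up
            return False         # caller disconnects the websocket
```

The key point for this issue: the client checks for the pong only once per interval, so a pong is acceptable at any moment within the full ping_interval window.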

Suppose the pong for the nth ping request arrives very quickly, and the (n+1)th ping request is sent ping_interval seconds later. If the network fluctuates, the server may receive that request only after (ping_interval - 2) seconds and respond immediately. The client receives the response almost instantly, still within its ping_interval window, so from the client's point of view the connection is healthy and there is no reason to disconnect.

But on the server side, last_ping records when the last ping request was received. By the time the next ping request arrives, (ping_interval + ping_interval - 2) seconds have passed, far exceeding the preset (ping_interval + 5) seconds, so the server disconnects without hesitation.
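The server-side condition described here amounts to a check like this (a sketch; GRACE stands in for the hard-coded 5 seconds, and the actual names in engineio differ):

```python
import time

GRACE = 5  # the fixed grace period this issue is about

def ping_is_late(last_ping, ping_interval, now=None):
    """True when the gap since the last received ping exceeds
    ping_interval + GRACE -- the condition on which the server
    drops the connection, as described above."""
    now = time.time() if now is None else now
    return now - last_ping > ping_interval + GRACE
```

With the defaults (ping_interval=25), a gap of 30.37 s, like the one quoted from the debug log later in this thread, already trips the check.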

On some cross-border connections this phenomenon is not uncommon, and the resulting disconnections are unreasonable.

For the client, the extreme case that still stays connected is (ping_interval + ping_interval) seconds elapsing between the first ping request and the second pong response.

For the server, the magic value of 5 seconds has no theoretical basis. Conversely, if the next ping request has not arrived after (ping_interval + ping_interval) seconds, it is safe to conclude that the client is gone and the server should disconnect.

Thank you for considering this.

@miguelgrinberg (Owner)

Why don't you just set the ping_interval to 50 seconds in your server? Doesn't that achieve the same effect?

@fxdmhtt (Author) commented Aug 9, 2019

The problem is not how large ping_interval is set, but that the server disconnects as soon as the gap between pings exceeds (ping_interval + 5) seconds. In other words, the server requires each ping request from the client to be delivered within 5 seconds of its scheduled time; it does not allow a complete ping-pong cycle within ping_interval seconds.

@miguelgrinberg (Owner)

I don't understand what you are saying. If a client fails to send a ping in ping_interval seconds, then it is assumed to be gone. The 5 there is just a little grace period to allow for network latencies, etc. The pong has no part in this.

@fxdmhtt (Author) commented Aug 9, 2019

So why is this allowance for network delay set to 5 seconds instead of 8, 10, or longer? What standard is it based on?
My point is that the right margin for network latency is not 5 s, 8 s, 10 s, or any other fixed value, but ping_interval seconds.
Forgive my poor English; much of this text went through Google Translate. A diagram would explain it better if I could attach one.
I hope you will consider my suggestion carefully.

@fxdmhtt (Author) commented Aug 9, 2019

This is not a minor problem. In multinational network environments it causes many spurious disconnections, all initiated by the server.

@fxdmhtt (Author) commented Aug 9, 2019

With the default parameters, ping_interval is 25, which means the server disconnects if the interval between two ping requests exceeds 30 s. Tracking the debug log, I found many connections dropped at values such as 30.37 s, with the disconnect log immediately followed by a log entry for a received ping, which shows the 5 s parameter is not reasonable. Changing it to 10 s would alleviate the problem, but the most reasonable value is ping_interval seconds, for the reason given in my first comment.

@miguelgrinberg (Owner)

The allowed time for a client to send a ping is ping_interval. I added 5 seconds because even when the client sends the pings at the exact interval, a small delay or network congestion can cause the ping packet to arrive late by a small amount of time. The 5 is arbitrary, just to make it a bit larger than the actual interval.

You are suggesting that the allowed time should be twice the ping_interval, which you can actually do by changing ping_interval to 50 instead of 25. It's exactly the same, unless I'm missing something.

@fxdmhtt (Author) commented Aug 9, 2019

Changing from 25 to 50 does not solve the problem. It is still possible for the next ping from the client to reach the server more than 5 seconds late, in which case the server incorrectly disconnects.

@miguelgrinberg (Owner) commented Aug 9, 2019

If your network has an occasional congestion that can delay a packet for more than 5 seconds, then I don't see a problem with the client getting disconnected. The client will reconnect immediately after anyway.

If your network delays packets by more than 5 seconds often, then isn't that a network problem? I understand that you want to compensate for that by making the grace period longer, but while doing that helps your case, it lengthens the time it takes to detect clients that have disconnected, which is a problem for normal networks that don't have packet delays.

@fxdmhtt (Author) commented Aug 9, 2019

Let us assume the following situation:
At second 0, the client sends a ping.
At second 0.5, the server receives the ping and immediately sends a pong.
At second 1, the client receives the pong and sets the flag.
24 seconds later, at second 25, the client wakes up and checks the flag. The flag was set 24 seconds ago, so it sends the next ping.

Unfortunately, the network is poor at this moment.
10 seconds later, at second 35, the server receives the second ping.
The server finds that 34.5 seconds have passed since the last ping, more than 25 + 5 = 30 seconds, so it disconnects.
The client would not wake up to check the flag until second 50, which means that as long as the pong arrives within 25 seconds, it considers everything normal.
But the server provides only a 5-second window.
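The two windows in this timeline can be compared with a few lines of arithmetic (values taken from the example above):

```python
ping_interval = 25
grace = 5

# Client side: a ping at t=0, the flag is checked at t=25, so a pong is
# acceptable any time within 25 s of the ping.  In the worst case, two
# consecutive pongs can be up to 2 * ping_interval seconds apart and the
# client still stays connected.
client_window = 2 * ping_interval      # 50 s

# Server side: disconnects once two pings arrive more than
# ping_interval + grace seconds apart.
server_window = ping_interval + grace  # 30 s

print(client_window, server_window)
```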

@fxdmhtt (Author) commented Aug 9, 2019

Line 37 of asyncio_socket.py records the time of the first ping, i.e. second 0.5. At line 57, time.time() is the moment the second ping is received; subtracting gives the interval between the two received pings. Since under normal circumstances the client rests 25 seconds before sending the next ping, you effectively allow only 5 seconds for the second ping to arrive. In fact, even if the second ping were received at second 45, as long as the network is then healthy the client would receive the pong within 5 seconds and would not disconnect. In other words, the client tolerates two pongs being 50 seconds apart, while the server tolerates two pings being only 30 seconds apart. Isn't that unreasonable?

@fxdmhtt (Author) commented Aug 9, 2019

Even if one insists that the server and client should apply the same timeout requirement, the 5-second parameter should still be revisited.
I strongly recommend reconsidering this. It may be irrelevant for most deployments, but where network conditions are poor this fix improves the experience without negative externalities. Since it is a Pareto improvement, why not?

@miguelgrinberg (Owner)

You are looking at this from the point of view of how long each side waits for the other side, and you expect this time to be equal on both sides. But from my point of view, the send frequency of pings is much more important than the send frequency of pongs. If a client fails to send a ping at the correct interval, then I want to disconnect that client and notify the server-side application as soon as possible. If the server is slow in sending the pong I don't really care, as long as the client gets the pong before it sends the next ping it's all good.

> The server finds that 34.5 seconds have passed since the last ping, more than 25 + 5 = 30 seconds, so it disconnects.

This may have happened when you tested this, but in reality there is a separate background task that monitors all clients and disconnects them when they missed a ping. So when you get to the 35th second as in your example, the client may be gone already depending on the monitoring cycle of this other task.

> this fix can improve the experience without negative externalities.

I agree that it improves your case with faulty networks, no question about that. But it does have a downside for everybody else: the time it takes to detect a client that is legitimately gone doubles.

@fxdmhtt (Author) commented Aug 9, 2019

Thank you for understanding my question.
A follow-up question, then: what would be the harm if detecting legitimately-gone clients were slower than it is now?
As a compromise, would you be willing to add a parameter to replace that 5 seconds? Even a global constant in the code that I can modify at runtime would be better than the current situation.
I really don't want to pin the socketio version and install my own engineio fork every time.

@fxdmhtt (Author) commented Aug 9, 2019

Is a global constant acceptable? Something like this:

    if time.time() - self.last_ping > self.server.ping_interval + _DELAY_CONST:

or like this:

    async def check_ping_timeout(self, delay=5):

For me, either would be more convenient than the current situation.
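Either form boils down to giving the grace period a name so an application can change it. A sketch of the first suggestion, where _DELAY_CONST is the hypothetical module-level constant from the comment above (not engineio's actual code):

```python
import time

_DELAY_CONST = 5  # default grace period, patchable at runtime

def check_ping_timeout(last_ping, ping_interval, now=None):
    """The proposed check with the grace period factored out into a
    named constant -- a sketch of the suggestion, not library code."""
    now = time.time() if now is None else now
    return now - last_ping > ping_interval + _DELAY_CONST
```

An application could then set the constant once at startup (e.g. to ping_interval itself) instead of forking the library.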

@miguelgrinberg (Owner)

@fxdmhtt take a look at the master branch. Now you can optionally pass a tuple to the ping_interval argument:

sio = socketio.Server(ping_interval=(25, 10))

The second number is the added grace period, which defaults to 5 as before when not given.
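Assuming the tuple form works as described (the second element replacing the default 5-second grace), the effective server-side disconnect threshold can be modeled as:

```python
def effective_timeout(ping_interval):
    """Disconnect threshold for a ping_interval given either as a number
    or as an (interval, grace) tuple -- a model of the behaviour
    described above, not the library's internals."""
    if isinstance(ping_interval, tuple):
        interval, grace = ping_interval
    else:
        interval, grace = ping_interval, 5  # default grace, as before
    return interval + grace

print(effective_timeout(25))        # 30: the old behaviour
print(effective_timeout((25, 10)))  # 35
print(effective_timeout((25, 25)))  # 50: the symmetric window requested here
```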

@fxdmhtt (Author) commented Aug 11, 2019

Thank you very much!
