-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: tcp: fix spurious TCP retries #9504
Conversation
Spurious TCP retries were observed using Wireshark while continuously sending TCP packets at an interval faster than the initial RTO. If the send list is empty and CONFIG_NET_TCP_TIME_WAIT_DELAY is used, the retry timer will not be correctly stopped when receiving a valid ACK. As a consequence, the timer might be running when a new packet is queued, but the logics in net_tcp_queue_data() will not restart the timer as it is already running. This will make the retry timer to expire prematurely, potentially while sending packets. The nested condition is merged into a single condition, allowing the final else clause to be reached when a valid ACK is received. Signed-off-by: Florian Vaussard <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I do not think that I already encountered #5857. Maybe #8188 already happened because I know that we had unexpected TCP disconnections sometime. My first explanation was to blame the retry logics because one of the hardware setup is a very nice collision generator. That's why I looked more carefully into the retry mechanisms. I will see if #8188 pops up in our tests. |
Codecov Report
@@ Coverage Diff @@
## master #9504 +/- ##
=========================================
- Coverage 52.22% 52.2% -0.03%
=========================================
Files 212 212
Lines 25948 25951 +3
Branches 5577 5577
=========================================
- Hits 13551 13547 -4
- Misses 10140 10148 +8
+ Partials 2257 2256 -1
Continue to review full report at Codecov.
|
Thanks. First step is indeed getting a confirmation from other folks that they see same/similar problematic behavior. Myself, I couldn't find time to concentrate on further investigating them, with floodwork of refactoring and new features Zephyr has :-(. For your reference, #5857 was spotted while playing with MicroPython's Zephyr port. Using interactive interpreted language makes playing with networking easy, and it doesn't take long to drive the IP stack into havoc. (IIRC, it was sending a bunch of smallish packets, e.g. generating an HTTP request with a bunch of send() calls). |
From commit message:
Based on this description, it's unclear, whether packets were send from or to Zephyr. |
Packets were sent from Zephyr to Zephyr :) The sending node resent some packets without letting the other node enough time to ACK the packets. Do you want me to update the description to be more clear? |
Up to you, and thanks for considering that option ;-). In this case, it's probably not too important, as the rest of the commit message explains the situation with the code. |
Spurious TCP retries were observed using Wireshark while continuously
sending TCP packets at an interval faster than the initial RTO.
If the send list is empty and CONFIG_NET_TCP_TIME_WAIT_DELAY is used,
the retry timer will not be correctly stopped when receiving a valid
ACK. As a consequence, the timer might be running when a new packet is
queued, but the logics in net_tcp_queue_data() will not restart the
timer as it is already running. This will make the retry timer to expire
prematurely, potentially while sending packets.
The nested condition is merged into a single condition, allowing the
final else clause to be reached when a valid ACK is received.
Signed-off-by: Florian Vaussard [email protected]