net: TCP retransmit queue implementation is broken #5857
Comments
Needless to say, as soon as you enable logging or add printk's, things get nicely serialized and the problems go away. After removing one of the printk's that had made everything work:
@pfalcon, do you have any suggestions on how to fix that?
Well, this is a complex issue that is hard to debug because it is timing-sensitive: adding debug output changes the behavior, as shown above. So, ideas to approach it:
Given the current situation (1.11 release, sockets work to cover TLS), this is in the backlog, but I'd really like to focus on it when the 1.12 window opens. Marking the issue accordingly. (@rveerama1 also touched some of the issues with TCP re-xmit in #6058, but I doubt that resolved everything, due to p.3 above.) Of course, I'd appreciate other developers helping to triage this issue. Thanks.
Is this already fixed? Can someone please verify?
It's not. Just last week I was getting non-deterministic lockups and errors when playing with the sockets code. Per my comment above and the metadata on the issue, I'm going to dig into this when the 1.12 window opens.
@jukka any news on this?
This is assigned to me, and I have been working on streamlined ways to reproduce it: the original logs above were produced with a server written in MicroPython, and I am trying to reproduce the problem with in-tree samples. But moving in that direction, I ran into other issues, like #7476, so this is backlogged somewhat. I also started to analyze/trace the source code, trying to find issues by "static analysis".
@jukka: Oops, sorry for disturbing!
@pfalcon What is the status? Are we going to see a fix for 1.12?
No, this is a non-deterministic issue which is hard to reproduce. (Other folks and I are working on reproducing such issues, e.g. #7831, but the results aren't bright so far.)
Any new updates on this? Does the issue still exist?
No updates for almost 2 years. Does this not seem important to anyone? @jukkar
The issues described here are (IMHO) the root cause of things like #23302. It seems that more people are only now starting to play with Zephyr networking and raise concerns.
I fixed several issues for TCP in #23334 that might actually help with this issue.
This is an old issue, and things have improved a lot in the latest version. We are also migrating to the new TCP version, and testing should go there. As agreed in the release readiness and bug triage meeting, closing this issue.
Rexmit is implemented as sys_snode_t sent_list; it may be accessed concurrently by at least net_tcp_send_data() and net_tcp_ack_received(), but slist.h is riddled with warnings like: "Note: the loop is unsafe and thus _cn should not be detached" and "This and other sys_slist*() functions are not thread safe."
Here's typical logging for when working by chance stops working:
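To illustrate the concurrency concern described above (not the actual Zephyr code or its eventual fix), here is a minimal sketch of one way to serialize access to a singly linked retransmit list, so that a send path and an ACK path never touch the list pointers at the same time. Everything in it is hypothetical: struct sent_pkt, struct rexmit_queue and the rexmit_*() helpers are invented for the example, and a POSIX mutex stands in for whatever locking primitive the kernel would actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sent segment; not Zephyr's net_pkt. */
struct sent_pkt {
    struct sent_pkt *next;
    uint32_t seq_end;          /* sequence number just past this segment */
};

/* Hypothetical retransmit queue: the list plus the lock that guards it. */
struct rexmit_queue {
    struct sent_pkt *head;
    struct sent_pkt *tail;
    pthread_mutex_t lock;
};

void rexmit_init(struct rexmit_queue *q)
{
    q->head = NULL;
    q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
}

/* Send path: queue a segment that still awaits acknowledgement. */
void rexmit_append(struct rexmit_queue *q, struct sent_pkt *pkt)
{
    assert(pkt != NULL);
    pthread_mutex_lock(&q->lock);
    pkt->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = pkt;
    } else {
        q->head = pkt;
    }
    q->tail = pkt;
    pthread_mutex_unlock(&q->lock);
}

/* ACK path: detach every segment fully covered by `ack`.
 * Returns the number of segments removed. */
size_t rexmit_ack(struct rexmit_queue *q, uint32_t ack)
{
    size_t removed = 0;

    pthread_mutex_lock(&q->lock);
    /* The signed difference keeps the comparison valid across wraparound. */
    while (q->head != NULL && (int32_t)(ack - q->head->seq_end) >= 0) {
        struct sent_pkt *done = q->head;

        q->head = done->next;   /* read next before detaching the node */
        if (q->head == NULL) {
            q->tail = NULL;
        }
        done->next = NULL;
        removed++;
    }
    pthread_mutex_unlock(&q->lock);
    return removed;
}

/* Count queued segments under the same lock. */
size_t rexmit_len(struct rexmit_queue *q)
{
    size_t n = 0;

    pthread_mutex_lock(&q->lock);
    for (struct sent_pkt *p = q->head; p != NULL; p = p->next) {
        n++;
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

In this sketch the sender would call rexmit_append() and the ACK handler would call rexmit_ack(), possibly from different threads. Because every list traversal and pointer update happens with the lock held, neither path can observe a half-updated list. Whether a mutex, an IRQ lock, or running both paths in a single thread is the right choice for the real stack is a separate design question that this example does not settle.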