-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TCP related and other fixes #6058
Conversation
Codecov Report
@@ Coverage Diff @@
## master #6058 +/- ##
==========================================
- Coverage 52.63% 52.61% -0.02%
==========================================
Files 410 410
Lines 40048 40057 +9
Branches 7781 7781
==========================================
- Hits 21078 21077 -1
- Misses 15768 15779 +11
+ Partials 3202 3201 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least one issue (with NET_DEFAULT_IF_NONE option naming) requires change IMHO.
subsys/net/ip/Kconfig
Outdated
to choose default interface. Otherwise first interface will be the | ||
default interface. | ||
|
||
config NET_DEFAULT_IF_NONE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the option contradicts its description: the option says that there's not default interface, while description says the first is default.
subsys/net/ip/net_if.c
Outdated
if (__net_if_start == __net_if_end) { | ||
return NULL; | ||
return iface; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't make this change. The original code makes (more) clear that the return value is NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Paul
subsys/net/ip/net_context.c
Outdated
@@ -1246,6 +1246,14 @@ NET_CONN_CB(tcp_established) | |||
return NET_DROP; | |||
} | |||
|
|||
/* Increment the ack */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a correct change. First, application gets a chance to process incoming data. An application may reject or fail to process data (e.g., due to lack of memory). ACK gets sent only after that, and includes outcomes of the processing status (e.g., updated/not updated receive window, etc.).
If you look thru the history, you'll likely find that the opposite change was done, which you now try to revert.
@@ -961,32 +961,14 @@ bool net_tcp_ack_received(struct net_context *ctx, u32_t ack) | |||
valid_ack = true; | |||
} | |||
|
|||
/* No need to re-send stuff we are closing down */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment for the commit message:
Now ACK's are in rx_thread queue or ACK for first packet received.
I'm not sure I understand this phrase. Can you please clarify the description in commit message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, whole this part is not very clear:
Now ACK's are
in rx_thread queue or ACK for first packet received. Then
net_tcp_ack_received() function will delete the first packet.
Because it's valid ACK and connection state is in ESTABLISHED state
sent list is not empty. It restarts the timer and resend packets
in queue. In e.g. case Zephyr sends second packet in a list
immediately.
subsys/net/ip/tcp.c
Outdated
@@ -661,12 +661,47 @@ int net_tcp_prepare_ack(struct net_tcp *tcp, const struct sockaddr *remote, | |||
return -EINVAL; | |||
} | |||
|
|||
static void get_src_addr_ptr(struct net_tcp *tcp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a right change, see #5481
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find helper function for this purpose. May be I missed it. I will take a look.
subsys/net/lib/app/net_app.c
Outdated
@@ -562,27 +562,45 @@ struct net_context *select_client_ctx(struct net_app_ctx *ctx, | |||
|
|||
#if defined(CONFIG_NET_APP_SERVER) | |||
#if defined(CONFIG_NET_TCP) | |||
static struct net_context *get_server_ctx(struct net_app_ctx *ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message:
E.g. One net_app_ctx and two net_contexts. 'For' loop goes through
array of net_contexts and searching for matching context.
Let's say caller required net_context is second in the
What's the purpose of that intent? Please reformat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, that description also doesn't sound too clear to me (but I'm not familiar with net_app_ctx internals, maybe that's it).
My usual comment applies: this PR packs the full spectrum of changes - from obviously correct to obviously incorrect, with anything inbetween. At this point, I'm keen to say that Github reviews are failing for us, and we should go back to Gerrit which enforces "one commit per review" rule. <--- @nashif: sharing an opinion. |
I am not sure how this can be true, you could have 1000s of commits in 1 single gerrit change set, same thing as a PR in GH. Review per commit, and not as a PR, just the same way you would have done in gerrit. i agree we should not put too much unrelated stuff into one PR, it can be confusing, commits in 1 single PR need to be related. And finally, No, we are not going back to gerrit. |
subsys/net/ip/Kconfig
Outdated
default interface. | ||
|
||
config NET_DEFAULT_IF_NONE | ||
bool "Default interface is first interface" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be shorter so what about "First available interface"
subsys/net/ip/Kconfig
Outdated
default interface. | ||
|
||
config NET_DEFAULT_IF_ETHERNET | ||
bool "Ethernet interface is the default interface" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just "Ethernet" here
subsys/net/ip/Kconfig
Outdated
depends on NET_L2_ETHERNET | ||
|
||
config NET_DEFAULT_IF_BLUETOOTH | ||
bool "Bluetooth interface is the default interface" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Bluetooth"
subsys/net/ip/Kconfig
Outdated
depends on NET_L2_BT | ||
|
||
config NET_DEFAULT_IF_IEEE802154 | ||
bool "IEEE802.15.4 interface is the default interface" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"IEEE 802.15.4"
subsys/net/ip/Kconfig
Outdated
depends on NET_L2_IEEE802154 | ||
|
||
config NET_DEFAULT_IF_OFFLOAD | ||
bool "OFFLOAD interface is the default interface" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Offloaded interface"
subsys/net/ip/Kconfig
Outdated
depends on NET_L2_OFFLOAD | ||
|
||
config NET_DEFAULT_IF_DUMMY | ||
bool "Dummy interface is the default interface" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Dummy testing interface"
subsys/net/ip/net_if.c
Outdated
if (__net_if_start == __net_if_end) { | ||
return NULL; | ||
return iface; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Paul
Most of the patches seem not to be related to each other but I am not sure if it is worth sending them one by one. But if needed at least TCP stuff could be a separate PR, and interface selection and net-app fixes. So this could be split into three PRs. |
subsys/net/ip/net_context.c
Outdated
} | ||
|
||
send_ack(context, &conn->remote_addr, false); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pfalcon "ret = packet_received(conn, pkt, context->tcp->recv_user_data);" Sending ack doesn't depend on returned value. Irrespective of return value, ACK will be prepared and sent.
I noticed an issue related to this while trying with RPL BR sample. When TCP handover the payload in callback, then net_app callback, and then http call back. (e.g connected) all the way goes to application. Here an application has kilobytes of payload to reply. Now net_pkt_append() doesn't append more than "datalen in net_pkt", http_prepare_and_send() create new packets until all the payload filled in buffers. After all these shebang ACK will be prepared and sent. Which is causing sometimes a delay in ACK, and the peer sends another DUP packet. Scenarios like this delay ACK preparation. And like I said current code doesn't really check return value, I thought it would make more sense to send the ACK first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Irrespective of return value, ACK will be prepared and sent.
Yes, an ACK packet will be sent in either case. But contents of the ACK packet are expected to depend on app's receive callback (i.e. receive callback may affect data put into an ACK reply), specifically, the receive window: 19ff963 .
I noticed an issue related to this while trying with RPL BR sample. ...
Thanks for the clear description of the problem you saw. I would say, this description should go into the commit message.
But then, we found yet another issue with callback-based data processing. And it can be classified as a case of https://cwe.mitre.org/data/definitions/405.html - for a small request, "http_prepare_and_send() create new packets until all the payload filled in buffers", and all that happens in a not very appropriate place - synchronous callback from the network stack, which affects the stack performance.
I'm afraid, we'd need to find another solution, and such solution exists - BSD Sockets. How it works is that there's decoupling between packet reception of packets by network stack and consumption by an application - they both run in separate threads and communicate via queue. There's also the flow control implemented, mandated by TCP, using the concept of "receive window": when a packet is just received from peer peer and put into the queue, the recv_wnd is decreased (and communicated back in an ACK packet), signaling the peer that it has decreased quota to send more data (which eventually may get to 0). recv_wnd is increased back only when data is actually consumed from the queue.
Due to the fact that we have usecases of both updating and not update recv_wnd, the (optional) update happens in recv callback. Thus, this patch would break this processing, and will break cases of communication large amounts of data (megabytes) over sockets.
My suggestion is to separate this patch into a separate PR, and further analyze and test it from different sides, but opinion is that we'd need to find an alternative way to address this issue. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then, we found yet another issue with callback-based data processing. And it can be classified as a case of https://cwe.mitre.org/data/definitions/405.html - for a small request, "http_prepare_and_send() create new packets until all the payload filled in buffers", and all that happens in a not very appropriate place - synchronous callback from the network stack, which affects the stack performance.
I am experimenting that in my WIP PR of RPL BR. Running a separate thread to reply websocket messages. Checking for proper solution.
b61d0a5#diff-780b2211d7789425a2584af4afc52b77R1881
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, let's take it away patch from this PR, time being I will keep it in #5035. I will create a separate PR for this and list issues over there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments, please have a look.
Dropped the ACK sending before payload handler patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@pfalcon could you re-review the PR as now the controversial "ACK sending before payload handler" commit is dropped from this set. |
subsys/net/ip/tcp.c
Outdated
#endif | ||
|
||
end: | ||
addr = &tcp->context->local; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a dead statement, and apparently doesn't do what was intended by the author (what?).
Sure, as my last comment 2 comments above yours says, the last concern is still get_src_addr_ptr() function added in 3146804#diff-a278935f8d5d5e0b4424ca3e1182d48cR664 @jukkar, @rveerama1 : And I'm glad that I took another look at it. Here're concerns with it:
|
k_delayed_work_cancel(&context->tcp->fin_timer) called twice immediately one after other. Signed-off-by: Ravi kumar Veeramally <[email protected]>
In case of failed to get source address from the net packet, release the assgined tcp backlog entry. Otherwise it will never be freed. Signed-off-by: Ravi kumar Veeramally <[email protected]>
Zephyr doesn't have luxury to create re-transmit timer per packet. So it has different methods to handle packets in queue. But re-sending packets on valid ack messages causing issues. E.g. A TCP node sent two packets (packet-1, packet-2). Peer replied two ACKs (ACK-1 and ACK-2), and these two ACK's are at rx_thread queue. Now ACK-1 is handled and reference of packet-1 is freed from sent list. Then if condiftion (valid ACK and connection state is ESTABLISHED) notices that, sent list is not empty. Restart the timer, modify sent flag and resend packets in a list. Here packet-2 is sent again, even though ACK-2 is already received. Situation is worse if there are more packets in the list. So only start the re-transmit timer in-case queue is not empty. It allows rx_thread to handle all incoming packets (in this e.g ACKs). When the re-trasmit timer expires, it sends the packets which are left in queue. Signed-off-by: Ravi kumar Veeramally <[email protected]>
If context is bound to IPv6 unspecified addresss and some port number, then unspecified address is passed in TCP reset packet message preparation. Eventually packet dropped at the peer. Signed-off-by: Ravi kumar Veeramally <[email protected]>
@pfalcon updated PR. Let me know if I address your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated PR. Let me know if I address your comments.
Looks good, thanks much for reworking it!
This PR contains fixes related to TCP issues and other network related issues.