Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP related and other fixes #6058

Merged
merged 4 commits into from
Feb 13, 2018
Merged

Conversation

rveerama1
Copy link
Collaborator

This PR contains fixes related to TCP issues and other network related issues.

@codecov-io
Copy link

codecov-io commented Feb 8, 2018

Codecov Report

Merging #6058 into master will decrease coverage by 0.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6058      +/-   ##
==========================================
- Coverage   52.63%   52.61%   -0.02%     
==========================================
  Files         410      410              
  Lines       40048    40057       +9     
  Branches     7781     7781              
==========================================
- Hits        21078    21077       -1     
- Misses      15768    15779      +11     
+ Partials     3202     3201       -1
Impacted Files Coverage Δ
subsys/net/ip/tcp.h 100% <ø> (ø) ⬆️
subsys/net/ip/tcp.c 42.95% <0%> (-0.86%) ⬇️
subsys/net/ip/net_context.c 60.3% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa8850b...3146804. Read the comment docs.

@pfalcon pfalcon self-requested a review February 8, 2018 14:52
Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least one issue (with NET_DEFAULT_IF_NONE option naming) requires change IMHO.

to choose default interface. Otherwise first interface will be the
default interface.

config NET_DEFAULT_IF_NONE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the option contradicts its description: the option says that there's not default interface, while description says the first is default.

if (__net_if_start == __net_if_end) {
return NULL;
return iface;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't make this change. The original code makes (more) clear that the return value is NULL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Paul

@@ -1246,6 +1246,14 @@ NET_CONN_CB(tcp_established)
return NET_DROP;
}

/* Increment the ack */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct change. First, application gets a chance to process incoming data. An application may reject or fail to process data (e.g., due to lack of memory). ACK gets sent only after that, and includes outcomes of the processing status (e.g., updated/not updated receive window, etc.).

If you look thru the history, you'll likely find that the opposite change was done, which you now try to revert.

@@ -961,32 +961,14 @@ bool net_tcp_ack_received(struct net_context *ctx, u32_t ack)
valid_ack = true;
}

/* No need to re-send stuff we are closing down */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for the commit message:

Now ACK's are in rx_thread queue or ACK for first packet received.

I'm not sure I understand this phrase. Can you please clarify the description in commit message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, whole this part is not very clear:

Now ACK's are
in rx_thread queue or ACK for first packet received. Then
net_tcp_ack_received() function will delete the first packet.
Because it's valid ACK and connection state is in ESTABLISHED state
sent list is not empty. It restarts the timer and resend packets
in queue. In e.g. case Zephyr sends second packet in a list
immediately.

@@ -661,12 +661,47 @@ int net_tcp_prepare_ack(struct net_tcp *tcp, const struct sockaddr *remote,
return -EINVAL;
}

static void get_src_addr_ptr(struct net_tcp *tcp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a right change, see #5481

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find helper function for this purpose. May be I missed it. I will take a look.

@@ -562,27 +562,45 @@ struct net_context *select_client_ctx(struct net_app_ctx *ctx,

#if defined(CONFIG_NET_APP_SERVER)
#if defined(CONFIG_NET_TCP)
static struct net_context *get_server_ctx(struct net_app_ctx *ctx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message:

E.g. One net_app_ctx and two net_contexts. 'For' loop goes through
     array of net_contexts and searching for matching context.
     Let's say caller required net_context is second in the

What's the purpose of that intent? Please reformat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, that description also doesn't sound too clear to me (but I'm not familiar with net_app_ctx internals, maybe that's it).

@pfalcon
Copy link
Contributor

pfalcon commented Feb 8, 2018

My usual comment applies: this PR packs the full spectrum of changes - from obviously correct to obviously incorrect, with anything inbetween.

At this point, I'm keen to say that Github reviews are failing for us, and we should go back to Gerrit which enforces "one commit per review" rule. <--- @nashif: sharing an opinion.

@nashif
Copy link
Member

nashif commented Feb 8, 2018

At this point, I'm keen to say that Github reviews are failing for us, and we should go back to Gerrit which enforces "one commit per review" rule. <--- @nashif: sharing an opinion.

I am not sure how this can be true, you could have 1000s of commits in 1 single gerrit change set, same thing as a PR in GH.

Review per commit, and not as a PR, just the same way you would have done in gerrit.

i agree we should not put too much unrelated stuff into one PR, it can be confusing, commits in 1 single PR need to be related.

And finally, No, we are not going back to gerrit.

default interface.

config NET_DEFAULT_IF_NONE
bool "Default interface is first interface"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be shorter so what about "First available interface"

default interface.

config NET_DEFAULT_IF_ETHERNET
bool "Ethernet interface is the default interface"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just "Ethernet" here

depends on NET_L2_ETHERNET

config NET_DEFAULT_IF_BLUETOOTH
bool "Bluetooth interface is the default interface"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Bluetooth"

depends on NET_L2_BT

config NET_DEFAULT_IF_IEEE802154
bool "IEEE802.15.4 interface is the default interface"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"IEEE 802.15.4"

depends on NET_L2_IEEE802154

config NET_DEFAULT_IF_OFFLOAD
bool "OFFLOAD interface is the default interface"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Offloaded interface"

depends on NET_L2_OFFLOAD

config NET_DEFAULT_IF_DUMMY
bool "Dummy interface is the default interface"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Dummy testing interface"

if (__net_if_start == __net_if_end) {
return NULL;
return iface;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Paul

@jukkar
Copy link
Member

jukkar commented Feb 9, 2018

Most of the patches seem not to be related to each other but I am not sure if it is worth sending them one by one. But if needed at least TCP stuff could be a separate PR, and interface selection and net-app fixes. So this could be split into three PRs.

}

send_ack(context, &conn->remote_addr, false);

Copy link
Collaborator Author

@rveerama1 rveerama1 Feb 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfalcon "ret = packet_received(conn, pkt, context->tcp->recv_user_data);" Sending ack doesn't depend on returned value. Irrespective of return value, ACK will be prepared and sent.

I noticed an issue related to this while trying with RPL BR sample. When TCP handover the payload in callback, then net_app callback, and then http call back. (e.g connected) all the way goes to application. Here an application has kilobytes of payload to reply. Now net_pkt_append() doesn't append more than "datalen in net_pkt", http_prepare_and_send() create new packets until all the payload filled in buffers. After all these shebang ACK will be prepared and sent. Which is causing sometimes a delay in ACK, and the peer sends another DUP packet. Scenarios like this delay ACK preparation. And like I said current code doesn't really check return value, I thought it would make more sense to send the ACK first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irrespective of return value, ACK will be prepared and sent.

Yes, an ACK packet will be sent in either case. But contents of the ACK packet are expected to depend on app's receive callback (i.e. receive callback may affect data put into an ACK reply), specifically, the receive window: 19ff963 .

I noticed an issue related to this while trying with RPL BR sample. ...

Thanks for the clear description of the problem you saw. I would say, this description should go into the commit message.

But then, we found yet another issue with callback-based data processing. And it can be classified as a case of https://cwe.mitre.org/data/definitions/405.html - for a small request, "http_prepare_and_send() create new packets until all the payload filled in buffers", and all that happens in a not very appropriate place - synchronous callback from the network stack, which affects the stack performance.

I'm afraid, we'd need to find another solution, and such solution exists - BSD Sockets. How it works is that there's decoupling between packet reception of packets by network stack and consumption by an application - they both run in separate threads and communicate via queue. There's also the flow control implemented, mandated by TCP, using the concept of "receive window": when a packet is just received from peer peer and put into the queue, the recv_wnd is decreased (and communicated back in an ACK packet), signaling the peer that it has decreased quota to send more data (which eventually may get to 0). recv_wnd is increased back only when data is actually consumed from the queue.

Due to the fact that we have usecases of both updating and not update recv_wnd, the (optional) update happens in recv callback. Thus, this patch would break this processing, and will break cases of communication large amounts of data (megabytes) over sockets.

My suggestion is to separate this patch into a separate PR, and further analyze and test it from different sides, but opinion is that we'd need to find an alternative way to address this issue. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then, we found yet another issue with callback-based data processing. And it can be classified as a case of https://cwe.mitre.org/data/definitions/405.html - for a small request, "http_prepare_and_send() create new packets until all the payload filled in buffers", and all that happens in a not very appropriate place - synchronous callback from the network stack, which affects the stack performance.

I am experimenting that in my WIP PR of RPL BR. Running a separate thread to reply websocket messages. Checking for proper solution.
b61d0a5#diff-780b2211d7789425a2584af4afc52b77R1881

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, let's take it away patch from this PR, time being I will keep it in #5035. I will create a separate PR for this and list issues over there.

Copy link
Collaborator Author

@rveerama1 rveerama1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, please have a look.

@rveerama1
Copy link
Collaborator Author

Split the patches into separate PRs (#6073, #6074).
@pfalcon : addressed your comments, except duplicate function to get src socket ptr. if issue #5481 provides any generic API, I will re-use that.

@pfalcon
Copy link
Contributor

pfalcon commented Feb 9, 2018

@pfalcon : addressed your comments, except duplicate function to get src socket ptr. if issue #5481 provides any generic API, I will re-use that.

Thanks. Please let me check the current situation with that, and come up with a specific suggestion.

@rveerama1
Copy link
Collaborator Author

Dropped the ACK sending before payload handler patch.

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jukkar
Copy link
Member

jukkar commented Feb 12, 2018

@pfalcon could you re-review the PR as now the controversial "ACK sending before payload handler" commit is dropped from this set.

#endif

end:
addr = &tcp->context->local;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a dead statement, and apparently doesn't do what was intended by the author (what?).

@pfalcon
Copy link
Contributor

pfalcon commented Feb 12, 2018

@pfalcon could you re-review the PR as now the controversial "ACK sending before payload handler" commit is dropped from this set.

Sure, as my last comment 2 comments above yours says, the last concern is still get_src_addr_ptr() function added in 3146804#diff-a278935f8d5d5e0b4424ca3e1182d48cR664

@jukkar, @rveerama1 : And I'm glad that I took another look at it. Here're concerns with it:

  1. It contains an obviously dead statement, assignment to a local var without further use (like logical thinko re: logic that assignment should perform).
  2. As I mentioned, my concern (based on the name of the function) was that it would be 4th function with almost similar purpose in the codebase (variability of src/dst addr_ptr/no-ptr), tracked in net: Another duplicate function to get pkt address #5481. But looking in more detail into it, turns out "get_src_addr_ptr" isn't what it does! What it appears to do is to copy struct sockaddr into struct sockaddr_ptr, with an adhoc exception for a case when struct sockaddr is NULL (coded improperly per p.1). In particular, it doesn't copy "src" addr, it can copy any addr. Summing: the problem with that function is the naming: it's named based on the usage in the single place, instead of based on what it does. Yet, it's named generically enough to cause confusion re: if it can be reused or should reuse existing similar functions. Given that it's called from a single place, I'd suggest to just put the code directly there. Failing that, I'd suggest to make a generic copy_sockaddr_to_sockaddr_ptr() function, and still remove adhoc check if (!local) ... &tcp->context->local check from it. @jukkar , maybe you can suggest if such a generic copy function already exists.

k_delayed_work_cancel(&context->tcp->fin_timer) called twice
immediately one after other.

Signed-off-by: Ravi kumar Veeramally <[email protected]>
In case of failed to get source address from the net packet,
release the assgined tcp backlog entry. Otherwise it will
never be freed.

Signed-off-by: Ravi kumar Veeramally <[email protected]>
Zephyr doesn't have luxury to create re-transmit timer per packet. So
it has different methods to handle packets in queue. But re-sending
packets on valid ack messages causing issues.

E.g. A TCP node sent two packets (packet-1, packet-2). Peer replied
two ACKs (ACK-1 and ACK-2), and these two ACK's are at rx_thread
queue. Now ACK-1 is handled and reference of packet-1 is freed
from sent list. Then if condiftion (valid ACK and connection
state is ESTABLISHED) notices that, sent list is not empty.
Restart the timer, modify sent flag and resend packets in a
list. Here packet-2 is sent again, even though ACK-2 is already
received. Situation is worse if there are more packets in the
list.

So only start the re-transmit timer in-case queue is not empty. It
allows rx_thread to handle all incoming packets (in this e.g ACKs).
When the re-trasmit timer expires, it sends the packets which
are left in queue.

Signed-off-by: Ravi kumar Veeramally <[email protected]>
If context is bound to IPv6 unspecified addresss and some port
number, then unspecified address is passed in TCP reset packet
message preparation. Eventually packet dropped at the peer.

Signed-off-by: Ravi kumar Veeramally <[email protected]>
@rveerama1
Copy link
Collaborator Author

@pfalcon updated PR. Let me know if I address your comments.

Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated PR. Let me know if I address your comments.

Looks good, thanks much for reworking it!

@jukkar jukkar merged commit 18422de into zephyrproject-rtos:master Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants