-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #984 #1155
Fix #984 #1155
Conversation
Move DTLS common routines to separate file; TLS hanshake cleanups
…erse proxy and EC J-PAKE as experimental and non-mandatory.
Some FSM DSL defines are moved to lib/fsm.h, http_limit.c ported to the new API. Address #391.12: ss_skb_alloc() extended with an agrument for head room. Many cleanups again.
However, linux asymmetric and elliptic crypto seems incomplete, so I leave the old RSA and ECDH for now. Also note that generally speaking cipher.[ch] and md.[ch] wrappers are for elimination in future - now they just link bunch of mbedTLS code w/ linux/crypto.
…d even TCP segment; Multiple handshakes FSM fixes.
Fix #1033: Change header name format in HTTP tables configuration.
Fix #772: Change 'keepalive' and 'client_body' timeouts applying.
The page is used later as skb page fragment in TLS handshake. 2. Cleanups in PEM decoder. 3. Remove DHM routines called only for FS IO (unused). 4. Revert DTLS routines (dirty code) as it's supposed to used them for QUIC.
Fix #900: Change some comments and add unit tests.
split skb as soon http_parser returned TFW_PASS
Use standart bitops functions for message flags.
Fix code review comments in this patch since there is significant overlaps in skb offsets. 1. chop skb before splitting in HTTP processing to chop all skbs. 2. offset on TLS layer comes from TCP seqnos overlap, so don't account it for TLS overhead; 3. chop skb TCP overhead before inserting the skb into the list of TLS record skbs.
so tfw_tls_msg_process() must store skb_list for further chopping in tfw_tls_chop_skb_rec(). Since ttls_recv() zeroes almost full IO, tfw_tls_chop_skb_rec() doesn't need to zero it.
Tempesta TLS performance optimizations
less bytes than the TLS overhead as well as allocate an extra skb.
1. @nsize was copy&pasted from tcp_fragment(), but the last one uses it only for fast path with skb w/o frags. 2. reserved_tailroom is in union with mark which we process separately, so the field isn't compatible with current Tempesta code. Also it's used for egress path only and we don't need it on ingress path where ss_skb_split() is called. 3. GSO segementation for skb wasn't accounted: make couple of comments in TLS code and initialize it for split skb. (Later kernel patch will bring small logic on it as well.) Some cleanups. -jN builds sometetimes still fail in libtdb/tdbq dependence (see commit 1fc007d).
make -j4 clean all.
1. accurately fix skb->truesize and TCP write memory in kernel by tcp_skb_unclone(); 2. __split_pgfrag_del() if we just move pointers, then we do not free TCP write memory, so do not change skb->truesize. 3. ss_skb_unroll(): truesize and data_len/len are completely different counters, so do not mix them in ss_skb_adjust_data_len(). By the way, during the tests I saw crazy skb overheads - truesize can be larger than len in tens kilobytes. The explanation for such overheads is various fragments stoling (e.g. our __split_pgfrag_del) and cloning. 4. cleanup: move ss_skb coalescing functions closer to their calls.
tempesta_fw/ss_skb.c
Outdated
skb->truesize -= n; | ||
|
||
/* | ||
* Initialize GSO segments counter to let TCP set it accoring to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accoring
-> according
tempesta_fw/tls.c
Outdated
|
||
tfw_tls_tcp_add_overhead(sk, head_sz + tag_sz); | ||
/* | ||
* TLS record header is always allocated form the reserved skb headroom. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
form
-> from
tempesta_fw/tls.c
Outdated
/* | ||
* TLS record header is always allocated form the reserved skb headroom. | ||
* The room for the tag may also be allocated from the reserved tailroom | ||
* or in a new page frament in slb_tail or next, probably new, skb. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frament
-> fragment
slb_tail
-> skb_tail
tempesta_fw/ss_skb.c
Outdated
buff->truesize += nlen; | ||
skb->truesize -= nlen; | ||
n = skb->len - len; | ||
buff->truesize += n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that without nsize
(or asize
) we have doubled '(skb_headlen(skb) - len)'
part in buff->truesize
, if 'len < skb_headlen(skb)'
: since 'skb->truesize = SKB_TRUESIZE(size)'
in 'alloc_skb_fclone()->__alloc_skb_init()'
and 'size == asize'
in this case; so, we already included asize
into skb->truesize
during alloc_skb_fclone()
and we need to subtract it from 'skb->len - len'
value.
@@ -634,15 +634,17 @@ __split_pgfrag_del(struct sk_buff *skb_head, struct sk_buff *skb, int i, int off | |||
if (likely(!off)) { | |||
frag->page_offset += len; | |||
skb_frag_size_sub(frag, len); | |||
ss_skb_adjust_data_len(skb, -len); | |||
skb->len -= len; | |||
skb->data_len -= len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite clear why we do not adjust skb->truesize
here during removal the part of fragment, but in case of the full fragment deletion (in previous block, above) - we adjust it.
E.g. in another but similar case - in ss_skb_split()
during splitting the original skb - we adjust skb->truesize
, including the cases when splitting boundary located in the middle of the fragment of original skb: this can be seen in skb_split() -> skb_split_no_header()
and we set skb->truesize = skb->len - len
in ss_skb_split()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for the change is that skb->truesize
must account all the memory used by the skb. The accounted memory is required to for socket memory accounting: how much memory do we use by all skbs in the socket. If we call skb_frag_size_sub()
, then we just move the grament pointer and don't free any memory, so the allocated memory for the skb remains the same.
Regarding skb_split() -> skb_split_no_header()
I'd say that the kernel is very inconsistent in accounting skb->truesize
. If you print skb->len
and skb->truesize
for some benchmark, then you can see difference 10-30KB for 40-60KB skbs due to clones and the framgents playing. See for example skb_gro_receive()
for skb->head_frag != NULL)
- the skb head is stolen to head->frags
, head
truesize is incremented and the data will be processed as its frag, but skb
continues to account the head data in truesize
. See also pskb_expand_head()
which not always update truesize.
If you run a test against vanilla kernel with a small patch:
root@debian:~/linux-4.14.32-tfw# git log
commit 07d7b6fe5a8973b79b4e1bff2d9da889e3fd4379
Author: Alexander K <[email protected]>
Date: Thu Apr 5 16:28:25 2018 +0300
Vanilla Linux 4.14.32
root@debian:~/linux-4.14.32-tfw# git diff
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cab4b935..87b7b9c1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1638,6 +1638,9 @@ int tcp_v4_rcv(struct sk_buff *skb)
if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
goto discard_it;
+ if (skb->truesize > skb->len +2048)
+ pr_err("large truesize=%u > len=%u+2k\n", skb->truesize, skb->len);
+
th = (const struct tcphdr *)skb->data;
if (unlikely(th->doff < sizeof(struct tcphdr) / 4))
And run a benchmark envolvin heavy GRO, then you'll see truesizes much large than len. This is not the most large difference which I saw:
TCP: large truesize=19456 > len=14512+2k
tempesta_fw/tls.c
Outdated
@@ -297,6 +297,8 @@ tfw_tls_encrypt(struct sock *sk, struct sk_buff *skb, unsigned int limit) | |||
* if there is no free frag slot in skb_tail, a new skb is allocated. | |||
*/ | |||
next = skb_tail->next; | |||
t_sz_curr = skb_tail->truesize; | |||
t_sz_next = next != skb ? next->truesize : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like condition next != skb
is always true here, since next
can be either skb_tail->next
(which definitely not the skb
itself) or sk->sk_write_queue
if skb_tail
is last in the write queue.
tempesta_fw/tls.c
Outdated
/* | ||
* A new frag is added to the end of the current skb or | ||
* begin of the next skb. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass skb->next
or skb_tail->next
as skb_head
to ss_skb_expand_head_tail()
; code of __extend_pgfrags()
procedure
Line 314 in 74556b1
if (nskb != skb_head && !skb_headlen(nskb) |
ss_skb_expand_head_tail()
also confirms that.
Actually there are several fixes around skb->truesize and splitting, please see the commits comments for the details. Now the tests from #984 pass.