
Fix #984 #1155

Closed
wants to merge 2,588 commits
Conversation

krizhanovsky
Contributor

Actually there are several fixes around skb->truesize and splitting; please see the commit comments for details. Now the tests from #984 pass.

krizhanovsky and others added 30 commits June 25, 2018 14:18
Move DTLS common routines to a separate file;
TLS handshake cleanups
…erse

proxy and EC J-PAKE as experimental and non-mandatory.
Some FSM DSL defines are moved to lib/fsm.h; http_limit.c is ported to the new API.
Address #391.12: ss_skb_alloc() extended with an argument for headroom.
Many cleanups again.
Fix #1013: Avoid gcc-7 warning in conditional ternary operator.
However, linux asymmetric and elliptic crypto seems incomplete, so I leave
the old RSA and ECDH for now. Also note that, generally speaking, the cipher.[ch]
and md.[ch] wrappers are to be eliminated in the future; for now they just link
a bunch of mbedTLS code with linux/crypto.
…d even TCP segment;

Multiple handshakes FSM fixes.
Fix #1033: Change header name format in HTTP tables configuration.
Fix #772: Change how the 'keepalive' and 'client_body' timeouts are applied.
   The page is later used as an skb page fragment in the TLS handshake.
2. Cleanups in PEM decoder.
3. Remove DHM routines called only for FS IO (unused).
4. Revert DTLS routines (dirty code) as they are supposed to be used for QUIC.
Fix #900: Change some comments and add unit tests.
vankoven and others added 17 commits December 28, 2018 04:24
split skb as soon as http_parser returns TFW_PASS
Use standard bitops functions for message flags.
Fix code review comments in this patch, since there are significant
overlaps in skb offsets.

1. chop the skb before splitting in HTTP processing to chop all skbs;
2. the offset at the TLS layer comes from TCP seqno overlap, so don't
   account it as TLS overhead;
3. chop the skb's TCP overhead before inserting the skb into the list of
   TLS record skbs.
so tfw_tls_msg_process() must store skb_list for further chopping
in tfw_tls_chop_skb_rec(). Since ttls_recv() zeroes almost the full
IO, tfw_tls_chop_skb_rec() doesn't need to zero it.
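The overlap arithmetic behind point 2 above can be sketched as follows (a toy model with illustrative names, not the actual Tempesta FW functions; TCP sequence number wraparound is ignored):

```c
#include <stdint.h>

/*
 * When a TCP segment starts before the next expected sequence number,
 * the leading overlap repeats already-processed bytes.  It must be
 * chopped at the TCP level and must NOT be charged to the TLS record
 * overhead.
 */
static uint32_t
tcp_overlap(uint32_t seq, uint32_t expected)
{
	/* Sequence number wraparound is ignored in this toy model. */
	return seq < expected ? expected - seq : 0;
}

static uint32_t
tls_payload_len(uint32_t skb_len, uint32_t seq, uint32_t expected)
{
	/* Only the non-overlapping tail feeds the TLS record parser. */
	uint32_t off = tcp_overlap(seq, expected);

	return off < skb_len ? skb_len - off : 0;
}
```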
Tempesta TLS performance optimizations
fewer bytes than the TLS overhead, as well as allocate an extra skb.
1. @nsize was copy&pasted from tcp_fragment(), but the latter uses it only
   for the fast path with an skb w/o frags.

2. reserved_tailroom is in a union with mark, which we process separately, so the
   field isn't compatible with the current Tempesta code. Also, it's used on the
   egress path only, and we don't need it on the ingress path where ss_skb_split()
   is called.

3. GSO segmentation for the skb wasn't accounted for: added a couple of comments
   in the TLS code and initialized it for the split skb. (A later kernel patch
   will bring small logic for it as well.)
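The initialization in point 3 boils down to the standard ceil(len / mss) segment count TCP uses for GSO skbs. A minimal sketch (the helper name is illustrative, not a kernel API):

```c
/*
 * After a split, the new half's gso_segs counter must be set by hand,
 * since the generic split path doesn't do it.  TCP counts the segments
 * a GSO skb expands to as ceil(len / mss).
 */
static unsigned int
gso_segs_for(unsigned int len, unsigned int mss)
{
	return (len + mss - 1) / mss;	/* DIV_ROUND_UP(len, mss) */
}
```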

Some cleanups. -jN builds still sometimes fail on the libtdb/tdbq dependency
(see commit 1fc007d).
1. accurately fix skb->truesize and TCP write memory in the kernel
   with tcp_skb_unclone();

2. __split_pgfrag_del(): if we just move pointers, then we do not free
   TCP write memory, so do not change skb->truesize.

3. ss_skb_unroll(): truesize and data_len/len are completely different counters,
   so do not mix them in ss_skb_adjust_data_len(). By the way, during the tests
   I saw crazy skb overheads: truesize can be larger than len by tens of kilobytes.
   The explanation for such overheads is various fragment stealing (e.g. our
   __split_pgfrag_del) and cloning.

4. cleanup: move ss_skb coalescing functions closer to their calls.
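The accounting rule from points 1-2 can be illustrated with a toy model (an illustrative struct, not the kernel's sk_buff):

```c
/*
 * skb->truesize charges allocated memory to the socket; skb->len counts
 * payload bytes.  Advancing a fragment pointer drops payload but frees
 * nothing, so truesize must stay unchanged; only actually releasing a
 * page may lower it.
 */
struct toy_skb {
	unsigned int len;	/* payload bytes */
	unsigned int truesize;	/* allocated bytes charged to the socket */
};

/* Chop @n leading bytes of a fragment by advancing its offset. */
static void
frag_advance(struct toy_skb *skb, unsigned int n)
{
	skb->len -= n;
	/* No memory is freed here: truesize intentionally unchanged. */
}

/* Drop a whole fragment backed by a @page_sz page. */
static void
frag_free(struct toy_skb *skb, unsigned int frag_len, unsigned int page_sz)
{
	skb->len -= frag_len;
	skb->truesize -= page_sz;	/* the page really is returned */
}
```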
skb->truesize -= n;

/*
* Initialize GSO segments counter to let TCP set it accoring to
Contributor

accoring -> according


tfw_tls_tcp_add_overhead(sk, head_sz + tag_sz);
/*
* TLS record header is always allocated form the reserved skb headroom.
Contributor

form -> from

/*
* TLS record header is always allocated form the reserved skb headroom.
* The room for the tag may also be allocated from the reserved tailroom
* or in a new page frament in slb_tail or next, probably new, skb.
Contributor

frament -> fragment
slb_tail -> skb_tail

buff->truesize += nlen;
skb->truesize -= nlen;
n = skb->len - len;
buff->truesize += n;
Contributor
@aleksostapenko Jan 24, 2019

It seems that without nsize (or asize) we double the '(skb_headlen(skb) - len)' part in buff->truesize if 'len < skb_headlen(skb)': alloc_skb_fclone() -> __alloc_skb_init() sets 'skb->truesize = SKB_TRUESIZE(size)', and 'size == asize' in this case; so we have already included asize into skb->truesize during alloc_skb_fclone(), and we need to subtract it from the 'skb->len - len' value.
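The suspected double count can be checked with small numbers (a sketch with SKB_TRUESIZE() simplified to the raw allocation size; the helper is illustrative):

```c
/*
 * buff gets a head of size = skb_headlen(skb) - len bytes, already
 * charged to buff->truesize at allocation; adding the whole moved
 * payload (skb->len - len) then counts the linear part a second time.
 */
static unsigned int
split_buff_truesize(unsigned int headlen, unsigned int skb_len, unsigned int len)
{
	unsigned int size = headlen - len;	/* linear bytes moved to buff */
	unsigned int truesize = size;		/* SKB_TRUESIZE(size), simplified */

	truesize += skb_len - len;		/* the questioned 'buff->truesize += n' */
	return truesize;
}
```

With headlen = 500, skb->len = 2000, len = 300, the moved payload is 1700 bytes, but the function yields 1900: the extra 200 bytes are exactly headlen - len, counted twice.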

@@ -634,15 +634,17 @@ __split_pgfrag_del(struct sk_buff *skb_head, struct sk_buff *skb, int i, int off
if (likely(!off)) {
frag->page_offset += len;
skb_frag_size_sub(frag, len);
ss_skb_adjust_data_len(skb, -len);
skb->len -= len;
skb->data_len -= len;
Contributor
@aleksostapenko Jan 24, 2019

It's not quite clear why we do not adjust skb->truesize here when removing a part of a fragment, while in the case of full fragment deletion (in the previous block, above) we do adjust it.
E.g. in another, similar case, in ss_skb_split(), during splitting of the original skb we adjust skb->truesize, including the cases when the splitting boundary is located in the middle of a fragment of the original skb: this can be seen in skb_split() -> skb_split_no_header(), and we set skb->truesize = skb->len - len in ss_skb_split().

Contributor Author

The reason for the change is that skb->truesize must account for all the memory used by the skb. The accounted memory is required for socket memory accounting: how much memory is used by all the skbs in the socket. If we call skb_frag_size_sub(), then we just move the fragment pointer and don't free any memory, so the allocated memory for the skb remains the same.

Regarding skb_split() -> skb_split_no_header(), I'd say that the kernel is very inconsistent in accounting skb->truesize. If you print skb->len and skb->truesize for some benchmark, you can see a difference of 10-30KB for 40-60KB skbs due to clones and fragment games. See, for example, skb_gro_receive() for skb->head_frag != NULL: the skb head is stolen into head->frags, the head truesize is incremented, and the data will be processed as its frag, but the skb continues to account for the head data in truesize. See also pskb_expand_head(), which does not always update truesize.

If you run a test against a vanilla kernel with a small patch:

root@debian:~/linux-4.14.32-tfw# git log
commit 07d7b6fe5a8973b79b4e1bff2d9da889e3fd4379
Author: Alexander K <[email protected]>
Date:   Thu Apr 5 16:28:25 2018 +0300

    Vanilla Linux 4.14.32
root@debian:~/linux-4.14.32-tfw# git diff
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cab4b935..87b7b9c1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1638,6 +1638,9 @@ int tcp_v4_rcv(struct sk_buff *skb)
        if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
                goto discard_it;
 
+       if (skb->truesize > skb->len +2048)
+               pr_err("large truesize=%u > len=%u+2k\n", skb->truesize, skb->len);
+
        th = (const struct tcphdr *)skb->data;
 
        if (unlikely(th->doff < sizeof(struct tcphdr) / 4))

and run a benchmark involving heavy GRO, then you'll see truesize much larger than len. This is not the largest difference which I saw:

TCP: large truesize=19456 > len=14512+2k

@@ -297,6 +297,8 @@ tfw_tls_encrypt(struct sock *sk, struct sk_buff *skb, unsigned int limit)
* if there is no free frag slot in skb_tail, a new skb is allocated.
*/
next = skb_tail->next;
t_sz_curr = skb_tail->truesize;
t_sz_next = next != skb ? next->truesize : 0;
Contributor

It looks like the condition next != skb is always true here, since next can be either skb_tail->next (which is definitely not the skb itself) or sk->sk_write_queue if skb_tail is the last in the write queue.
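The observation can be checked on a toy circular queue (an illustrative struct, not the kernel's sk_write_queue):

```c
/*
 * In a circular write queue an element's next pointer is either another
 * element or the queue-head sentinel, never the element itself, so a
 * 'next != skb' check against an unrelated skb is always true.
 */
struct toy_node {
	struct toy_node *next;
};

/* Mirrors 't_sz_next = next != skb ? next->truesize : 0' structurally. */
static int
next_differs(struct toy_node *skb_tail, struct toy_node *skb)
{
	return skb_tail->next != skb;
}
```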

/*
* A new frag is added to the end of the current skb or
* begin of the next skb.
*/
Contributor

We pass skb->next or skb_tail->next as skb_head to ss_skb_expand_head_tail(); the code of the __extend_pgfrags() procedure,

if (nskb != skb_head && !skb_headlen(nskb)

in turn indicates that in this case a new skb will be allocated instead of inserting the frag into the next skb, and the comment before ss_skb_expand_head_tail() also confirms that.

6 participants