Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tempesta TLS performance optimizations #1037

Merged
merged 136 commits into from
Dec 31, 2018
Merged

Tempesta TLS performance optimizations #1037

merged 136 commits into from
Dec 31, 2018

Conversation

krizhanovsky
Copy link
Contributor

@krizhanovsky krizhanovsky commented Jul 10, 2018

Fix #614:

  1. Fix the memory leak in TLS;
  2. Remvoed rx and tx copying queues in TLS;
  3. remove copyings skb_{copy,store}_bits(), all crypto is done in-place now;
  4. fix multiple allocations, calloc(), malloc(), free() wrappers are almost eliminated. At least from crucial workflos for now
  5. fully rewritten mbedTLS state machine for TLS handshakes, so that now it works with chunked data in zero-copy fashion;
  6. unification with linux/crypt - we use the crypt, but RSA and ECDSA which seems incomplete in the kernel
  7. encrypt TLS records with dynamic size calculated in tcp_write_xmit() (see TLS linux-4.14.32-tfw#3): we encrypt several skbs, but not more than a minimum between TCP congestion window and the receiver's advertised window;
  8. Address Redesign of TCP synchronous sending and data caching #391 point 12: now we can use skb->head as paged fragments;
  9. Introduced unified logging as lib/log.h and unified return values
  10. ss_skb_process(): process skb chains, replace parser->to_go by return value variable, so ss_skb_process() returns how much data was actually processed (seems buggy);
  11. Fix Frang: rework limits for number of chunks #214
  12. Address RFC7525 by removing weak algorithms.
  13. Many cleanups and coding style adjustments.

…H_* ciphersuite,

but RFC 5246 defines them as mandatory.

This reverts commit fda4945.
…H_* ciphersuite,

but RFC 5246 defines them as mandatory.

This reverts commit fda4945.
tls/tls_srv.c Outdated
/*
* This is, after ttls_choose_ciphersuite() call, the earliest time when
* when we know which hash function to use and now we can start to
* compute the handshake session checksu.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double when word.
checksu -> checksum.

tls/tls_srv.c Outdated

/* Read extensins length. */
T_FSM_STATE(TTLS_CH_HS_EXTLEN) {
/* Use the first 2 bytes for toal number of extensions. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe total instead of toal should be.
In previous comment: extensins -> extensions.

tls/tls_srv.c Outdated
* from a client and then selects a stream or Authenticated Encryption
* with Associated Data (AEAD) ciphersuite, it MUST NOT send an
* encrypt-then-MAC response extension back to the client."
* We don't support other siphersuites, so we don't send EtM extension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

siphersuites -> ciphersuites

tls/tls_srv.c Outdated
T_FSM_EXIT();
CHECK_STATE(1024);
/*
* RFC 5246 Certificate Request is optional, so don't requiest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiest -> request

* @curves - supported elliptic curves;
* @randbytes - random bytes;
* @finished - temporal buffer for chunks of Finished message,
* @randbytes were used in prvious messages, so we can reuse it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prvious -> previous

};

/*
* Extend the common FSM for hanshake parsing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hanshake -> handshake

*read += p - buf; \
io->rlen += p - buf;

/* Move to @st if we have at least @need bytes. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that comment is outdated: @need parameter is not present in the macro.

tls/ttls.c Outdated

/*
* Free only the linked list wrapper, not the keys themselves
* since the belong to the SNI callback.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the -> they.

tls/ttls.c Outdated
}

/*
* Calculate final message checksum before goint to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goint -> going

/*
* Perform TLS handshake if necessary and decrypt the TLS message
* in-place by chunks. Add skb to the list to build scatterlist if it
* it contains end of current message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double it word.

case T_OK:
/*
* A complete TLS message decrypted and ready for upper
* layer protocols processing - fall throught.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throught -> through

*
* Many sibling skbs can be produced by TLS and HTTP layers
* together - don't coalesce them: we process messages at once
* and it hase sense to work with sparse skbs in HTTP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hase -> has

* Split @skb before calling HTTP layer to chop it and not let HTTP
* to read after end of the message.
*/
if (parsed < skb->len) {
Copy link
Contributor

@aleksostapenko aleksostapenko Dec 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that (off + parsed < skb->len) condition should be checked here.

tls/ttls.h Outdated
*
* \param ssl SSL context
*
* \return Protcol name, or NULL if no protocol was negotiated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protcol -> Protocol

tls/ttls.h Outdated
* data);
* @rlen - read bytes of the message body so far;
* @skb_list - list of skbs attachd to the current I/O context;
* @off - data offset within @skb_list, can be afther the 1st skb;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attachd -> attached
afther -> after

tls/md.c Outdated
mbedtls_free(ctx->hmac_ctx);
md_ctx->tfm = crypto_alloc_shash(md_info->alg_name, 0, 0);
if (IS_ERR(md_ctx->tfm)) {
T_ERR("cannot initizlize hash driver %s."
Copy link
Contributor

@aleksostapenko aleksostapenko Dec 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initizlize -> initialize.
The same in __ttls_md_hmac_setup() below.

spin_unlock(&tls->lock);
next_msg:
ss_skb_queue_tail(&tls->io_in.skb_list, skb);
msg_skb = tls->io_in.skb_list;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, skb_list processing can be made without msg_skb variable: here and in tfw_tls_chop_skb_rec() - we have access to tls->io_in.skb_list.

Copy link
Contributor Author

@krizhanovsky krizhanovsky Dec 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, also there was a bug that tfw_tls_msg_process() gets off from TCP layer as an overlap of sequence numbers, but we use the offset as TLS header for HTTP processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually ttls_recv() initializes IO context after each processed TLS record so tfw_tls_msg_process() must store skb_list for further chopping in tfw_tls_chop_skb_rec(). Since ttls_recv() zeroes almost full IO, tfw_tls_chop_skb_rec() doesn't need to zero it. Fixed.


eaten_skb:
if (likely(skb->len > prolog)) {
off += prolog;
Copy link
Contributor

@aleksostapenko aleksostapenko Dec 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that case with (skb->len > prolog) and (skb->len == prolog + off) may appear here, i.e. skb's space is exhausted but we will continue to deal with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we chop the possible TCP seqnos overlap at the begin of tfw_tls_msg_process() because we can not link such overlapping skb at the middle of the skb chain for HTTP processing (there is no way to specify offsets for each skb in the list). So prolog went away and off is TLS header and IV.

Makefile Outdated
#
# $ DEBUG=3 DBG_SS=1 DBG_TLS=1 make clean all
#
# TODO can we generate the variables w/o the copy & paste?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use default values:

DEBUG ?= 0
DBG_CFG ?= 0
DBG_HTTP_PARSER ?= 0
TFW_CFLAGS += -DDEBUG=$(DEBUG) -DDBG_CFG=$(DBG_CFG) -DDBG_HTTP_PARSER=$(DBG_HTTP_PARSER) ...

And use in *.c files:

- #ifndef DBG_HTTP_PARSER
+ #if DBG_HTTP_PARSER == 0
    #undef DEBUG
  #endif

But the more modules we have the more extra debug definitions will be used.

/* Non-limited printing. */
#define T_ERR_NL(...) T_ERR(__VA_ARGS__)
#define T_WARN_NL(...) T_WARN(__VA_ARGS__)
#define T_LOG_NL(...) T_LOG(__VA_ARGS__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non limited printing is normally USER level warnings like errors or warnings while configuration parsing. There is no need to print call stacks here. TFW_(ERR|WARN|LOG) messages are also used for USER-level only messages and callstacks are not really required there and it sometimes annoying to search though system log and search between callstacks while some of them are not internal errors.

@@ -1131,8 +1214,7 @@ __parse_connection(TfwHttpMsg *hm, unsigned char *data, size_t len)
int r = CSTR_NEQ;
__FSM_DECLARE_VARS(hm);

BUILD_BUG_ON(sizeof(parser->hbh_parser.spec) * CHAR_BIT
< TFW_HTTP_HDR_RAW);
BUILD_BUG_ON(sizeof(parser->hbh_parser.spec) * 8 < TFW_HTTP_HDR_RAW);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change is required? Just to fit single line? Not very critical since we support only x86_64, but I prefer to have CHAR_BITS or BITS_PER_BYTE here.

Copy link
Contributor Author

@krizhanovsky krizhanovsky Dec 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's undeclared and we need to add an include just for the constant

* upcall beside GFSM and SS, but that's efficient and I didn't
* find a simple and better solution.
*/
conn->sk->sk_write_xmit = tfw_tls_encrypt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue, but better to use simply sk->sk_write_xmit as everywhere else.

Fix code review comments in this patch since there is significant
overlaps in skb offsets.

1. chop skb before splitting in HTTP processing to chop all skbs.
2. offset on TLS layer comes from TCP seqnos overlap, so don't
   account it for TLS overhead;
3. chop skb TCP overhead before inserting the skb into the list of
   TLS record skbs.
so tfw_tls_msg_process() must store skb_list for further chopping
in tfw_tls_chop_skb_rec(). Since ttls_recv() zeroes almost full
IO, tfw_tls_chop_skb_rec() doesn't need to zero it.
@@ -251,6 +250,8 @@ enum {
TFW_HTTP_B_WHITELIST,
/* Client was disconnected, drop the request. */
TFW_HTTP_B_REQ_DROP,
/* Request was received on TLS connection. */
TFW_HTTP_B_REQ_TLS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unused flag is still here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed.

Copy link
Contributor

@aleksostapenko aleksostapenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@krizhanovsky krizhanovsky merged commit 5a9d217 into master Dec 31, 2018
n = *sgn + 1;
} else {
off = io->off;
n += *sgn + io->chunks;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undefined behaviour if buff = null: n is uninitialised.

}
to_read = min(len, skb->len - off);
n = skb_to_sgvec(skb, sg_i, off, to_read);
if (n <= 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condition is always false, n is unsigned.

unsigned char *pms = tls->hs->premaster;
unsigned char ver[2], fake_pms[48], peer_pms[48];

BUILD_BUG_ON(sizeof(tls->hs->premaster) < 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(tls->hs->premaster) is unsigned value and never can be less than 0.


tls->hs->pmslen = 48;
/* mask = diff ? 0xff : 0x00 using bit operations to avoid branches */
mask = - ((diff | - diff) >> (sizeof(unsigned int) * 8 - 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff | -diff seems to be invalid operation here. This operation always result as 0xffff_ffff .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not:

$ cat t.c 
#include <stdio.h>

int
main(int argc, char *argv[])
{
	unsigned int diff = 0;
	unsigned char mask;

	mask = -((diff | -diff) >> (sizeof(unsigned int) * 8 - 1));
	printf("mask = %x\n", mask);

	diff = 1;
	mask = -((diff | -diff) >> (sizeof(unsigned int) * 8 - 1));
	printf("mask = %x\n", mask);

	return 0;
}
$ gcc t.c && ./a.out 
mask = 0
mask = ff


tls->hs->pmslen = 48;
/* mask = diff ? 0xff : 0x00 using bit operations to avoid branches */
mask = - ((diff | - diff) >> (sizeof(unsigned int) * 8 - 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff | -diff seems to be invalid operation here. This operation always result as 0xffff_ffff .

@krizhanovsky krizhanovsky deleted the ak-tls-memcpq branch January 1, 2019 15:16
@krizhanovsky
Copy link
Contributor Author

krizhanovsky commented Jan 1, 2019

Fixed in #1143 , @ikoveshnikov please review the new PR.

krizhanovsky added a commit that referenced this pull request Jan 10, 2019
@krizhanovsky krizhanovsky mentioned this pull request Jan 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS performance extensions Frang: rework limits for number of chunks
4 participants