-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mekhanik evgenii/1196 #17
Changes from all commits
9541e05
da588fb
603f769
14ed604
1b67f4d
af2ec6d
a0a9fc6
f4353dc
b92249a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -507,16 +507,28 @@ struct sock { | |
void (*sk_data_ready)(struct sock *sk); | ||
void (*sk_write_space)(struct sock *sk); | ||
#ifdef CONFIG_SECURITY_TEMPESTA | ||
int (*sk_prepare_xmit)(struct sock *sk, | ||
struct sk_buff *skb, | ||
unsigned int mss_now, | ||
unsigned int *limit, | ||
unsigned int *skbs); | ||
/* | ||
* Tempesta FW callback to ecrypt one | ||
* or more skb in socket write queue | ||
* before sending. | ||
*/ | ||
int (*sk_write_xmit)(struct sock *sk, | ||
struct sk_buff *skb, | ||
unsigned int mss_now, | ||
unsigned int limit, | ||
unsigned int skbs); | ||
unsigned int limit); | ||
/* | ||
* Tempesta FW callback to prepare and push | ||
* skbs from Tempesta FW private scheduler | ||
* to socket write queue according sender | ||
* and receiver window. | ||
*/ | ||
int (*sk_fill_write_queue)(struct sock *sk, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should document this breaking change, explaining why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it could make sense to document both the two callback, when they are called and what they do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Personally, I don't quite understand why we moved frames from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I will add comment before merge PR. There were several problems with the previous approach:
|
||
unsigned int mss_now, | ||
int ss_action); | ||
/* | ||
* Tempesta FW callback to free all private | ||
* resources associated with socket. | ||
*/ | ||
void (*sk_destroy_cb)(struct sock *sk); | ||
#endif | ||
void (*sk_error_report)(struct sock *sk); | ||
|
@@ -876,6 +888,7 @@ enum sock_flags { | |
SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */ | ||
#ifdef CONFIG_SECURITY_TEMPESTA | ||
SOCK_TEMPESTA, /* The socket is managed by Tempesta FW */ | ||
SOCK_TEMPESTA_HAS_DATA /* The socket has data in Tempesta FW write queue */ | ||
#endif | ||
}; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need
flags
to notify when skb is transmitted, e.g. tempesta-tech/tempesta#2108.Of course,
on_tcp_entail
can do this as well, but it is not currently used for this purpose, at least not for all control frames.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should implement additional on_send_ function in tempesta for 2108, not use flags in skb. Moreover we remove additional callback from kernel code, so 2108 should be reworked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it can't be in
on_send
callback, it only indicates entry into the write queue of sk, but the counter decrement must happen at the time of actual transmission, because only at that point in time the skb is moved to the TCP protocol transmission process, i.e.tcp_write_xmit
or its callers, e.g.__tcp_push_pending_frames
, and not accumulated in kernel memory for some reason, such as congestion or zero TCP window of the malicious peer. This is exactly the meaning of this counter, to avoid control frame flooding.That is, for 2108, the solution is either to keep the counter decrement in
sk_write_xmit
, and thetfw_cb.flag
is still needed, or to manipulate the counter directly inss_skb_on_tcp_entail
. No additional callbacks needed here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ss_skb_on_tcp_entail
is called before pushing skb to socket write queue. It is a good place to decrement counter.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applications which works in user space and counted control frames don't know nothing about
__tcp_push_pending_frames
and there is no problem.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can postpone such frames if it is really necessary (and send them only when we have enough window) and count them in
ss_skb_on_tcp_entail
. I think that we move tls_encrypt to Tempesta code and there is no sense to have one more callback in linux kernel only to count control frames. User space applications like nginx doesn't know anything abouttcp_transmit_skb
. How they can count control frames?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EvgeniiMekhanik @krizhanovsky @const-t
I have to update some thoughts.
No, the user program relies on sockbuf to determine whether the skb send is in-flight. If sockbuf is full, the
send()
system call will fail because the peer does not provide enough window, or in non-blocking fd, the write event is shelved, then the counter will accumulate, and then the user program will know that it is in a flood attack.(Moreover, the user program can use TCP_INFO getsockopt to retrieve the TCP information to get known the in-flight bytes. In fact, Nginx uses it to define some variables, although not for our purpose exactly.)
However, in our case, it's a completely different story. tempesta does not use sockbuf, but directly enqueues to sk's sk_write_queue, so it will have memory risks unless we know that skb has started to be transmitted, that is, in-flight. How to know this fact? The only place is that
tcp_transmit_skb
successfully handles skb. So this is why I suggest that we should apply the counting logic there. Neither on_send (called in ss_tx_action) nor entail (corresponding to the original prepare_xmit), or even write_xmit callback, can meet our needs to mitigate the flood of control frame attacks. This is not a theory, but has been verified through testing.And the method suggested by @EvgeniiMekhanik does not work either, because similar to entail, the skbs pushed when the window is available does not mean that all skbs can be sent immediately, because it does not know how large the network capacity is and how many skbs can be sent, so it's likely that some skbs will be hold in the memory for an undetermined time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call
ss_skb_on_tcp_entail
just before we push skb to socket write queue. For HEADERS and DATA frames we calculate tcp window insk_fill_write_queue
and don't callss_skb_tcp_entail
->ss_skb_on_tcp_entail
if we have no enough window. To solve problem with control frames we should not callss_skb_tcp_entail
->ss_skb_on_tcp_entail
for all other type of frames if there is no enough tcp window. We should push them to the queue of postponed frames and push them to socket write queue only when we have enough TCP window. In this case we can counted control frames inss_skb_on_tcp_entail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. This is sufficient if we do not care about the strict accuracy of whether the skb was successfully sent by
tcp_transimit_skb
or was even eliminated fromtcp_rtx_queue
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the
flags
are replaced with theon_send
callback in #1973, so it's not desired to keep flags, especially with only one user. Meantine, I also don't like the idea of additional queueing since to wasted memory decreasing performance and opens opportunities for DoS attacks.However, in first place, I didn't get what do we fix with #2108, so I requested changes. Maybe I'm wrong and let's discuss the fix, but I proposed to control and limit flows to client in tempesta-tech/tempesta#498 (comment)
P.S. @kingluo @EvgeniiMekhanik thank you for inviting me to the discussion