Fix #76 #886 #908 #911, many other fixes and optimizations #916
Conversation
…sly -tfw_http_conn_release_closed()
all the network and scheduler locks as well as RCU calls _bh. 2. Use RCU barriers to wait until all memory is freed on configuration reloading before trying to allocate memory for new server groups. 3. Call an RCU barrier once in a while to avoid massive CPU utilization by RCU callbacks (freeing throttling). 4. Move __tfw_sg_release_all_reconfig() to the unit test where it's supposed to be. The system still hangs with a ksoftirqd soft lockup on reloading 100K server groups. It seems not all locks are fixed by _bh...
…ntexts; 2. Use GFP_KERNEL for configuration (process) context; 3. Introduce tfw_srv_loop_sched_rcu() to let RCU make progress with a long process context on !CONFIG_PREEMPT kernels; 4. Use a semaphore in configuration (process) context instead of sg_lock; 5. Remove sg->lock since it's only needed for configuration and procfs (process) context, so there is no need to care about concurrency, nor to spend extra memory for each group. Also, a sleepable context with tfw_srv_loop_sched_rcu() is highly desirable for RCU progress (see Documentation/RCU/stallwarn.txt);
procfs directory. With a huge number of files procfs becomes almost unusable, but this is a choice for a system administrator or a TODO for us to build a filesystem tree.
tempesta_fw/http.c
Outdated
 * to be forwarded. The main effort is put into servicing requests
 * that are on time. Unlucky requests are just given another chance
 * with minimal effort.
 * That's the intended behaviour. Suche rescheduled requests are unlucky
Suche -> Such
@@ -414,10 +414,10 @@ tfw_http_conn_msg_free(TfwHttpMsg *hm)
 static inline void
 tfw_http_conn_req_clean(TfwHttpReq *req)
 {
-	spin_lock(&((TfwCliConn *)req->conn)->seq_qlock);
+	spin_lock_bh(&((TfwCliConn *)req->conn)->seq_qlock);
It seems there is no necessity to insert *_bh spin locks into code that is called from SoftIRQ context only (tfw_http_conn_req_clean() and tfw_http_resp_fwd()), as SoftIRQ handlers are serialized on the same CPU and a simple spin_lock without bottom-half disabling should be enough in these cases.
Release of server connections may happen in user context. tfw_http_conn_release() also acquires the lock and works with seq_list; that's why _bh() is used here.
Yes, but I didn't find any call from user context (e.g. for tfw_http_conn_req_clean()). On the other hand, it seems that using *_bh spin locks in SoftIRQ context does not affect either performance or lock contention, so this is just a discussion note.
Example of process context calls:
tfw_cfgop_update_sg_srv_list()
  -> tfw_sock_srv_start_srv()
  -> tfw_sock_srv_connect_srv()
  -> tfw_sock_srv_connect_try_later()
  -> tfw_connection_repair()
  -> conn_repair = tfw_http_conn_repair()
  -> tfw_http_conn_shrink_fwdq()
  -> tfw_http_req_zap_error()
  -> tfw_http_error_resp_switch()
  -> tfw_http_send_resp()
  -> tfw_http_resp_build_error()
  -> tfw_http_conn_req_clean()
Yes, I was wrong here. Missed this path. Thanks.
@@ -1136,12 +1136,12 @@ ss_sock_create(int family, int type, int protocol, struct sock **res)
 	struct sock *sk = NULL;
 	const struct net_proto_family *pf;

-	rcu_read_lock();
+	rcu_read_lock_bh();
Don't quite understand this replacement: it seems there are no spinlocks here (with the possibility of a user/softirq deadlock) that would make it necessary to disable softirqs; besides, synchronize_rcu() (not the *_bh flavor) is used in the kernel with the net_families array.
Basically the reason is the same as for #916 (comment): ss_sock_create() can be called from process context on system start as well as from softirq context on a server reconnection attempt (timer context). Updated #736 (comment)
 static LIST_HEAD(sg_list);
 static LIST_HEAD(sg_list_reconfig);
 static DEFINE_RWLOCK(sg_lock);
 #define TFW_SG_HBITS	10
Maybe this parameter should be configurable - for cases when the server group count is much smaller (or much larger) than 1K.
We spend 4 pages for both of the hash tables, so reducing the hash size won't give us a big memory win. There is no need to move the hash size to the configuration, since we're able to estimate the optimal hash size from the number of configured server groups. I have no idea whether it makes sense to optimize the data structure further to handle 100K and more server groups, since profiling results have shown timers as the main bottleneck.
 	struct list_head	srv_list;
 	rwlock_t		lock;
Since the lock is removed, no server group or server timer activity may happen during the reconfiguration process. Just a note for #736.
Still reviewing: there is some issue in the reconfiguration code. Reconfiguration tests fail and crash Tempesta:
The crash happened on
The crash #916 (comment) is due to an assertion
Please read the commit descriptions.