
Fix #76 #886 #908 #911, many other fixes and optimizations #916

Merged
13 commits merged into master on Mar 3, 2018

Conversation

krizhanovsky (Contributor):

Please read the commit descriptions.

   all the network and scheduler locks as well as RCU calls _bh.
2. Use RCU barriers to wait until all memory is freed on configuration
   reloading before trying to allocate memory for new server groups.
3. Call the RCU barrier once in a while to avoid massive CPU utilization
   by RCU callbacks (freeing throttling);
4. Move __tfw_sg_release_all_reconfig() to the unit test where it's
   supposed to be.

The system still hangs with a ksoftirqd soft lockup on reloading 100K
server groups. It seems not all locks are fixed by _bh...
…ntexts;

2. Use GFP_KERNEL for configuration (process) context;
3. Introduce tfw_srv_loop_sched_rcu() to let RCU make progress with a long
   process context on !CONFIG_PREEMPT kernels;
4. Use a semaphore in configuration (process) context instead of sg_lock;
5. Remove sg->lock since it's only needed for configuration and procfs
   (process) context, so there is no need to care about concurrency, nor
   to spend more memory for each group. Also, a sleepable context with
   tfw_srv_loop_sched_rcu() is highly desirable for RCU progress
   (see Documentation/RCU/stallwarn.txt);
procfs directory. With a huge number of files procfs becomes almost
unusable, but this is a choice for the system administrator, or a TODO
for us to build a filesystem tree.
@krizhanovsky krizhanovsky changed the title Fix #76 #908 #911, many other fixes and optimizations Fix #76 #886 #908 #911, many other fixes and optimizations Feb 20, 2018
* to be forwarded. The main effort is put into servicing requests
* that are on time. Unlucky requests are just given another chance
* with minimal effort.
* That's the intended behaviour. Suche rescheduled requests are unlucky
Contributor:

Suche -> Such

@@ -414,10 +414,10 @@ tfw_http_conn_msg_free(TfwHttpMsg *hm)
static inline void
tfw_http_conn_req_clean(TfwHttpReq *req)
{
-	spin_lock(&((TfwCliConn *)req->conn)->seq_qlock);
+	spin_lock_bh(&((TfwCliConn *)req->conn)->seq_qlock);
Contributor:

It seems there is no necessity to insert *_bh spin locks into code that is called from SoftIRQ context only (tfw_http_conn_req_clean() and tfw_http_resp_fwd()): SoftIRQ handlers are serialized on the same CPU, so a simple spin_lock without disabling bottom halves should be enough in these cases.

Contributor:

Release of server connections may happen in user context; tfw_http_conn_release() also acquires the lock and works with seq_list, which is why _bh() is used here.

@aleksostapenko (Contributor) Feb 27, 2018:

Yes, but I didn't find any call from user context (e.g. for tfw_http_conn_req_clean()).
On the other hand, it seems that using *_bh spin locks in SoftIRQ context affects neither performance nor lock contention, so this is just a discussion note.

Contributor Author:

Example of process context calls: tfw_cfgop_update_sg_srv_list() -> tfw_sock_srv_start_srv() -> tfw_sock_srv_connect_srv() -> tfw_sock_srv_connect_try_later() -> tfw_connection_repair() -> conn_repair = tfw_http_conn_repair() -> tfw_http_conn_shrink_fwdq() -> tfw_http_req_zap_error() -> tfw_http_error_resp_switch() -> tfw_http_send_resp() -> tfw_http_resp_build_error() -> tfw_http_conn_req_clean().

Contributor:

Yes, I was wrong here; I missed this path.
Thanks.

@@ -1136,12 +1136,12 @@ ss_sock_create(int family, int type, int protocol, struct sock **res)
struct sock *sk = NULL;
const struct net_proto_family *pf;

-	rcu_read_lock();
+	rcu_read_lock_bh();
@aleksostapenko (Contributor) Feb 23, 2018:

I don't quite understand this replacement: there seem to be no spinlocks here (with the possibility of a user/softirq deadlock) that would make it necessary to disable softirqs; besides, synchronize_rcu() (not the *_bh flavor) is used in the kernel with the net_families array.

Contributor Author:

Basically the reason is the same as for #916 (comment): ss_sock_create() can be called from process context at system start as well as from softirq context on a server reconnection attempt (timer context). Updated #736 (comment)

static LIST_HEAD(sg_list);
static LIST_HEAD(sg_list_reconfig);
-static DEFINE_RWLOCK(sg_lock);
+#define TFW_SG_HBITS 10
@aleksostapenko (Contributor) Feb 23, 2018:

Maybe this parameter should be configurable, for cases when the server group count is much smaller (or much larger) than 1K.

Contributor Author:

We spend 4 pages for both of the hash tables, so reducing the hash size won't give us a big memory win. There is no need to move the hash size to the configuration, since we're able to estimate the optimal hash size from the number of configured server groups. I have no idea whether it makes sense to optimize the data structure further to handle 100K and more server groups, since profiling results have shown timers to be the main bottleneck.


struct list_head srv_list;
rwlock_t lock;
Contributor:

Since the lock is removed, no server group or server timers may fire during the reconfiguration process. Just a note for #736.

@vankoven (Contributor):

Still reviewing: there is some issue in the reconfiguration code. Reconfiguration tests fail and crash Tempesta:

[   40.543629] [tdb] Start Tempesta DB
[   40.554990] net_ratelimit: 1 callbacks suppressed
[   40.556483] [tempesta] Initializing Tempesta FW kernel module...
[   40.564618] [tempesta] Registering new scheduler: hash
[   40.570745] [tempesta] Registering new scheduler: http
[   40.581123] [tempesta] Registering new scheduler: ratio
[   40.626207] [tempesta] Prepearing for the configuration processing.
[   40.628830] [tempesta] Configuration processing is completed.
[   40.639633] [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff94157c400000
[   40.762709] [tdb] Opened table /opt/tempesta/db/cache.tdb: size=268435456 rec_size=0 base=ffff94156c400000
[   40.765524] [tempesta] Open listen socket on: 0.0.0.0
[   40.780090] [tempesta] modules are started
[   50.519206] [tempesta] Live reconfiguration of Tempesta.
[   50.521350] [tempesta] Prepearing for the configuration processing.
[   50.523342] [tempesta] Configuration processing is completed.
[   50.525629] ------------[ cut here ]------------
[   50.527083] kernel BUG at /home/user/tempesta/tempesta/tempesta_fw/http.c:1330!
[   50.528735] invalid opcode: 0000 [#1] SMP
[   50.528735] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O) tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O) crct10dif_pclmul binfmt_misc crc32_pclmul snd_hda_codec_generic iTCO_wdt iTCO_vendor_support ghash_clmulni_intel ppdev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm evdev snd_timer snd lpc_ich serio_raw pcspkr soundcore mfd_core virtio_balloon sg virtio_console qxl parport_pc parport ttm drm_kms_helper button drm shpchp ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache sr_mod cdrom virtio_blk virtio_net crc32c_intel ahci libahci aesni_intel i2c_i801 i2c_smbus aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse uhci_hcd ehci_pci ehci_hcd libata usbcore usb_common sym53c8xx scsi_transport_spi scsi_mod virtio_pci virtio_ring virtio [last unloaded: tempesta_tls]
[   50.534804] CPU: 0 PID: 1453 Comm: sysctl Tainted: G           O    4.9.0-tempesta-amd64 #1 Debian 4.9.35-tfw4-1
[   50.534804] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   50.534804] task: ffff94155b449d00 task.stack: ffffb0c1816c8000
[   50.534804] RIP: 0010:[<ffffffffc085bd23>]  [<ffffffffc085bd23>] tfw_http_req_destruct+0x53/0x60 [tempesta_fw]
[   50.534804] RSP: 0018:ffff94157fc03c30  EFLAGS: 00010206
[   50.534804] RAX: ffff9415597e63f0 RBX: ffff94155aeba020 RCX: 000000010010000a
[   50.534804] RDX: ffff94155aeba3f0 RSI: ffffd99fc1667c40 RDI: ffff94155aeba020
[   50.534804] RBP: ffff94157fc03c50 R08: ffff9415599f1200 R09: 000000010010000a
[   50.534804] R10: ffff9415599f1400 R11: 2ce33e6c02ce3300 R12: ffff9415599b5678
[   50.534804] R13: ffff94157fc03c50 R14: ffff9415599b572c R15: ffff94157fc03c50
[   50.534804] FS:  00007f4d1a61b880(0000) GS:ffff94157fc00000(0000) knlGS:0000000000000000
[   50.534804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.534804] CR2: 00007f4d19d15d00 CR3: 000000005b8fb000 CR4: 00000000003406f0
[   50.534804] Stack:
[   50.534804]  ffff9415599b5678 ffffffffc085f579 ffff94155aeba020 ffffffffc085be92
[   50.534804]  ffff94157fc03c50 ffff94157fc03c50 0000000059645e70 212fd317b59b8a3f
[   50.534804]  0000000700000024 ffff9415599f0020 ffff94155aeba020 ffff94157fc1a540
[   50.534804] Call Trace:
[   50.534804]  <IRQ> [   50.534804]  [<ffffffffc085f579>] ? tfw_http_msg_free+0x89/0xa0 [tempesta_fw]
[   50.534804]  [<ffffffffc085be92>] ? tfw_http_resp_fwd+0x162/0x2c0 [tempesta_fw]
[   50.534804]  [<ffffffffc085c177>] ? tfw_http_send_resp+0x187/0x1a0 [tempesta_fw]
[   50.534804]  [<ffffffffc085cc2c>] ? tfw_http_conn_shrink_fwdq_resched+0x23c/0x2b0 [tempesta_fw]
[   50.534804]  [<ffffffffc085ceb4>] ? tfw_http_conn_release+0x214/0x240 [tempesta_fw]
[   50.534804]  [<ffffffffc0859334>] ? tfw_connection_release+0x24/0x50 [tempesta_fw]
[   50.534804]  [<ffffffffc087ed7e>] ? tfw_srv_conn_release+0xe/0x80 [tempesta_fw]
[   50.534804]  [<ffffffffc087bbcc>] ? ss_tx_action+0x38c/0x4b0 [tempesta_fw]
[   50.534804]  [<ffffffffb95048db>] ? net_tx_action+0x9b/0x1d0
[   50.534804]  [<ffffffffb9610d1a>] ? __do_softirq+0x10a/0x29a
[   50.534804]  [<ffffffffc087f090>] ? tfw_sock_srv_grace_shutdown_srv+0x110/0x110 [tempesta_fw]
[   50.534804]  [<ffffffffb960f16c>] ? do_softirq_own_stack+0x1c/0x30
[   50.534804]  <EOI> [   50.534804]  [<ffffffffb908056d>] ? do_softirq.part.18+0x3d/0x50
[   50.534804]  [<ffffffffb90805f8>] ? __local_bh_enable_ip+0x78/0x80
[   50.534804]  [<ffffffffc087ee73>] ? tfw_sock_srv_disconnect_srv+0x83/0xb0 [tempesta_fw]
[   50.534804]  [<ffffffffc087f07c>] ? tfw_sock_srv_grace_shutdown_srv+0xfc/0x110 [tempesta_fw]
[   50.534804]  [<ffffffffc087f0f7>] ? __tfw_cfgop_update_sg_srv_list+0x67/0x1a0 [tempesta_fw]
[   50.534804]  [<ffffffffc087f090>] ? tfw_sock_srv_grace_shutdown_srv+0x110/0x110 [tempesta_fw]
[   50.534804]  [<ffffffffc087f090>] ? tfw_sock_srv_grace_shutdown_srv+0x110/0x110 [tempesta_fw]
[   50.534804]  [<ffffffffc0879940>] ? __tfw_sg_for_each_srv+0x70/0xa0 [tempesta_fw]
[   50.534804]  [<ffffffffc087eac5>] ? tfw_sock_srv_start+0xf5/0x330 [tempesta_fw]
[   50.534804]  [<ffffffffc087784d>] ? tfw_ctlfn_state_io+0x36d/0x530 [tempesta_fw]
[   50.534804]  [<ffffffffb960b77e>] ? mutex_lock+0xe/0x30
[   50.534804]  [<ffffffffc0877524>] ? tfw_ctlfn_state_io+0x44/0x530 [tempesta_fw]
[   50.534804]  [<ffffffffb9282b43>] ? proc_sys_call_handler+0xe3/0x100
[   50.534804]  [<ffffffffb9208820>] ? vfs_write+0xb0/0x190
[   50.534804]  [<ffffffffb9209c12>] ? SyS_write+0x52/0xc0
[   50.534804]  [<ffffffffb960e1fb>] ? system_call_fast_compare_end+0xc/0x9b
[   50.534804] Code: 03 00 00 48 8d 97 e0 03 00 00 48 39 c2 75 1e 48 8b bf 58 03 00 00 48 85 ff 74 09 48 83 c4 08 e9 74 b0 01 00 48 83 c4 08 c3 0f 0b <0f> 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 
[   50.534804] RIP  [<ffffffffc085bd23>] tfw_http_req_destruct+0x53/0x60 [tempesta_fw]
[   50.534804]  RSP <ffff94157fc03c30>

The crash happened on reconf.test_stress_sched_hash.SchedHashCustomSg.test_hash_del_add_srvs. At the same time, I have no issues with the reconf tests on the current master.

@krizhanovsky (Contributor Author) commented Mar 2, 2018:

The crash #916 (comment) is due to the assertion BUG_ON(!list_empty(&req->fwd_list)). The root cause is that tfw_http_conn_shrink_fwdq_resched() calls tfw_http_req_err() with srv_conn == NULL, since it passes the whole srv_conn->fwd_queue to schq, so tfw_http_req_delist() isn't called for a request. This works fine if req is going to be added to the error queue, but we hit the crash if we try to send an error response immediately. The assertion failure is expected and harmless in general, but still must be fixed.

@krizhanovsky krizhanovsky merged commit bfe33c0 into master Mar 3, 2018
@krizhanovsky krizhanovsky deleted the ak-76 branch March 3, 2018 08:45