Delete keep-alive timer on connection_drop hook #1429

avbelov23 · 2020-07-07T16:40:53Z

The keep-alive timer may fire after https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/tls.c#L582 but before https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/tls.c#L584 (where the timer is deleted), and then we hit the bug at https://github.com/tempesta-tech/tempesta/blob/master/tls/ttls.c#L2242 (tfw_sock_cli_keepalive_timer_cb() -> tfw_connection_close() -> tfw_tls_conn_close() -> ttls_close_notify()), because the TLS context is filled with zeros and tls->conf == NULL after https://github.com/tempesta-tech/tempesta/blob/master/tempesta_fw/tls.c#L582.

krizhanovsky

While the timer_pending() check is unreliable, it turns out that we have much more serious problems with connection reference counting. We need to explore and probably fix them.

krizhanovsky · 2020-07-08T17:19:13Z

tempesta_fw/sock_clnt.c

-		  msecs_to_jiffies((long)tfw_cli_cfg_ka_timeout * 1000));
+
+	if (timer_pending(&cli_conn->timer))
+		mod_timer(&cli_conn->timer,


Generally speaking, this check is unreliable: the timer can be deactivated right between the check and mod_timer(). Why do we ever call tfw_cli_conn_send() on a dropped connection? There are only 2 places calling the function:

tfw_http_resp_fwd() -> __tfw_http_resp_fwd() - the former one does tfw_cli_conn_get(), so the connection is alive;

tfw_h2_resp_fwd() - this one doesn't seem to care about the client connection state.

The interesting thing is that there are only two places where connection reference counter is incremented: the 1st one in HTTP/1 forwarding and the 2nd one is tfw_http_conn_msg_alloc(). Nobody calls tfw_srv_conn_get().

So it seems HTTP/2 is prone to this bug and, I think, to other similar bugs that reference a dead connection. Please test the HTTP/2 code against #1428.
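To illustrate the point about holding a reference while forwarding, here is a minimal userspace sketch of the get/put discipline. It is not Tempesta code: conn_t, conn_get(), conn_put() and resp_fwd() are hypothetical names, and the destructor is modelled by a flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection; not Tempesta's TfwConn. */
typedef struct conn {
	atomic_int refcnt;	/* number of active users */
	bool       released;	/* set when the last reference is dropped */
	int        sent;	/* messages "sent" so far */
} conn_t;

static void conn_init(conn_t *c)
{
	atomic_init(&c->refcnt, 1);	/* the owner's reference */
	c->released = false;
	c->sent = 0;
}

/* The caller must already hold a reference (here: the owner's). */
static void conn_get(conn_t *c)
{
	atomic_fetch_add(&c->refcnt, 1);
}

/* Drop one reference; whoever drops the last one releases the connection. */
static void conn_put(conn_t *c)
{
	int old = atomic_fetch_sub(&c->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		c->released = true;	/* stands in for the real destructor */
}

/* Forward a response while pinning the connection for the whole send. */
static void resp_fwd(conn_t *c)
{
	conn_get(c);
	c->sent++;			/* stands in for the real send path */
	conn_put(c);
}
```

With the reference taken before the send, the owner dropping its reference in the middle of resp_fwd() cannot release the connection until the matching conn_put() runs.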

There are two macros over tfw_connection_get(), which are not needed.

Could you please elaborate on what exactly this lock protects against? What party can call this function concurrently? A proper comment here would not hurt.
In my current view, once connection accounting in the HTTP/2 case is fixed (which it is in this PR), this lock is not needed. As I recall, the SS layer is stopped first when Tempesta is unloaded.

UPDATE: Looks like the lock is needed because the timer deletion was moved from release() to drop(). While release() is called when there are no other users, there's no such luxury with drop() and the connection may still be in use by lingering threads of execution. If that's so, then a good comment in the commit (it won't be easy to understand in the code) would be nice.
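A rough userspace model of the locking described above, assuming the send path re-arms the timer only while it is pending and the drop path deletes it under the same lock. The ka_timer_t type and the ka_* helpers are hypothetical; an atomic_flag spinlock stands in for the kernel spinlock, and a boolean stands in for timer_pending().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a per-connection keep-alive timer. */
typedef struct ka_timer {
	atomic_flag lock;	/* stands in for the per-connection timer_lock */
	bool        pending;	/* models timer_pending() */
	long        expires;	/* models the expiry passed to mod_timer() */
} ka_timer_t;

static void ka_lock(ka_timer_t *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;		/* spin until the holder releases the lock */
}

static void ka_unlock(ka_timer_t *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static void ka_init(ka_timer_t *t, long expires)
{
	atomic_flag_clear(&t->lock);
	t->pending = true;
	t->expires = expires;
}

/* Send path: the pending check and the re-arm happen under one lock. */
static bool ka_rearm(ka_timer_t *t, long expires)
{
	bool rearmed = false;

	ka_lock(t);
	if (t->pending) {
		t->expires = expires;
		rearmed = true;
	}
	ka_unlock(t);

	return rearmed;
}

/* Drop path: delete the timer under the same lock. */
static void ka_drop(ka_timer_t *t)
{
	ka_lock(t);
	t->pending = false;
	ka_unlock(t);
}
```

Because both paths serialize on the same lock, a re-arm that loses the race with the drop sees pending == false and leaves the timer alone, so the timer cannot be re-armed after it was deleted.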

krizhanovsky

The timer fix is good for now, but we need to fix the connection reference counting issue. Need review from @ikoveshnikov since there are plenty of subtle issues with the reference counting.

tempesta_fw/http.c

krizhanovsky · 2020-08-21T14:14:24Z

tempesta_fw/sock_clnt.c

+
+	spin_lock_bh(&((TfwCliConn *)conn)->timer_lock);
+	del_timer_sync(&((TfwCliConn *)conn)->timer);
+	spin_unlock_bh(&((TfwCliConn *)conn)->timer_lock);


I love moving the timer deletion here, the connection_drop hook, which is called when we're going to terminate the connection.

krizhanovsky · 2020-08-21T15:16:50Z

tempesta_fw/sock_clnt.c

+		mod_timer(&cli_conn->timer,
+		          jiffies +
+		          msecs_to_jiffies((long)tfw_cli_cfg_ka_timeout * 1000));
+	spin_unlock_bh(&cli_conn->timer_lock);


The keep-alive timer lock is awkward, but I didn't find any better solution for now. Also, it seems there is not much sense in putting significant effort into eliminating the lock: this is just a per-connection lock after all.

However, the whole timer management is complex and has performance issues, so I added an appropriate comment in #736 (comment).

krizhanovsky

With @keshonok the PR looks good. I agree that comments are good to have. From my side I'd love to see comments for tfw_srv_conn_put() calls that they are paired with tfw_srv_conn_get_if_live().
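For readers unfamiliar with the "get if live" idiom mentioned here, a minimal sketch with C11 atomics follows. It is an illustration only: srv_conn_t, the CONN_DEATHCNT value and all function names are hypothetical and are not taken from the Tempesta sources. It also shows the pairing comment placed at the call site of the put.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel marking a dying connection; the value is an assumption. */
#define CONN_DEATHCNT	(INT_MIN + 1)

typedef struct srv_conn {
	atomic_int refcnt;
} srv_conn_t;

/* Take a reference unless the connection is already marked dead. */
static bool srv_conn_get_if_live(srv_conn_t *c)
{
	int old = atomic_load(&c->refcnt);

	do {
		if (old == CONN_DEATHCNT)
			return false;
		/* On failure 'old' is reloaded with the current counter value. */
	} while (!atomic_compare_exchange_weak(&c->refcnt, &old, old + 1));

	return true;
}

static void srv_conn_put(srv_conn_t *c)
{
	atomic_fetch_sub(&c->refcnt, 1);
}

/* Simplified for the sketch: mark the connection dead unconditionally. */
static void srv_conn_mark_dead(srv_conn_t *c)
{
	atomic_store(&c->refcnt, CONN_DEATHCNT);
}

/* Use the connection once, only if a live reference can be taken. */
static bool forward_once(srv_conn_t *c)
{
	if (!srv_conn_get_if_live(c))
		return false;

	/* ... the connection is pinned here and can be used safely ... */

	/* Paired with srv_conn_get_if_live() above. */
	srv_conn_put(c);
	return true;
}
```

The compare-and-exchange loop is what makes the check and the increment a single step: a connection marked dead between the load and the increment is detected on the retry instead of being revived.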

vankoven · 2020-08-26T14:33:27Z

tempesta_fw/sock_clnt.c

@@ -234,6 +243,11 @@ tfw_sock_clnt_drop(struct sock *sk)

 	T_DBG3("connection lost: close client socket: sk=%p, conn=%p, "
 	       "client=%p\n", sk, conn, conn->peer);
+
+	spin_lock_bh(&((TfwCliConn *)conn)->timer_lock);


The lock is acquired and released only in SoftIRQ context, so it can be relaxed to spin_lock() (without the _bh suffix).

vankoven · 2020-08-27T10:10:38Z

tempesta_fw/sock_clnt.c

+		mod_timer(&cli_conn->timer,
+		          jiffies +
+		          msecs_to_jiffies((long)tfw_cli_cfg_ka_timeout * 1000));
+	spin_unlock_bh(&cli_conn->timer_lock);


Same here. I also tried to find another solution; e.g., for server connections we have the special value TFW_CONN_DEATHCNT in the reference counter and the TFW_CONN_B_DEL state in TfwSrvConn->flags. But this doesn't solve the unordered del_timer()/mod_timer() calls, which are the root of the problem.

vankoven · 2020-08-27T10:16:36Z

tempesta_fw/connection.h

@@ -380,7 +379,10 @@ tfw_connection_put(TfwConn *conn)
 		conn->destructor(conn);
 }

-#define tfw_cli_conn_put(c)	tfw_connection_put((TfwConn *)(c))
+/*
+ * Paired with tfw_srv_conn_get_if_live() via tfw_http_get_srv_conn() or


The comment is misplaced. It is not the tfw_srv_conn_put() function itself that is paired with tfw_srv_conn_get_if_live(). Instead, the comment must be placed where the function is called.

avbelov23 force-pushed the avb-1404 branch from 8b733b1 to ee4fc7f Compare July 8, 2020 11:24

krizhanovsky requested changes Jul 8, 2020

avbelov23 force-pushed the avb-1404 branch from ee4fc7f to 15b047a Compare August 5, 2020 15:15

krizhanovsky self-requested a review August 20, 2020 19:24

krizhanovsky self-assigned this Aug 20, 2020

krizhanovsky requested a review from vankoven August 20, 2020 19:44

krizhanovsky assigned vankoven Aug 20, 2020

krizhanovsky mentioned this pull request Aug 21, 2020

Move timers to kernel thread #736

Open

krizhanovsky requested changes Aug 21, 2020

krizhanovsky approved these changes Aug 21, 2020

avbelov23 force-pushed the avb-1404 branch from 15b047a to 7f22f63 Compare August 24, 2020 15:50

vankoven approved these changes Aug 27, 2020

avbelov23 added 3 commits August 27, 2020 17:27

delete keep-alive timer on connection_drop hook

e46a45b

geting connection for send

f02d650

removing tfw_cli_conn_get/put(), tfw_srv_conn_get()

c3255db

avbelov23 force-pushed the avb-1404 branch from 7f22f63 to c3255db Compare August 27, 2020 14:35

avbelov23 merged commit 571136e into master Aug 27, 2020

avbelov23 deleted the avb-1404 branch August 27, 2020 14:51
