Tempesta hooks and proper use of close(), drop(), failover, and retry() #107
This commit addresses issues described in #107.

- Run Tempesta hooks under sk->sk_callback_lock. The same lock now protects the use and manipulation of sk->sk_user_data. That way Tempesta can be separated from a kernel socket safely when the socket is closed.
- Separate the release of Tempesta resources linked to a socket from the close of the socket itself. That makes proper use of the failover, retry, and drop actions possible. It also makes a proper reaction to a close possible by separating an intentional close by Tempesta from a close that requires a reconnect.
Linked with commit & comment 33e7d3b.
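A minimal sketch of what "running Tempesta hooks under sk->sk_callback_lock" could look like. The hook name below is a placeholder, not the actual Tempesta API; only the locking pattern is the point:

```c
#include <net/sock.h>

/* Hypothetical Tempesta hook, used only for illustration. */
void tfw_conn_hook_drop(void *conn);

/*
 * Sketch: call a Tempesta hook only while sk->sk_callback_lock is held,
 * so the socket cannot be unlinked from its Tempesta data concurrently.
 */
static void ss_call_drop_hook(struct sock *sk)
{
	read_lock_bh(&sk->sk_callback_lock);
	if (sk->sk_user_data)
		/* Still linked to Tempesta: safe to call the hook. */
		tfw_conn_hook_drop(sk->sk_user_data);
	read_unlock_bh(&sk->sk_callback_lock);
}
```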
Below are some implementation notes.
One of the major tasks in the implementation of this issue is separating Tempesta from the Linux kernel (Sync Sockets) on a specific socket when necessary. The kernel must continue to function correctly without any hiccup after the kernel socket is separated from Tempesta. After recent architecture changes Tempesta has specific functions to link a kernel socket with Tempesta internal data for that socket, and to unlink a kernel socket from that data. As soon as the use of and access to sk->sk_user_data is protected by sk->sk_callback_lock, Tempesta can be separated from a kernel socket safely.
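As a rough illustration (the function names here are made up, not the actual Sync Sockets functions), the link/unlink pair could look like this, with sk->sk_callback_lock taken for writing so that a concurrent hook call either sees the full Tempesta data or none at all:

```c
#include <net/sock.h>

/*
 * Hypothetical link/unlink helpers: attach or detach Tempesta per-socket
 * data via sk->sk_user_data under sk->sk_callback_lock.  The real
 * function names in Sync Sockets may differ.
 */
static void ss_sock_link_conn(struct sock *sk, void *tfw_conn)
{
	write_lock_bh(&sk->sk_callback_lock);
	sk->sk_user_data = tfw_conn;
	write_unlock_bh(&sk->sk_callback_lock);
}

static void ss_sock_unlink_conn(struct sock *sk)
{
	write_lock_bh(&sk->sk_callback_lock);
	sk->sk_user_data = NULL;
	write_unlock_bh(&sk->sk_callback_lock);
}
```

Once the unlink helper has run, any callback that takes the lock and finds sk->sk_user_data == NULL simply returns, so the kernel socket keeps working without touching Tempesta data that has been released.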
Implemented in a series of commits included in pull request #110.
#110 merged to master.
There are several cases when the failover() function is required.

(1) is the only case that leads to the kernel calling the sk->sk_error_report() socket hook. The kernel then calls the sk->sk_state_change() hook in state TCP_CLOSE, thus skipping the state TCP_ESTABLISHED. In the current implementation of Synchronous Sockets the state TCP_CLOSE is never reached in the regular course of action. This ensures that when sk->sk_state_change() is called in TCP_CLOSE state, that's because of a failed connect attempt due to the absence of a server at the other end. So currently that's when the Tempesta hook connection_close() is called. The name is misleading: in fact this hook leads to calling tfw_sock_srv_connect_retry(), which initiates reconnect attempts. This hook is not implemented for client sockets.
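A simplified sketch of the logic described above (not the actual Sync Sockets code; the hook signatures and the tfw_connection_new() name are assumptions). Because TCP_CLOSE is never reached in the regular course of action, seeing it in the state-change callback can be treated as a failed connect attempt:

```c
#include <net/sock.h>
#include <net/tcp_states.h>

/* Assumed hook signatures; tfw_connection_new() is a made-up name. */
void tfw_connection_new(struct sock *sk);
void tfw_connection_close(struct sock *sk);	/* ends up in tfw_sock_srv_connect_retry() */

static void ss_tcp_state_change(struct sock *sk)
{
	if (sk->sk_state == TCP_ESTABLISHED) {
		/* The regular path: the connection is up. */
		tfw_connection_new(sk);
	} else if (sk->sk_state == TCP_CLOSE) {
		/*
		 * Never reached in the regular course of action, so this is
		 * a failed connect attempt (no server at the other end).
		 * For server sockets this initiates reconnect attempts.
		 */
		tfw_connection_close(sk);
	}
}
```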
(2) is the case where we need to reconnect. Currently connections to servers are considered permanent, so when a connection closes Tempesta needs to reconnect. That's when the connection_drop() hook is called, which leads to calling tfw_sock_srv_connect_failover() for server sockets. This function releases all Tempesta resources linked to the socket and initiates a new connect attempt to the server. The old socket is closed.
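In rough, assumed form (helper names are made up; the real tfw_sock_srv_connect_failover() is certainly more involved), the failover path described above amounts to:

```c
#include <net/sock.h>

/* Hypothetical helpers, for illustration only. */
void tfw_connection_release(void *conn);	/* release Tempesta resources */
void ss_close(struct sock *sk);			/* close the old socket */
void tfw_sock_srv_connect_try(void);		/* start a new connect attempt */

static void tfw_sock_srv_connect_failover(struct sock *sk)
{
	void *conn = sk->sk_user_data;

	/* Release all Tempesta resources linked to the old socket. */
	if (conn)
		tfw_connection_release(conn);

	/* The old socket is closed... */
	ss_close(sk);

	/*
	 * ...and a new connect attempt to the server is initiated
	 * (on a freshly created socket, not shown here).
	 */
	tfw_sock_srv_connect_try();
}
```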
Socket close in Synchronous Sockets is implemented in two variants. One is the internal ss_do_close() that is only called in SoftIRQ context from within Sync Sockets code. The other is the external ss_close() that can be called from both SoftIRQ and process contexts. ss_close() calls ss_do_close() to do the actual close. In the process, ss_do_close() calls the connection_drop() hook. In the case of server sockets that initiates another connect attempt.

The issue is in how these hooks and functions are used in Tempesta. The way they are used now may lead to reconnect attempts where they are not needed, or even to an endless loop of reconnect attempts under certain conditions.
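A structural sketch of the two close variants as described above, simplified and not the actual Sync Sockets code (the connection_drop() signature is an assumption). It shows why every close, including an intentional one, currently ends up in the drop hook:

```c
#include <net/sock.h>

/* Stands in for Tempesta's connection_drop() hook; signature is assumed. */
void connection_drop(struct sock *sk);

/* SoftIRQ context only; the socket is expected to be locked by the caller. */
static void ss_do_close(struct sock *sk)
{
	/* ... actual TCP close work elided ... */

	/*
	 * The problematic part: the drop hook is invoked on every close,
	 * so for a server socket even an intentional, final close ends up
	 * scheduling another connect attempt.
	 */
	connection_drop(sk);
}

/* Callable from both SoftIRQ and process contexts. */
void ss_close(struct sock *sk)
{
	local_bh_disable();
	bh_lock_sock(sk);
	ss_do_close(sk);
	bh_unlock_sock(sk);
	local_bh_enable();
}
```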
All in all, it appears that the internal close function ss_do_close() is not the right place to call the connection_drop() hook from. It needs to be called differently, and in the right place.

Aside from the two cases above there's another one that makes the issue a bit more complicated. That's the Tempesta start/stop process, which is executed in process context. The stop process is especially troubling.
When Tempesta is stopped it needs to close all connections. With the current code architecture that leads to reconnect attempts. There's a need to differentiate a socket close that should lead to another connect attempt from a socket close that is intentional and final.

Also, when Tempesta is stopped, the release of Tempesta resources for a socket and the close of the socket itself are performed on a live socket that may still have data exchanged on it. So the socket close must be done correctly, with proper locks in place.
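One way to separate the two kinds of close, in the direction of the commit quoted at the top, could look like the sketch below. The helper names are made up; the idea is only that Tempesta resources are released and unlinked before the socket is closed, so the close path finds no Tempesta data and does not trigger failover:

```c
#include <net/sock.h>

/* Hypothetical helpers, for illustration only. */
void tfw_connection_release(void *conn);	/* free Tempesta per-socket resources */
void ss_close(struct sock *sk);			/* close the kernel socket */

/*
 * Sketch: an intentional, final close (e.g. on Tempesta stop).
 * The Tempesta data is detached before the socket is closed, so the
 * close path sees sk->sk_user_data == NULL, the drop/failover hook is
 * never invoked, and no reconnect is scheduled.
 */
static void tfw_sock_intentional_close(struct sock *sk)
{
	void *conn;

	/* Detach Tempesta data first, under the callback lock. */
	write_lock_bh(&sk->sk_callback_lock);
	conn = sk->sk_user_data;
	sk->sk_user_data = NULL;
	write_unlock_bh(&sk->sk_callback_lock);

	if (conn)
		tfw_connection_release(conn);

	/* Now close; the socket may still be live and exchanging data,
	 * so the close itself must take the proper socket locks. */
	ss_close(sk);
}
```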