Possible recursive locking #61
Comments
My first thought after researching the problem was that this is a case of a [...] The synchronous sockets code is modeled on the Linux socket code. However, the sending and receiving paths are decoupled from each other in the Linux stack, while they are not decoupled in the sync sockets code: data is passed from the receiving side to the sending side in one go. First the receiving socket is locked, and then the lock on the sending socket is taken as well, without releasing the first lock. The problem is that data flows both ways at the same time, so the same locks are also taken on the reverse path, in reverse order.
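To make the contrast concrete, here is a minimal user-space sketch of a decoupled hand-off, in which no thread ever holds the receiving and the sending lock at the same time. It is only a model of the idea, not the Linux or the sync sockets implementation; `ModelSocket`, `receive_side`, `send_side` and `forward` are hypothetical names introduced for illustration.

```python
import threading
from collections import deque


class ModelSocket:
    """Hypothetical stand-in for a socket: one lock, one rx and one tx queue."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.rx = deque()
        self.tx = deque()


def receive_side(sock, handoff):
    """Drain the receive queue while holding only the receiving socket's lock."""
    with sock.lock:
        while sock.rx:
            handoff.append(sock.rx.popleft())
    # The receiving lock is released here, before the sending socket is touched.


def send_side(sock, handoff):
    """Queue the handed-off data while holding only the sending socket's lock."""
    with sock.lock:
        while handoff:
            sock.tx.append(handoff.popleft())


def forward(src, dst):
    """Move data from src to dst without ever nesting the two locks."""
    handoff = deque()  # private to this call, so it needs no lock of its own
    receive_side(src, handoff)
    send_side(dst, handoff)
```

Because the two locks are never nested, `forward(client, server)` and `forward(server, client)` can run concurrently without a lock order to invert.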
Recently, @keshonok and I discussed this issue and agreed that it is actually a serious one: we do have a deadlock here. Setup:
Because HTTP pipelining is enabled, the server and the client may send HTTP messages to each other simultaneously.
The problem is that both sockets have to be locked to make a transfer. Here is the scenario: CPU1:
CPU2:
Here we have a classic deadlock case. It seems the problem is closely related to the Synchronous Sockets approach: its callbacks are invoked while the socket is locked. Therefore, we can't lock any other socket in the context of an SS callback. It seems there is no easy solution here. @krizhanovsky, please comment.
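The lock order inversion can be reproduced in a small user-space model. The sketch below is an illustration only: which socket each CPU locks first is an assumption, the function and variable names are hypothetical, and a timeout is used on the second lock so that the model reports the deadlock instead of hanging.

```python
import threading


def run_abba_model(timeout=0.2):
    """Model two CPUs that each lock their own socket, then the peer's socket."""
    client_lock = threading.Lock()
    server_lock = threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def transfer(name, first, second):
        with first:
            # Make sure both CPUs hold their first lock before going on.
            both_hold_first.wait()
            # A real spinlock would spin here forever; the timeout is only
            # there so that the model terminates.
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            # Keep the first lock until the other CPU has finished its attempt.
            both_tried_second.wait()
            if ok:
                second.release()

    # Assumed order: CPU1 goes client -> server, CPU2 goes server -> client.
    cpu1 = threading.Thread(target=transfer, args=("cpu1", client_lock, server_lock))
    cpu2 = threading.Thread(target=transfer, args=("cpu2", server_lock, client_lock))
    cpu1.start()
    cpu2.start()
    cpu1.join()
    cpu2.join()
    return got_second
```

In this model neither CPU obtains its second lock, which is the situation described above: each one holds the lock the other one is waiting for.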
Make SS simpler by removing lots of Linux socket code from SS. Leave that job to Linux, it does it well. When a socket leaves the ESTABLISHED state, we release all Tempesta resources linked to a socket, and do not take part in any work or actions that are needed to have the socket closed. Linux does the usual job of closing the TCP connection and related sockets in a correct way. This DOES NOT solve the problem of a deadlock (issue #61).
This is related to the fix for issue #61. That fix created a window in ss_tcp_process_data() during which the socket is unlocked, so it may be closed in a parallel thread. If that happens, ss_tcp_process_data() cannot continue, as the socket's data is cleared and skb_queue_walk_safe() is not safe anymore. Check for that condition, and bail out if that happens.
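The "re-check after re-locking" pattern described in this commit message can be sketched in user space as follows. This is a model under stated assumptions, not the actual ss_tcp_process_data() code: `ModelSocket`, `close_socket`, `process_data` and the state constants are hypothetical, and a snapshot of the queue plays the role of the saved "next" pointer of a safe queue walk.

```python
import threading
from collections import deque

ESTABLISHED, CLOSED = "ESTABLISHED", "CLOSED"


class ModelSocket:
    """Hypothetical stand-in for a socket with a lock, a state and a queue."""

    def __init__(self, items):
        self.lock = threading.Lock()
        self.state = ESTABLISHED
        self.receive_queue = deque(items)


def close_socket(sock):
    """Model of a close: change the state and clear the queue under the lock."""
    with sock.lock:
        sock.state = CLOSED
        sock.receive_queue.clear()


def process_data(sock, deliver):
    """Deliver queued items; return how many were delivered before bailing out."""
    delivered = 0
    sock.lock.acquire()
    try:
        # The snapshot goes stale if the socket is closed while unlocked.
        for item in list(sock.receive_queue):
            sock.lock.release()  # the window: the socket is unlocked here
            try:
                deliver(item)
            finally:
                sock.lock.acquire()
            delivered += 1
            # Re-check after taking the lock again; bail out if closed meanwhile.
            if sock.state != ESTABLISHED:
                break
    finally:
        sock.lock.release()
    return delivered
```

Without the state check, the loop would keep walking items that the close has already discarded.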
And put a couple of empty lines back. They are nice. :-) (#61)
This message is issued by the kernel when an SKB is moved from the receive queue to the send queue as we do in the Synchronous Sockets paradigm.