Crash with SSL under load #983
Comments
@billabt Can you please take a look at this?
@billabt I have also been debugging errors, and last night I was finally able to narrow the problem down to concurrent read() and write() on the same socket (backtraces below). This only appears to be a problem when using an SSL delegate. Using sockets the same way does not seem to cause a problem as long as there is no SSL delegate. Do we need to put additional guards in Kitura-net to avoid concurrent read() and write() on the same socket? Or is this something that should be fixed in Socket or SSLService?
@na-gupta Looking at this problem and what you just mentioned, they are not related. Concurrent reads/writes to a socket should work fine, as sockets are full duplex. The problem I'm seeing with this crash is that there appear to be multiple writes going to the same socket simultaneously. This is a problem. You can check whether the socket is writable using the … The malloc issue is usually the result of continued use of a socket after it has been closed using the …
@billabt This does appear to fix the issues on OS X, but it is also causing the SSL tests to fail consistently on Linux, where they pass without the changes.
Fixes are in BlueSocket v0.12.26. I think this wraps up this issue. All tests are passing on both macOS and Linux.
@billabt Thanks for the quick fix.
@na-gupta @billabt I updated to
There are 8 threads in total, 5 of which are waiting. The three running threads are:
Thread 3 appears to be a new connection (we are in
I'll admit to being out of my depth here, but are we definitely using OpenSSL in a thread-safe way? This Stack Overflow post suggests that we need to perform some thread setup/cleanup operations, and points to openssl/crypto/threads/mttest.c as an example. I couldn't immediately find evidence of us doing this.
The crash and malloc issues should now be fixed with the latest versions of BlueSocket and BlueSSLService, along with the latest Kitura changes. @djones6, please open an issue for the SIGPIPE problem. However, the problem can best be resolved by using the …
The crash is now fixed (by this commit to BlueSSLService 0.12.20). Raised #991 to deal with the SIGPIPE.
On Linux, if I follow the SSL tutorial and create a simple 'Hello World' application, the application crashes if I drive load for long enough. It happens almost immediately with multiple concurrent connections and the application affinitized to 4 hardware threads, e.g.:
The application segfaults:
With 2 concurrent connections (-c 2), the crash takes longer to occur (usually within 20 seconds). It also crashes without setting any process affinity (i.e. without numactl), but again it seems to take longer (this may be specific to my machine, which is a 2-socket server).

I initially thought this was specific to concurrent connections, but I have reproduced the crash twice with a single connection (-c 1). On my system, the first crash took 90 seconds (after 740k successful requests). The second took a little over 3 minutes (after 1.3m requests).

The first two errors logged (ssl handshake failure and shutdown while in init) always happen when wrk starts up, I believe because it creates a sacrificial connection to test connectivity before it starts the workload proper.

The message sslv3 alert bad record mac often seems to appear right before the crash (although it did not appear either time the crash occurred with a single connection), so it may or may not be related.

Backtrace from lldb:
Kitura version
I'm using Kitura 1.5.1, Kitura-net 1.5.2, Socket 0.12.22, SSLService 0.12.18
I also tried with #973 but it didn't help.