Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with SSL under load #983

Closed
djones6 opened this issue Jan 25, 2017 · 10 comments
Closed

Crash with SSL under load #983

djones6 opened this issue Jan 25, 2017 · 10 comments
Assignees
Milestone

Comments

@djones6
Copy link
Collaborator

djones6 commented Jan 25, 2017

On Linux, if I follow the SSL tutorial and create a simple 'Hello World' application, the application crashes if I drive load for long enough. It happens almost immediately with multiple concurrent connections and the application affinitized to 4 hardware threads, eg:

numactl --physcpubind=0-3 --membind=0 ./build/debug/HelloSSL
wrk -c16 -t2 -d30s https://localhost:8080/plaintext

The application segfaults:

[2017-01-25T10:19:49.603Z] [INFO] [HTTPServer.swift:83 listen(on:)] Listening on port 8080 (delegate: SSLService.SSLService)
[2017-01-25T10:19:57.930Z] [ERROR] [IncomingSocketHandler.swift:148 handleRead()] Read from socket (file descriptor 7) failed. Error = Error code: 336036069(0x140780E5), ERROR: SSL_read, code: 336036069, reason: ssl handshake failure.
[2017-01-25T10:19:57.938Z] [ERROR] [IncomingSocketHandler.swift:148 handleRead()] Read from socket (file descriptor 7) failed. Error = Error code: 336462231(0x140E0197), ERROR: SSL_read, code: 336462231, reason: shutdown while in init.
[2017-01-25T10:19:58.174Z] [ERROR] [IncomingSocketHandler.swift:148 handleRead()] Read from socket (file descriptor 27) failed. Error = Error code: 336151548(0x140943FC), ERROR: SSL_read, code: 336151548, reason: sslv3 alert bad record mac.
Segmentation fault (core dumped)

With 2 concurrent connections (-c 2) the crash takes longer (usually crashes within 20 seconds). It will crash without setting any process affinity (ie. without numactl), but again it seems to take longer to occur (this may be specific to my machine which is a 2-socket server).

I initially thought this was specific to concurrent connections, however I have reproduced the crash twice with a single connection (-c 1). On my system, it took 90 seconds to result in a crash (after 740k successful requests). The second time it took a little over 3 minutes (after 1.3m requests).

The first two errors logged (ssl handshake failure and shutdown while in init) always happen when wrk starts up, I believe due to it creating a sacrificial connection to test connectivity before it starts the workload proper.
The message sslv3 alert bad record mac often seems to appear right before the crash (although it did not appear either time when the crash occurred with a single connection) so may or may not be related.

Backtrace from lldb:

(lldb) target create "./debug/debug/HelloSSL" --core "core"
Core file '/home/djones6/Swift-TechEmpower/kitura/core' (x86_64) was loaded.
(lldb) bt
* thread #1: tid = 12756, 0x00007f739a7fbc91 libssl.so.1.0.0`___lldb_unnamed_symbol82$$libssl.so.1.0.0 + 273, name = 'HelloSSL', stop reason = signal SIGSEGV
  * frame #0: 0x00007f739a7fbc91 libssl.so.1.0.0`___lldb_unnamed_symbol82$$libssl.so.1.0.0 + 273
    frame #1: 0x00007f739a7fc2e5 libssl.so.1.0.0`___lldb_unnamed_symbol84$$libssl.so.1.0.0 + 213
    frame #2: 0x0000000000475fd0 HelloSSL`SSLService.send(buffer=(_rawValue = 0x00007f7378061ed0), bufSize=171, self=<unavailable>, $error=<unavailable>) throws -> Int + 144 at SSLService.swift:491
    frame #3: 0x000000000047ef16 HelloSSL`protocol witness for SSLServiceDelegate.send(buffer : UnsafeRawPointer!, bufSize : Int) throws -> Int in conformance SSLService + 86 at SSLService.swift:0
    frame #4: 0x000000000045e9b7 HelloSSL`Socket.write(buffer=<unavailable>, bufSize=<unavailable>, self=<unavailable>, $error=<unavailable>) throws -> Int + 1767 at Socket.swift:2839
    frame #5: 0x00000000004b72d6 HelloSSL`IncomingSocketHandler.write(bytes=<unavailable>, length=<unavailable>, self=<unavailable>) -> () + 422 at IncomingSocketHandler.swift:322
    frame #6: 0x00000000004b7114 HelloSSL`IncomingSocketHandler.write(data=<unavailable>, self=<unavailable>) -> () + 84 at IncomingSocketHandler.swift:294
    frame #7: 0x00000000004c9678 HelloSSL`IncomingHTTPSocketProcessor.write(data=<unavailable>, self=<unavailable>) -> () + 120 at IncomingHTTPSocketProcessor.swift:111
    frame #8: 0x00000000004cc3cf HelloSSL`HTTPServerResponse.end(self=<unavailable>, $error=<unavailable>) throws -> () + 335 at HTTPServerResponse.swift:115
    frame #9: 0x00000000004cef97 HelloSSL`protocol witness for ServerResponse.end() throws -> () in conformance HTTPServerResponse + 55 at HTTPServerResponse.swift:0
    frame #10: 0x000000000055028b HelloSSL`RouterResponse.end(self=<unavailable>, $error=<unavailable>) throws -> () + 2283 at RouterResponse.swift:150
    frame #11: 0x0000000000417e64 HelloSSL`(request=<unavailable>, response=<unavailable>, next=<unavailable>, $error=<unavailable>) + 516 at main.swift:41
    frame #12: 0x0000000000417f9a HelloSSL`thunk + 154 at main.swift:0
    frame #13: 0x00000000004180a9 HelloSSL`partial apply for thunk + 89 at main.swift:0
    frame #14: 0x00000000005563e7 HelloSSL`thunk + 135 at RouterElement.swift:0
    frame #15: 0x0000000000558ff9 HelloSSL`partial apply for thunk + 105 at RouterElement.swift:0
    frame #16: 0x0000000000531ebe HelloSSL`RouterMiddlewareGenerator.handle(request=<unavailable>, response=<unavailable>, next=<unavailable>, self=<unavailable>, $error=<unavailable>) -> ()) throws -> () + 142 at RouterMiddlewareGenerator.swift:43
    frame #17: 0x0000000000531ffb HelloSSL`protocol witness for RouterMiddleware.handle(request : RouterRequest, response : RouterResponse, next : () -> ()) throws -> () in conformance RouterMiddlewareGenerator + 91 at RouterMiddlewareGenerator.swift:0
    frame #18: 0x0000000000524756 HelloSSL`RouterMiddlewareWalker.next(self=<unavailable>) -> () + 1446 at RouterMiddlewareWalker.swift:65
    frame #19: 0x0000000000558045 HelloSSL`RouterElement.processHelper(request=<unavailable>, response=<unavailable>, next=<unavailable>, self=<unavailable>) -> ()) -> () + 181 at RouterElement.swift:186
    frame #20: 0x0000000000557eb3 HelloSSL`RouterElement.performSimpleMatch(path=<unavailable>, request=<unavailable>, response=<unavailable>, next=<unavailable>, self=<unavailable>) -> ()) -> () + 2627 at RouterElement.swift:172
    frame #21: 0x000000000055724e HelloSSL`RouterElement.process(request=<unavailable>, response=<unavailable>, parameterWalker=<unavailable>, next=<unavailable>, self=<unavailable>) -> ()) -> () + 3582 at RouterElement.swift:106
    frame #22: 0x000000000051f2fb HelloSSL`RouterElementWalker.next(self=<unavailable>) -> () + 539 at RouterElementWalker.swift:66
    frame #23: 0x0000000000530035 HelloSSL`Router.process(request=<unavailable>, response=<unavailable>, callback=<unavailable>, self=<unavailable>) -> ()) -> () + 869 at Router.swift:337
    frame #24: 0x000000000053061f HelloSSL`Router.handle(request=<unavailable>, response=<unavailable>, self=<unavailable>) -> () + 591 at Router.swift:310
    frame #25: 0x0000000000531b50 HelloSSL`protocol witness for ServerDelegate.handle(request : ServerRequest, response : ServerResponse) -> () in conformance Router + 64 at Router.swift:0
    frame #26: 0x00000000004ca706 HelloSSL`IncomingHTTPSocketProcessor.(self=<unavailable>) -> ()).(closure #1) + 838 at IncomingHTTPSocketProcessor.swift:185
    frame #27: 0x00000000004998b7 HelloSSL`thunk + 39 at IncomingSocketManager.swift:0
    frame #28: 0x00007f739ab522a7 libdispatch.so`_dispatch_call_block_and_release + 7
    frame #29: 0x00007f739ab61152 libdispatch.so`_dispatch_root_queue_drain + 594
    frame #30: 0x00007f739ab8961f libdispatch.so`worker_main(unused=<unavailable>) + 95 at manager.c:506 [opt]
    frame #31: 0x00007f73990c970a libpthread.so.0`start_thread + 202
    frame #32: 0x00007f7397cdc82d libc.so.6`__clone + 109 at clone.S:109

Kitura version

I'm using Kitura 1.5.1, Kitura-net 1.5.2, Socket 0.12.22, SSLService 0.12.18
I also tried with #973 but it didn't help.

@shmuelk
Copy link
Collaborator

shmuelk commented Jan 25, 2017

@billabt Can you please take a look at this?

@na-gupta
Copy link
Contributor

@billabt I have also been debugging errors malloc: *** error for object 0x104fdde40: pointer being freed was not allocated in SSLService. The bulk of the time this happened was when we try to close the socket while a read/write was in progress. I put guards in Kitura-net to avoid doing this, but still seeing problems on travis builds (not on my laptop, which made it harder to debug).

I was finally able to narrow the problem down last night to concurrent read() and write() on the same socket (backtraces below). This only appears to be a problem when using a SSL delegate. It does not seem to have a problem when using sockets the same way as long as there is no SSL delegate.

screen shot 2017-01-25 at 8 12 04 am

screen shot 2017-01-25 at 8 11 22 am

Do we need to put additional guards in Kitura-net to avoid concurrent read() and write() on the same socket? Or is this something that should be fixed in Socket or SSLService?

@billabt
Copy link

billabt commented Jan 25, 2017

@na-gupta Looking at this problem and what you just mentioned, they are not related. Concurrent reads/writes to a socket should work fine as socket are fully duplex. The problem that I'm seeing with this crash is that there appear to be multiple writes going to the same socket simultaneously. This is a problem. You can check the if the socket is writable using the isReadableOrWritable() instance API. Calling this API with no parameters will cause the API to check the socket for readability and writability and return immediately. If the socket is writable, go ahead and write, if not, check again and don't attempt to do the write until it is.

The malloc issue is usually the result of a continued use of a socket after it's been closed using the Darwin.close() or Glibc.close() function instead of the <socket instance>.close().

@shmuelk shmuelk added this to the 1.6.x milestone Jan 25, 2017
@billabt
Copy link

billabt commented Jan 26, 2017

@na-gupta @djones6 Fixes are in BlueSocket v0.12.25 and BlueSSLService v0.12.19. The Kitura project you sent me is now passing all the tests regardless of how many CPUs are enabled. I ran the test with 2, 3, 4, 8, 16 and 24 CPUs and all tests passed in all cases.

@na-gupta
Copy link
Contributor

@billabt This does appear to fix the issues in osx, but is also causing the SSL tests to fail consistently on linux, where they run fine without the changes.

@billabt
Copy link

billabt commented Jan 26, 2017

Fixes are in BlueSocket v0.12.26. I think this puts the wrap on this issue. All tests are passing both on macOS and Linux.

@na-gupta
Copy link
Contributor

@billabt Thanks for the quick fix.

@djones6
Copy link
Collaborator Author

djones6 commented Jan 26, 2017

@na-gupta @billabt I updated to SSLService-0.12.19 and Socket-0.12.26, as well as Kitura master (to pick up #973 which is not yet tagged), but I still get a crash on Linux. The backtrace looks the same (apart from the line numbers):

(lldb) target create "./debug/debug/HelloSSL" --core "core"
Core file '/home/djones6/Swift-TechEmpower/kitura_151/core' (x86_64) was loaded.
(lldb) bt
* thread #1: tid = 16965, 0x00007fa18cedbc91 libssl.so.1.0.0`___lldb_unnamed_symbol82$$libssl.so.1.0.0 + 273, name = 'HelloSSL', stop reason = signal SIGSEGV
  * frame #0: 0x00007fa18cedbc91 libssl.so.1.0.0`___lldb_unnamed_symbol82$$libssl.so.1.0.0 + 273
    frame #1: 0x00007fa18cedc2e5 libssl.so.1.0.0`___lldb_unnamed_symbol84$$libssl.so.1.0.0 + 213
    frame #2: 0x0000000000475dab HelloSSL`SSLService.send(buffer=(_rawValue = 0x00007fa16c1fdef0), bufSize=171, self=<unavailable>, $error=<unavailable>) throws -> Int + 155 at SSLService.swift:530
    frame #3: 0x000000000047ece6 HelloSSL`protocol witness for SSLServiceDelegate.send(buffer : UnsafeRawPointer, bufSize : Int) throws -> Int in conformance SSLService + 86 at SSLService.swift:0
    frame #4: 0x000000000045e737 HelloSSL`Socket.write(buffer=Swift.UnsafeRawPointer @ 0x00007fa1737f90e8, bufSize=<unavailable>, self=<unavailable>, $error=<unavailable>) throws -> Int + 1767 at Socket.swift:2839
    frame #5: 0x00000000004b7096 HelloSSL`IncomingSocketHandler.write(bytes=Swift.UnsafeRawPointer @ 0x00007fa1737f9480, length=<unavailable>, self=<unavailable>) -> () + 422 at IncomingSocketHandler.swift:322
... etc

There are 8 threads in total, 5 are waiting. The three running threads are:

  thread #1: tid = 16965, 0x00007fa18cedbc91 libssl.so.1.0.0`___lldb_unnamed_symbol82$$libssl.so.1.0.0 + 273, name = 'HelloSSL', stop reason = signal SIGSEGV
  thread #3: tid = 16962, 0x00007fa18a3499e6 libc.so.6`__memcpy_sse2 + 294 at memcpy.S:226, stop reason = signal SIGSEGV
  thread #7: tid = 16963, 0x00007fa18b7b250d libpthread.so.0`__GI___write + 45, stop reason = signal SIGSEGV

Thread 3 appears to be a new connection (we are in Socket.acceptClientConnection) so probably not relevant.
Thread 7 has the same backtrace as Thread 1, apart from 3 extra frames:

* thread #7: tid = 16963, 0x00007fa18b7b250d libpthread.so.0`__GI___write + 45, stop reason = signal SIGSEGV
  * frame #0: 0x00007fa18b7b250d libpthread.so.0`__GI___write + 45
    frame #1: 0x00007fa18cb84b35 libcrypto.so.1.0.0`___lldb_unnamed_symbol587$$libcrypto.so.1.0.0 + 37
    frame #2: 0x00007fa18cb82b5c libcrypto.so.1.0.0`BIO_write + 108
    frame #3: 0x00007fa18cedbbf2 libssl.so.1.0.0`___lldb_unnamed_symbol82$$libssl.so.1.0.0 + 114
    frame #4: 0x00007fa18cedc2e5 libssl.so.1.0.0`___lldb_unnamed_symbol84$$libssl.so.1.0.0 + 213
    frame #5: 0x0000000000475dab HelloSSL`SSLService.send(buffer=<unavailable>, bufSize=<unavailable>, self=<unavailable>, $error=<unavailable>) throws -> Int + 155 at SSLService.swift:530
... etc

I'll admit to being out of my depth here, but - are we definitely using OpenSSL in a thread safe way?

This stackoverflow post suggests that we need to perform some thread setup / cleanup operations and points to openssl/crypto/threads/mttest.c as an example:
https://github.com/openssl/openssl/blob/OpenSSL_1_0_2-stable/crypto/threads/mttest.c#L946

I couldn't immediately find evidence of us doing this.

@billabt
Copy link

billabt commented Jan 27, 2017

The crash and malloc issues should now be fixed with the latest versions of BlueSocket and BlueSSLService along with the latest Kitura changes.

@djones6. Please open an issue for the SIGPIPE problem. However, the problem can best be resolved by using the signal(SIGPIPE, SIG_IGN) function when running on Linux and using SSL, probably inserted just after creating the SSLService delegate. Note: This only needs to be done once, not for ever Socket instance. Then, during shutdown (or when no longer using SSL), another call to signal(SIGPIPE, SIG_DFL) will restore the default signal handling. There's no issue on macOS due to the way SSL is implemented in Secure Transport because inhibiting the SIGPIPE is accomplished via a call to setsockopt() right after creating the socket.

@djones6
Copy link
Collaborator Author

djones6 commented Jan 30, 2017

The crash is now fixed (by this commit to BlueSSLService 0.12.20).

Raised #991 to deal with the SIGPIPE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants