Fix cursor thread leak from closing unconsumed iterators #917

senderista · 2021-09-15T00:03:36Z

This is not a final solution, just an expedient for the Preview release. In V1 I plan to implement a principled solution for file descriptor lifetime management (minimally, newtype wrappers over shared_ptr/unique_ptr, similar to the solution here; maximally, a per-process "shadow fd table" that allows us to detect access and freeing of reused fds as well as already-closed fds).
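For illustration, here is a minimal sketch of the "minimally" option described above: a move-only newtype over `std::unique_ptr` that owns a heap-allocated fd and closes it exactly once. It assumes a POSIX system. The names `fd_closer`, `unique_fd`, and `is_open` are hypothetical and do not come from the Gaia codebase.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <memory>
#include <unistd.h>
#include <utility>

// Hypothetical deleter: closes the owned fd, then frees the heap cell holding it.
struct fd_closer
{
    void operator()(int* fd_ptr) const noexcept
    {
        ::close(*fd_ptr);
        delete fd_ptr;
    }
};

// Hypothetical move-only newtype over std::unique_ptr. Copying is implicitly
// deleted (unique_ptr is not copyable), so ownership can be transferred but
// never shared, and the fd is closed once, when the current owner is destroyed.
class unique_fd
{
public:
    // If `new` throws, the exception propagates and the caller still owns fd.
    explicit unique_fd(int fd)
        : m_fd_ptr(new int{fd})
    {
    }

    int get() const { return *m_fd_ptr; }
    explicit operator bool() const { return static_cast<bool>(m_fd_ptr); }
    void reset() { m_fd_ptr.reset(); }

private:
    std::unique_ptr<int, fd_closer> m_fd_ptr;
};

// Test helper: true if fd refers to an open file description.
bool is_open(int fd)
{
    return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

Because copying is disabled, a lambda that captures a `unique_fd` would itself be move-only and could not be stored in a `std::function`. That is why this PR uses `shared_ptr` for the cursor socket.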

@JackAtGaia will be testing this in parallel to confirm that it fixes the observed thread leak in the DB server.

BTW, an unanticipated benefit of dynamically allocating the memory that stores the fd is that (provided the fd value doesn't escape via a rogue copy) it converts an fd leak into a memory leak. That makes fd leak detection trivial with our existing tooling (i.e., LeakSanitizer).

production/db/core/inc/db_client.hpp

production/db/core/src/db_client.cpp

LaurentiuCristofor

Looks good, but we should avoid Hungarian notation or derivatives. Let's just use fd instead of pfd and stream_socket instead of stream_socket_ptr.

simone-gaia

LGTM, @JackAtGaia maybe we want a JIRA to add a regression test?

senderista · 2021-09-15T00:22:12Z

> Looks good, but we should avoid Hungarian notation or derivatives. Let's just use fd instead of pfd and stream_socket instead of stream_socket_ptr.

I disagree. In this case I think the semantics of a shared_ptr are different enough from an int that they're worth expressing in the name. Otherwise it's unclear why you're dereferencing a variable that is expected to be just an int. The same goes for pfd: if you just call it fd, then it's unclear why you're calling delete on an int.

senderista · 2021-09-15T00:23:31Z

> LGTM, @JackAtGaia maybe we want a JIRA to add a regression test?

I would add a unit test if I could see a reasonable way for a unit test to catch regressions from thread leaks, but I can't. I think regressions like this will have to be caught by an end-to-end stress test.

simone-gaia · 2021-09-15T00:26:55Z

@senderista that's why I summoned Jack :D

JackAtGaia · 2021-09-15T17:12:17Z

Testing comment, not a code quality comment. I tested this on a build provided by Tobin, and I no longer see the thread count increase with every action.

LaurentiuCristofor · 2021-09-15T19:10:23Z

production/db/core/src/db_client.cpp

+    // same effect with an RAII wrapper, but it would need to have copy rather
+    // than move semantics, since the socket is captured by a lambda that must
+    // be copyable (since it is coerced to std::function).
+    std::shared_ptr<int> stream_socket_ptr(new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; });


Late suggestion: why not move this code above and get rid of the scope guard completely?

See below for why. The asserts are not why we need the scope_guard (they should be expected to terminate the program, so the socket would be closed in any case).
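The diff comment notes that a move-only RAII wrapper won't work here, because the capturing lambda is converted to `std::function`, and `std::function` requires a copyable callable. Below is a minimal sketch of the shared-ownership alternative, assuming a POSIX system. `make_socket_generator` and `is_open` are hypothetical, `close_fd` is a stand-in for the project's helper, and the generator signature is an assumption, not the real scan generator API. The sketch shows that dropping one copy of an unconsumed generator leaves the socket open, while dropping the last copy closes it.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <functional>
#include <memory>
#include <optional>
#include <unistd.h>

// Hypothetical stand-in for the project's close_fd() helper.
void close_fd(int fd)
{
    ::close(fd);
}

// Test helper: true if fd refers to an open file description.
bool is_open(int fd)
{
    return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

// Hypothetical generator: returns the next byte read from the socket,
// or std::nullopt at EOF or on error.
std::function<std::optional<char>()> make_socket_generator(int stream_socket)
{
    // Shared ownership: the lambda captures a copy of the shared_ptr, and every
    // copy of the resulting std::function holds another reference. The socket
    // is closed when the last copy is destroyed, whether or not it was drained.
    std::shared_ptr<int> stream_socket_ptr(
        new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; });

    return [stream_socket_ptr]() -> std::optional<char> {
        char c;
        if (::read(*stream_socket_ptr, &c, 1) == 1)
        {
            return c;
        }
        return std::nullopt;
    };
}
```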

LaurentiuCristofor · 2021-09-15T19:11:26Z

production/db/query_processor/src/scan_generators.cpp

+    // same effect with an RAII wrapper, but it would need to have copy rather
+    // than move semantics, since the socket is captured by a lambda that must
+    // be copyable (since it is coerced to std::function).
+    std::shared_ptr<int> stream_socket_ptr(new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; });


Here too we could probably move this code and eliminate the scope_guard. The asserts can happen after the wrapping into the shared_ptr.

I think we need the scope_guard regardless, because dynamic allocation can always throw, and we need to close the socket in that case.

(There are actually 2 dynamic allocations happening here: one explicit, which is our operator new call, and the other implicit, which is shared_ptr allocating its shared refcount structure on the heap. make_shared() combines these into a single allocation, but we can't use it because it doesn't support custom deleters.)
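Here is a minimal sketch of exception-safe construction that covers both allocations. It uses a hypothetical `scope_guard` (not the project's actual class) and a stand-in `close_fd`. One detail matters: when the `std::shared_ptr(p, d)` constructor throws (for example, because it fails to allocate its control block), the standard requires it to call `d(p)`. So in this sketch the guard is dismissed after `new` succeeds and before the `shared_ptr` is constructed, which avoids closing the fd twice.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <memory>
#include <unistd.h>
#include <utility>

// Hypothetical minimal scope guard: runs its callback on destruction unless dismissed.
template <typename F>
class scope_guard
{
public:
    explicit scope_guard(F f)
        : m_func(std::move(f))
    {
    }
    ~scope_guard()
    {
        if (m_active)
        {
            m_func();
        }
    }
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;
    void dismiss() noexcept { m_active = false; }

private:
    F m_func;
    bool m_active{true};
};

// Hypothetical stand-in for the project's close_fd() helper.
void close_fd(int fd) { ::close(fd); }

// Test helper: true if fd refers to an open file description.
bool is_open(int fd) { return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF; }

// Hypothetical helper: takes ownership of fd and closes it if wrapping fails.
std::shared_ptr<int> wrap_fd(int fd)
{
    scope_guard close_on_error([fd] { close_fd(fd); });

    // Allocation #1: may throw std::bad_alloc, in which case the guard closes fd.
    int* fd_ptr = new int{fd};

    // Allocation #2 (the refcount control block) happens inside the shared_ptr
    // constructor. If it throws, the constructor itself invokes the deleter on
    // fd_ptr, so the guard is dismissed first to avoid a double close.
    close_on_error.dismiss();
    return std::shared_ptr<int>(fd_ptr, [](int* p) { close_fd(*p); delete p; });
}
```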

But then why remove the scope guard in the other situations? You could have kept it for the same reason.

Anyway, I don't think we should worry about cleanup in the case that new() fails - in that case the server would stop execution anyway.

You're right that in general we don't try to recover from exceptions on the server (unless they're from a misbehaving or crashed client), but the iterator code is all client-side, at least conceptually (i.e., it consumes a "cursor socket" sent over the session socket by the server). We don't control the exception handling policy in a client application (e.g., the client might try to recover from a std::bad_alloc exception thrown by the stream_socket_ptr allocation by freeing some memory that they own). That's why exception safety is important to get right on the client, while it can be treated as a mostly theoretical concern on the server (but I still treat non-exception-safety as a bug there as well).

> But then why remove the scope guard in the other situations? You could have kept it for the same reason.

Not totally sure which "other situations" you're referring to, but the reason I removed scope_guard in the consumers of stream_socket_ptr is that the shared_ptr destructor now closes the socket if an exception is thrown in one of the consumers, so exception safety no longer requires a scope_guard there. The key difference is that the shared_ptr has already been successfully constructed at that point, so it now owns the socket and its destructor is responsible for closing the socket.
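The sketch below illustrates that argument. `consume_and_fail` and `socket_closed_after_consumer_throws` are hypothetical, and `close_fd` is a stand-in for the project's helper. Once the `shared_ptr` constructor has returned, an exception thrown by a consumer unwinds the stack. The `shared_ptr` destructor then runs and its deleter closes the socket, with no `scope_guard` in the consumer.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <memory>
#include <stdexcept>
#include <unistd.h>

// Hypothetical stand-in for the project's close_fd() helper.
void close_fd(int fd) { ::close(fd); }

// Test helper: true if fd refers to an open file description.
bool is_open(int fd) { return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF; }

// Hypothetical consumer with no scope_guard: it fails partway through reading.
void consume_and_fail(const std::shared_ptr<int>& stream_socket_ptr)
{
    char c;
    if (::read(*stream_socket_ptr, &c, 1) == 1)
    {
        throw std::runtime_error("simulated failure after first byte");
    }
}

// Returns true if the socket was already closed when the exception was caught.
bool socket_closed_after_consumer_throws(int stream_socket)
{
    try
    {
        // Once this constructor returns, the shared_ptr owns the socket.
        std::shared_ptr<int> stream_socket_ptr(
            new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; });
        consume_and_fail(stream_socket_ptr);
    }
    catch (const std::runtime_error&)
    {
        // Stack unwinding destroyed stream_socket_ptr before this handler ran,
        // and its deleter closed the socket.
        return !is_open(stream_socket);
    }
    return false;
}
```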

Fix cursor thread leak from closing unconsumed iterators

2ef0368

senderista requested review from LaurentiuCristofor, yiwen-wong and simone-gaia September 15, 2021 00:03

fix comment

ae0765c

LaurentiuCristofor reviewed Sep 15, 2021

production/db/core/inc/db_client.hpp (resolved)

LaurentiuCristofor reviewed Sep 15, 2021

production/db/core/src/db_client.cpp (outdated, resolved)

LaurentiuCristofor approved these changes Sep 15, 2021

simone-gaia approved these changes Sep 15, 2021

senderista added 2 commits September 15, 2021 11:31

Merge branch 'master' into tobin/fix_iterator_leak

ab91b81

fix naming

95a3e11

senderista merged commit a863109 into master Sep 15, 2021

senderista deleted the tobin/fix_iterator_leak branch September 15, 2021 18:42

LaurentiuCristofor reviewed Sep 15, 2021

senderista mentioned this pull request Sep 16, 2021

Eagerly close cursor socket on generator exhaustion #919

Merged
