-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cursor thread leak from closing unconsumed iterators #917
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but we should avoid Hungarian notation or derivatives. Let's just use fd
instead of pfd
and stream_socket
instead of stream_socket_ptr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @JackAtGaia maybe we want a JIRA to add a regression test?
I disagree. In this case I think the semantics of a |
I would add a unit test if I could see a reasonable way for a unit test to catch regressions from thread leaks, but I can't. I think regressions like this will have to be caught by an end-to-end stress test. |
@senderista that's why I summoned Jack :D |
Testing comment, not a code quality comment. Tested this out on a build provided by Tobin, and I am not seeing any thread count increase with every action. |
// same effect with an RAII wrapper, but it would need to have copy rather | ||
// than move semantics, since the socket is captured by a lambda that must | ||
// be copyable (since it is coerced to std::function). | ||
std::shared_ptr<int> stream_socket_ptr(new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late suggestion: why not move this code above and get rid of the scope guard completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below for why. The asserts are not why we need the scope_guard (they should be expected to terminate the program, so the socket would be closed in any case).
// same effect with an RAII wrapper, but it would need to have copy rather | ||
// than move semantics, since the socket is captured by a lambda that must | ||
// be copyable (since it is coerced to std::function). | ||
std::shared_ptr<int> stream_socket_ptr(new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too we could probably move this code and eliminate the scope_guard. The asserts can happen after the wrapping into the shared_ptr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the scope_guard regardless, because dynamic allocation can always throw, and we need to close the socket in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(There are actually 2 dynamic allocations happening here: one explicit, which is our operator new
call, and the other implicit, which is shared_ptr
allocating its shared refcount structure on the heap. make_shared()
combines these into a single allocation, but we can't use it because it doesn't support custom deleters.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then why remove the scope guard in the other situations? You could have kept it for the same reason.
Anyway, I don't think we should worry about cleanup in the case that new()
fails - in that case the server would stop execution anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that in general we don't try to recover from exceptions on the server (unless they're from a misbehaving or crashed client), but the iterator code is all client-side at least conceptually (i.e., it consumes a "cursor socket" sent over the session socket by the server). We don't control the exception handling policy in a client application (e.g., the client might try to recover from a std::bad_alloc
exception thrown by the stream_socket_ptr
allocation by freeing some memory that they own). That's why exception safety is important to get right on the client, while it can be treated as a mostly theoretical concern on the server (but I still treat non-exception-safety as a bug there as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then why remove the scope guard in the other situations? You could have kept it for the same reason.
Not totally sure which "other situations" you're referring to, but the reason I removed scope_guard
in the consumers of stream_socket_ptr
is that the shared_ptr
destructor now closes the socket if an exception is thrown in one of the consumers, so exception safety no longer requires a scope_guard
there. The key difference is that the shared_ptr
has already been successfully constructed at that point, so it now owns the socket and its destructor is responsible for closing the socket.
This is not a final solution, just an expedient for the Preview release. In V1 I plan to implement a principled solution for file descriptor lifetime management (minimally, newtype wrappers over shared_ptr/unique_ptr, similar to the solution here; maximally, a per-process "shadow fd table" that allows us to detect access and freeing of reused fds as well as already-closed fds).
@JackAtGaia will be testing this in parallel to confirm that it fixes the observed thread leak in the DB server.
BTW, an unanticipated benefit of dynamically allocating memory storing the fd is that (provided the fd value doesn't escape via a rogue copy) it converts an fd leak into a memory leak, and thus makes fd leak detection trivial given our existing tooling (i.e., LeakSanitizer).