Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cursor thread leak from closing unconsumed iterators #917

Merged
merged 4 commits into from
Sep 15, 2021

Conversation

senderista
Copy link
Contributor

@senderista senderista commented Sep 15, 2021

This is not a final solution, just an expedient for the Preview release. In V1 I plan to implement a principled solution for file descriptor lifetime management (minimally, newtype wrappers over shared_ptr/unique_ptr, similar to the solution here; maximally, a per-process "shadow fd table" that allows us to detect access and freeing of reused fds as well as already-closed fds).

@JackAtGaia will be testing this in parallel to confirm that it fixes the observed thread leak in the DB server.

BTW, an unanticipated benefit of dynamically allocating memory storing the fd is that (provided the fd value doesn't escape via a rogue copy) it converts an fd leak into a memory leak, and thus makes fd leak detection trivial given our existing tooling (i.e., LeakSanitizer).

Copy link
Contributor

@LaurentiuCristofor LaurentiuCristofor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we should avoid Hungarian notation or derivatives. Let's just use fd instead of pfd and stream_socket instead of stream_socket_ptr.

Copy link
Contributor

@simone-gaia simone-gaia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @JackAtGaia maybe we want a JIRA to add a regression test?

@senderista
Copy link
Contributor Author

senderista commented Sep 15, 2021

Looks good, but we should avoid Hungarian notation or derivatives. Let's just use fd instead of pfd and stream_socket instead of stream_socket_ptr.

I disagree. In this case I think the semantics of a shared_ptr are different enough from an int that they're worth expressing in the name. Otherwise it's unclear why you're dereferencing a variable that is expected to just be an int Same goes for pfd: if you just call it fd then it's unclear why you're calling delete on an int.

@senderista
Copy link
Contributor Author

LGTM, @JackAtGaia maybe we want a JIRA to add a regression test?

I would add a unit test if I could see a reasonable way for a unit test to catch regressions from thread leaks, but I can't. I think regressions like this will have to be caught by an end-to-end stress test.

@simone-gaia
Copy link
Contributor

@senderista that's why I summoned Jack :D

@JackAtGaia
Copy link

Testing comment, not a code quality comment. Tested this out on a build provided by Tobin, and I am not seeing any thread count increase with every action.

@senderista senderista merged commit a863109 into master Sep 15, 2021
@senderista senderista deleted the tobin/fix_iterator_leak branch September 15, 2021 18:42
// same effect with an RAII wrapper, but it would need to have copy rather
// than move semantics, since the socket is captured by a lambda that must
// be copyable (since it is coerced to std::function).
std::shared_ptr<int> stream_socket_ptr(new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late suggestion: why not move this code above and get rid of the scope guard completely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below for why. The asserts are not why we need the scope_guard (they should be expected to terminate the program, so the socket would be closed in any case).

// same effect with an RAII wrapper, but it would need to have copy rather
// than move semantics, since the socket is captured by a lambda that must
// be copyable (since it is coerced to std::function).
std::shared_ptr<int> stream_socket_ptr(new int{stream_socket}, [](int* fd_ptr) { close_fd(*fd_ptr); delete fd_ptr; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too we could probably move this code and eliminate the scope_guard. The asserts can happen after the wrapping into the shared_ptr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need the scope_guard regardless, because dynamic allocation can always throw, and we need to close the socket in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There are actually 2 dynamic allocations happening here: one explicit, which is our operator new call, and the other implicit, which is shared_ptr allocating its shared refcount structure on the heap. make_shared() combines these into a single allocation, but we can't use it because it doesn't support custom deleters.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then why remove the scope guard in the other situations? You could have kept it for the same reason.

Anyway, I don't think we should worry about cleanup in the case that new() fails - in that case the server would stop execution anyway.

Copy link
Contributor Author

@senderista senderista Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that in general we don't try to recover from exceptions on the server (unless they're from a misbehaving or crashed client), but the iterator code is all client-side at least conceptually (i.e., it consumes a "cursor socket" sent over the session socket by the server). We don't control the exception handling policy in a client application (e.g., the client might try to recover from a std::bad_alloc exception thrown by the stream_socket_ptr allocation by freeing some memory that they own). That's why exception safety is important to get right on the client, while it can be treated as a mostly theoretical concern on the server (but I still treat non-exception-safety as a bug there as well).

Copy link
Contributor Author

@senderista senderista Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then why remove the scope guard in the other situations? You could have kept it for the same reason.

Not totally sure which "other situations" you're referring to, but the reason I removed scope_guard in the consumers of stream_socket_ptr is that the shared_ptr destructor now closes the socket if an exception is thrown in one of the consumers, so exception safety no longer requires a scope_guard there. The key difference is that the shared_ptr has already been successfully constructed at that point, so it now owns the socket and its destructor is responsible for closing the socket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants