HPCC-33280: m_apport in HTTP threads and server are not NULL #19439
base: candidate-9.10.x
Rely on the HTTP protocols to ensure that the apport value supplied first to threads, and then servers, will not be NULL.
- Remove checks for NULL in the server implementation.
- Remove private constructors that cannot be safely used.
Signed-off-by: Tim Klemm <[email protected]>
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33280
@timothyklemm the changes seem fine. However, it's not abundantly clear what benefit this change provides, and it's not clear from the changes how m_apport is safe to use in all code paths.
The protocol classes throw exceptions if the pointer is NULL before creating threads, which in turn create servers. The thread and server classes already make multiple assumptions about the pointer not being NULL. The server method from which I removed the two checks for NULL starts by dereferencing the pointer without checking. The issue came up because my change to the server span creation time inserted an unnecessary check which was flagged by the most recent Coverity scan. If it was necessary to check where I added the check, then it would also be necessary to check before subsequent references. Unfortunately, the scan hasn't been able to point out that if the pre-existing check was necessary, all of the preceding references would also require checks.
- Change interface signatures to pass required data by reference.
- Refactor pooled thread usage by protocol classes to simplify pass-by-reference while standardizing error handling.
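To illustrate the argument above (check once at the boundary, then pass by reference so downstream code cannot see a NULL), here is a minimal sketch. The names AppPort, Server, serveOnce and rejectsNull are hypothetical stand-ins and are not the ESP classes touched by this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an application port; not CEspApplicationPort.
struct AppPort { std::string name; };

// Downstream code takes a reference, so "not NULL" is part of the signature
// and no further checks are needed inside the class.
class Server
{
public:
    explicit Server(AppPort& _apport) : apport(_apport) {}
    const std::string& portName() const { return apport.name; }
private:
    AppPort& apport;
};

// The boundary validates the pointer once and throws before anything is built.
std::string serveOnce(AppPort* apport)
{
    if (!apport)
        throw std::invalid_argument("apport must not be NULL");
    Server server(*apport);
    return server.portName();
}

// Returns true when a NULL port is rejected at the boundary.
bool rejectsNull()
{
    try
    {
        serveOnce(nullptr);
    }
    catch (const std::invalid_argument&)
    {
        return true;
    }
    return false;
}
```

With this shape, a static analyzer has no NULL check downstream to compare against the unchecked dereferences, which is the inconsistency described above.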
throw;
}
delete [] holder;
PooledThreadInfo pti(*accepted, *apport);
The struct is much cleaner than the generic array.
But this needs to be tested exhaustively. I'd also like to ask @asselitx to review.
Agreed.
@timothyklemm left a few questions/concerns.
Overall this seems like a good change, but it doesn't seem to match the commit title. Let's make sure the commit title and message match the changes and inform the reviewer.
PooledThreadInfo(ISocket& _socket, CEspApplicationPort& _apport) : socket(_socket), apport(_apport) {}
~PooledThreadInfo()
{
#if __cplusplus >= 201703L
I think I like this a lot, and yes, it might be an appropriate pattern elsewhere such as jtrace.
Other than the new log output, are there any other side effects?
What happens to the exception? In the pre-existing code, there's a throw which I don't see here.
We had no error handling in the secure protocol. The socket was not being closed and now will be. The structure and its usage address two unhandled memory leaks.
I wrote a simple test on my Mac, because a vcpkg patch file states that this function can't be used in Apple builds (and I hadn't noticed it was already being used in the platform). A destructor inside a try block observed the exception and the catch block still caught it. For this case, we're observing without interfering with the standard stack unwind behavior.
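A minimal sketch of that kind of test, assuming C++17: a destructor inside a try block observes the in-flight exception through std::uncaught_exceptions(), and the catch block still receives it. UnwindObserver, Outcome and run are hypothetical names for illustration, not the PooledThreadInfo code from this PR.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical guard; records whether its destructor ran while an
// exception was in flight on the current thread.
struct UnwindObserver
{
    explicit UnwindObserver(bool& _sawUnwind) : sawUnwind(_sawUnwind) {}
    ~UnwindObserver()
    {
        // Observes the exception without catching or altering it.
        if (std::uncaught_exceptions() > 0)
            sawUnwind = true;
    }
    bool& sawUnwind;
};

struct Outcome
{
    bool destructorSawUnwind = false;
    bool catchRan = false;
    std::string what;
};

Outcome run(bool shouldThrow)
{
    Outcome out;
    try
    {
        UnwindObserver guard(out.destructorSawUnwind);
        if (shouldThrow)
            throw std::runtime_error("start failed");
    }
    catch (const std::exception& e)
    {
        // The guard's destructor has already run by the time we get here.
        out.catchRan = true;
        out.what = e.what();
    }
    return out;
}
```

Calling run(true) should report both that the destructor saw the unwind and that the handler ran with the original message, while run(false) should report neither.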
~PooledThreadInfo()
{
#if __cplusplus >= 201703L
    if (std::uncaught_exceptions() > 0)
I'm not familiar w/ this approach. Is this count per thread? Is there any information about the exceptions available?
My understanding is that it is per thread. Exceptions could be captured if the destructor needed to know details about what caused the stack to unwind. In this instance, we were not showing interest in what caused the failure.
Thinking about exception handling in the thread pool classes and code downstream: could we be subject to a situation (either now or in a well-meaning future change) where an exception is caught, the handler closes the socket, and the exception is then re-thrown? Is it safe to possibly call close twice?
Or what if an exception is caught but the socket is never closed? Maybe this falls under the category of "thinking too hard about what could go wrong if people do stupid things and you can't prevent every future mistake".
As implemented, a double close is safe - the first close clears a value that must be set for the second close to have an effect.
We already have an example of not explicitly closing the socket, because the exception isn't caught early enough. It has been suggested to me that preceding the close with an explicit shutdown could be an improvement. Reinforces my opinion that never closing is incorrect behavior.
Could somebody catch the exception because the comment stating that exception cleanup was happening elsewhere was not clear enough? Yes. Is it likely enough to warrant pre-empting it? I'll defer to the reviewers.
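The double-close point can be sketched as follows. FakeSocket is a hypothetical wrapper written only to show the "first close clears the value" idea; it is not the ISocket implementation, and the counter merely stands in for the real OS-level close.

```cpp
#include <cassert>

// Hypothetical socket wrapper used only to illustrate an idempotent close.
class FakeSocket
{
public:
    explicit FakeSocket(int _fd) : fd(_fd) {}

    // Closes at most once: the first call clears fd, so later calls return early.
    void close()
    {
        if (fd == invalidFd)
            return;
        ++closeCalls;   // stands in for the real OS-level close
        fd = invalidFd;
    }

    bool isOpen() const { return fd != invalidFd; }
    int osCloseCount() const { return closeCalls; }

private:
    static constexpr int invalidFd = -1;
    int fd;
    int closeCalls = 0;
};
```

Under this assumption, a handler that closes the socket and re-throws, followed by a destructor that closes it again, results in exactly one underlying close.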
pti.persistentHandler = persistentHandler;
pti.shouldClose = shouldClose;
// cleanup on exception is handled by pti
http_thread_pool->start((void*)&pti, "", m_threadCreateTimeout > 0?m_threadCreateTimeout*1000:0);
Are there any exceptions we should be catching and handling?
Based on the original code, the answer is no. Instead of catching, reacting to, and re-throwing all exceptions, pti's destructor will react to the existence of an exception without capturing it.
As for the destructor's abbreviated handler relative to what was here, there is no longer a socket reference to be released nor is there a heap allocation to be deleted.
To be sure I'm understanding, you no longer need to call accepted->Release() because you aren't incrementing the link count when you're stuffing accepted into the pti, unlike what was done using the void * array.
The pooled thread sets the socket, creating its own link which it is responsible for. Since the protocol isn't creating a link, it doesn't need to release it.
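The ownership rule in this answer can be sketched with a toy link count. Counted, Handoff and Worker are hypothetical names that only loosely mirror the Link()/Release() convention; they are not the jlib or ESP classes.

```cpp
#include <cassert>

// Hypothetical link-counted object; the creator holds the first link.
class Counted
{
public:
    void Link() { ++links; }
    void Release() { --links; }   // a real implementation would delete at zero
    int linkCount() const { return links; }
private:
    int links = 1;
};

// Carries the object by reference and never links it,
// so the code that builds a Handoff has nothing to release.
struct Handoff
{
    explicit Handoff(Counted& _socket) : socket(_socket) {}
    Counted& socket;
};

// The worker takes its own link and is solely responsible for releasing it.
class Worker
{
public:
    explicit Worker(const Handoff& h) : socket(&h.socket) { socket->Link(); }
    ~Worker() { socket->Release(); }
    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;
private:
    Counted* socket;
};
```

In this sketch the count rises only while a Worker exists and returns to the creator's single link afterwards, with no release needed on the handoff side.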
@@ -430,8 +430,6 @@ int CEspHttpServer::processRequest()
espGetMethod = EspGetMethod::Unhandled;
}
}
else if (!m_apport)
wantTracing = false;
Owned<ISpan> serverSpan;
if (wantTracing)
Didn't see what else affects "wantTracing" but if it's only dependent on !m_apport, we prob don't need this check anymore. If there are other variables affecting it, ignore this comment
The flag is affected by tracing being enabled and also by certain "esp" service GET requests that were processed prior to the original creation of the span (look just before the start of processRequest to see the method names that process without tracing).
}
ctx->addTraceSummaryTimeStamp(LogMin, "handleHttp");
It was difficult to determine whether there were any functional changes in this block. I'm assuming it was only a shift due to the removal of the nullptr check.
Correct.
This looks good as long as none of my questions about the exception throwing/catching warrant a change.
Like Rodrigo said, it would be good to test thoroughly locally including leaks and exception conditions. Check with Mark and Attila to see if any existing tests cover this code, and if not, add some if it is reasonable (esp. throwing an exception that would cause socket close).