
HPCC-33280: m_apport in HTTP threads and server are not NULL #19439

Open · wants to merge 2 commits into base: candidate-9.10.x
Conversation

@timothyklemm (Contributor) commented Jan 21, 2025

Rely on the HTTP protocols to ensure that the apport value supplied first to threads, and then to servers, will not be NULL.

  • Remove checks for NULL in the server implementation.
  • Remove private constructors that cannot be safely used.

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality).
  • This change fixes warnings (the fix does not alter the functionality or the generated code).
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Rely on the HTTP protocols to ensure that the apport value supplied first to
threads, and then servers, will not be NULL.
- Remove checks for NULL in the server implementation.
- Remove private constructors that cannot be safely used.

Signed-off-by: Tim Klemm <[email protected]>

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33280

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@rpastrana (Member) left a comment


@timothyklemm the changes seem fine. However, it's not abundantly clear what benefit this change provides, nor is it clear from the changes how m_apport is safe to use in all code paths.

@timothyklemm (Contributor, Author)

The protocol classes throw exceptions if the pointer is NULL before creating threads, which in turn create servers. The thread and server classes already make multiple assumptions about the pointer not being NULL. The server method from which I removed the two checks for NULL starts by dereferencing the pointer without checking.

The issue came up because my change to the server span creation inserted an unnecessary check, which was flagged by the most recent Coverity scan. If it were necessary to check where I added the check, then it would also be necessary to check before subsequent references. Unfortunately, the scan can't point out that if the pre-existing check were necessary, all of the preceding references would also require checks.

- Change interface signatures to pass required data by reference.
- Refactor pooled thread usage by protocol classes to simplify pass-by-
  reference while standardizing error handling.
throw;
}
delete [] holder;
PooledThreadInfo pti(*accepted, *apport);
Member

The struct is much cleaner than the generic array.
But this needs to be tested exhaustively, I'd also like to ask @asselitx to review

Contributor Author

Agreed.

@timothyklemm timothyklemm requested a review from asselitx January 24, 2025 14:12
@rpastrana (Member) left a comment

@timothyklemm left a few questions/concerns.
Overall this seems like a good change, but it doesn't seem to match the commit title. Let's make sure the commit title and message match the changes and inform the reviewer.

PooledThreadInfo(ISocket& _socket, CEspApplicationPort& _apport) : socket(_socket), apport(_apport) {}
~PooledThreadInfo()
{
#if __cplusplus >= 201703L
Member

I think I like this a lot, and yes, it might be an appropriate pattern elsewhere such as jtrace.
Other than the new log output, are there any other side effects?
What happens to the exception? In the pre-existing code, there's a throw which I don't see here.

Contributor Author

We had no error handling in the secure protocol. The socket was not being closed and now will be. The structure and its usage address two unhandled memory leaks.

I wrote a simple test on my Mac, because a vcpkg patch file states that this function can't be used in Apple builds (and I hadn't noticed it was already being used in the platform). A destructor inside a try block observed the exception and the catch block still caught it. For this case, we're observing without interfering with the standard stack unwind behavior.
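The behavior described above can be sketched with a minimal, hypothetical guard (illustrative names, not the actual PooledThreadInfo): a destructor can use std::uncaught_exceptions() (C++17) to detect that it is running because the stack is unwinding, without catching or consuming the exception.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical sketch, not the actual PooledThreadInfo: a guard whose
// destructor detects whether it is being destroyed during stack unwinding.
struct UnwindObserver
{
    bool* sawException; // set to true if destroyed while an exception is in flight
    explicit UnwindObserver(bool* flag) : sawException(flag) {}
    ~UnwindObserver()
    {
        // std::uncaught_exceptions() (C++17) counts exceptions currently
        // unwinding on this thread; > 0 means cleanup-on-exception applies.
        if (std::uncaught_exceptions() > 0)
            *sawException = true;
    }
};

// The guard observes the unwind, and the catch block still receives the
// exception: observation does not interfere with normal propagation.
bool demoObservesWithoutConsuming()
{
    bool saw = false;
    try
    {
        UnwindObserver guard(&saw);
        throw std::runtime_error("simulated failure");
    }
    catch (const std::exception&)
    {
        // the exception is still caught here, exactly as before
    }
    return saw;
}

// On the normal (non-throwing) path the destructor sees a count of zero.
bool demoNormalPath()
{
    bool saw = false;
    {
        UnwindObserver guard(&saw);
    }
    return saw;
}
```

This matches the test described above: the destructor observes the unwind, and the catch block still catches the exception.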

~PooledThreadInfo()
{
#if __cplusplus >= 201703L
if (std::uncaught_exceptions() > 0)
Member

I'm not familiar with this approach. Is this count per thread? Is there any information about the exceptions available?

Contributor Author

My understanding is that it is per thread. Exceptions could be captured if the destructor needed to know details about what caused the stack to unwind. In this instance, we were not showing interest in what caused the failure.
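A small hypothetical sketch of the per-thread semantics: during unwinding, the destructor's own thread sees a nonzero count, while a freshly started thread sees zero.

```cpp
#include <cassert>
#include <exception>
#include <thread>
#include <utility>

// Hypothetical sketch: std::uncaught_exceptions() is counted per thread.
struct Probe
{
    int* countHere;   // count observed on the unwinding thread
    int* countOther;  // count observed on a freshly started thread
    ~Probe()
    {
        *countHere = std::uncaught_exceptions(); // this thread is unwinding
        std::thread t([this] { *countOther = std::uncaught_exceptions(); });
        t.join();                                // the other thread is not unwinding
    }
};

std::pair<int, int> demoPerThreadCount()
{
    int here = -1, other = -1;
    try
    {
        Probe p{&here, &other};
        throw 42; // destroys p during stack unwinding
    }
    catch (int)
    {
    }
    return {here, other}; // {1, 0}: in flight here, invisible to the other thread
}
```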

Contributor

Thinking about exception handling in the thread pool classes and code downstream: could we be subject to a situation (either now or in a well-meaning future change) where an exception is caught, already closes the socket, and then is re-thrown? Is it safe to possibly call close twice?

Or what if an exception is caught but the socket is never closed? Maybe this falls under the category of "thinking too hard about what could go wrong if people do stupid things and you can't prevent every future mistake".

Contributor Author

As implemented, a double close is safe: the first close clears a value that must be set for the second close to have an effect.

We already have an example of not explicitly closing the socket, because the exception isn't caught early enough. It has been suggested to me that preceding the close with an explicit shutdown could be an improvement. This reinforces my opinion that never closing is incorrect behavior.

Could somebody catch the exception because the comment stating that exception cleanup was happening elsewhere was not clear enough? Yes. Is it likely enough to warrant pre-empting it? I'll defer to the reviewers.
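The double-close guarantee described above can be sketched as follows (hypothetical names, not the actual socket class):

```cpp
// Hypothetical sketch of the double-close guarantee: the first close
// clears the handle that the next close checks, so a repeat is a no-op.
struct GuardedSocket
{
    int fd = 42;        // pretend descriptor; -1 means already closed
    int closeCalls = 0; // how many closes actually took effect
    void close()
    {
        if (fd < 0)
            return;     // nothing to do: already closed
        ++closeCalls;   // real code would release the OS handle here
        fd = -1;        // clear the value the next close tests
    }
};

int demoDoubleClose()
{
    GuardedSocket s;
    s.close();
    s.close();           // safe: no effect the second time
    return s.closeCalls; // 1
}
```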

pti.persistentHandler = persistentHandler;
pti.shouldClose = shouldClose;
// cleanup on exception is handled by pti
http_thread_pool->start((void*)&pti, "", m_threadCreateTimeout > 0?m_threadCreateTimeout*1000:0);
Member

are there any exceptions we should be catching and handling?

Contributor Author

Based on the original code, the answer is no. Instead of catching, reacting to, and re-throwing all exceptions, pti's destructor will react to the existence of an exception without capturing it.

As for the destructor's abbreviated handler relative to what was here, there is no longer a socket reference to be released nor is there a heap allocation to be deleted.

Contributor

To be sure I'm understanding, you no longer need to call accepted->Release() because you aren't incrementing the link count when you're stuffing accepted into the pti, unlike what was done using the void * array.

Contributor Author

The pooled thread sets the socket, creating its own link which it is responsible for. Since the protocol isn't creating a link, it doesn't need to release it.
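A hypothetical sketch of the link-count reasoning (illustrative names, not the actual jlib interfaces): whoever takes a link owns a matching release, so a protocol that only passes a reference has nothing to release.

```cpp
// Hypothetical sketch of link-counted ownership. Each link() must be
// balanced by exactly one release() from the same owner.
struct RefCounted
{
    int links = 1; // the creator holds the first link
    void link() { ++links; }
    bool release() { return --links == 0; } // true when the last link is gone
};

struct PooledThreadSketch
{
    RefCounted* socket = nullptr;
    // The thread takes its own link when it stores the socket...
    void setSocket(RefCounted& s) { s.link(); socket = &s; }
    // ...and is responsible for releasing that link when finished.
    void done() { socket->release(); socket = nullptr; }
};

int demoOwnership()
{
    RefCounted sock;   // protocol's link: links == 1
    PooledThreadSketch t;
    t.setSocket(sock); // thread links: links == 2
    t.done();          // thread releases its link: links == 1
    sock.release();    // protocol releases its original link
    return sock.links; // 0: every link was balanced, nothing leaked
}
```

Under this model the protocol never calls link() on behalf of the thread, so it has no extra release() to make, which is the point of the comment above.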

@@ -430,8 +430,6 @@ int CEspHttpServer::processRequest()
espGetMethod = EspGetMethod::Unhandled;
}
}
else if (!m_apport)
wantTracing = false;
Owned<ISpan> serverSpan;
if (wantTracing)
Member

I didn't see what else affects wantTracing, but if it's only dependent on !m_apport, we probably don't need this check anymore. If there are other variables affecting it, ignore this comment.

Contributor Author

The flag is affected by tracing being enabled and also by certain "esp" service GET requests that were processed prior to the original creation of the span (look just before the start of processRequest to see the method names that process without tracing).

}
ctx->addTraceSummaryTimeStamp(LogMin, "handleHttp");
Member

it was difficult to determine if there were any functional changes in this block, assuming it was a shift due to the removal of the nullptr check.

Contributor Author

Correct.

@asselitx (Contributor) left a comment

This looks good as long as none of my questions about the exception throwing/catching warrant a change.

Like Rodrigo said, it would be good to test thoroughly locally including leaks and exception conditions. Check with Mark and Attila to see if any existing tests cover this code, and if not, add some if it is reasonable (esp. throwing an exception that would cause socket close).
