In lib/http/HttpClient_Curl.hpp, WaitOnSocket() uses select(), which can only monitor file descriptors numbered below 1024. On a system that creates a large number of file handles (e.g. for client sockets), descriptor numbers will exceed this limit and cause a crash in WaitOnSocket(). While I am using the library on FreeBSD, this problem also applies to other platforms, such as Linux.
WARNING: select() can monitor only file descriptors numbers that
are less than FD_SETSIZE (1024)—an unreasonably low limit for
many modern applications—and this limitation will not change.
All modern applications should instead use poll(2) or epoll(7),
which do not suffer this limitation.
There was a limit on the total number of simultaneous HTTPS connections that the SDK makes to a single HTTP server in production. This is configured here. I wonder whether this is happening in TEST or in production, and whether there's a bug in a higher-level abstraction that fails to properly limit the number of concurrent connections.
While the patch here is valid, I think we need to double-check why the higher-level code allows for more than 4 pending requests. This code is supposed to limit it:
I think the fix makes sense in OpenTelemetry, but it's not clear why the higher-level TPM limit doesn't apply in 1DS. In the 1DS C++ SDK, the limit should have prevented the bug from happening. Maybe something in the HTTP curl client wrapper doesn't clean up and leaks descriptors after the request is done? That is, uploadCount() returns 0, but the actual number of fds remains high and keeps growing over time.
The Linux man page for select() describes the FD_SETSIZE limitation and suggests using poll or epoll instead:
The crash is also reported for CentOS in the OpenTelemetry C++ SDK, which shares the same code base for the curl HTTP client library with 1ds-cpp:
Issue: open-telemetry/opentelemetry-cpp#1220
PR for fix: open-telemetry/opentelemetry-cpp#1410
My colleague created the following simple patch to use poll() instead of select(), which we apply in our FreeBSD port build to prevent the crash: