-
-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trio.socket.getaddrinfo may invoke a worker thread when it doesn't need to #1249
Comments
Huh. That's some impressive attention to edge cases. I approve :-). Is also appreciate hearing more about what kind of use case makes this detail important! Our |
It's not actually about the threads, for me. I was writing a The larger goal is to substitute |
Not sure if this is related, but I am seeing Trio freeze up completely in "await socket.getnameinfo((ip, 0), 0)" until it eventually throws a "transient failure" exception. (By freeze up completely, I mean--presumably--something that should have gone to a worker thread hasn't: My entire application hangs for a second or two until the call bails.) Note this call normally succeeds and everything works fine but once in a blue moon it hangs and ultimately fails. [Update: More like 5-10 seconds. And the exact error is "[Errno -3] Temporary failure in name resolution"] |
trio.socket.getaddrinfo
tries to avoid invoking a worker thread when it knows that stdlib getaddrinfo won't block: the host is already an IP address and the port is already numeric. It does this by calling stdlib getaddrinfo withAI_NUMERICHOST|AI_NUMERICSERV
added to the flags. If that call succeeds, it returns immediately, otherwise it calls getaddrinfo in a worker thread.The problem is that trio assumes that a failure of this initial call is because we need to do a host and/or service lookup. This is not necessarily true. The most important case is when the caller of
trio.socket.getaddrinfo
already setAI_NUMERICHOST|AI_NUMERICSERV
in the flags; in that case the call in the worker thread is going to fail in the exact same way that the call in the main thread failed, and there's no point going through the thread.Failures for unrelated reasons (e.g. inappropriate address family) will also fail the same way in the worker threadFailures are only retried in the worker thread if the getaddrinfo error code isEAI_NONAME
, so we only need to worry about the cases where that can happen, but there are still several types of invalid "host" and/or "port" argument that will produceEAI_NONAME
and there's no point retrying, even if caller didn't setAI_NUMERICHOST|AI_NUMERICSERV
.To fix this, getaddrinfo just needs to pay a little more attention to why the initial call failed and what the caller's flags were, and not retry calls that will provably fail the same way in the worker. This will also let us tell the authors of custom
HostResolver
implementations that they will not be called when both the host and the service were already numeric (but it's still possible for them to get an IP address with a service keyword, or a domain name with a numeric port) and they can rely on the rest of the arguments to have been validated already -- in particular they do not need to worry about the address family being something other thanAF_INET
,AF_INET6
, orAF_UNSPEC
.I plan to work on this myself but I am filing this issue in advance to get advice about testing. Is there a good way to test "this operation does not do anything in a worker thread" already? I suppose I could define a test
HostResolver
that bombs out if called, since the logic is "call custom HR or else call stdlib getaddrinfo in a worker" if we get to that point, but maybe someone has a better idea?The text was updated successfully, but these errors were encountered: