-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DnsServiceDiscoverer
: add total resolution timeout
#2908
Conversation
Motivation: The existing `DnsServiceDiscovererBuilder.queryTimeout` is applied per each UDP/TCP query. If resolution requires multiple CNAME resolutions, retries on a different NS server, or applies a list of search domains, the total time is not capped. As a result, resolution can take significantly longer than originally anticipated. Modifications: - Add `DnsServiceDiscovererBuilder.resolutionTimeout` option that can be used to cap total resolution duration with default value set to `2 x queryTimeout`; - Add tests for `queryTimeout` and `resolutionTimeout`; Result: Users can configure the total resolution timeout.
this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() : | ||
// Default value is chosen based on a combination of default "timeout" and "attempts" options of | ||
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html | ||
resolver.queryTimeoutMillis() * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open for alternative suggestions. Originally, I planned to use resolver.queryTimeoutMillis() * resolver.maxQueriesPerResolve()
, but Netty's default values are:
queryTimeout = 5 sec
maxQueriesPerResolve = 16
which results in 80 sec and seems unreasonably high.
Decided to go with x2 because because the linux default for attempts
(maxQueriesPerResolve) is 2 (see https://man7.org/linux/man-pages/man5/resolv.conf.5.html).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, this seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me as well 👍
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() : | ||
// Default value is chosen based on a combination of default "timeout" and "attempts" options of | ||
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html | ||
resolver.queryTimeoutMillis() * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, this seems reasonable to me.
...k-dns-discovery-netty/src/main/java/io/servicetalk/dns/discovery/netty/DefaultDnsClient.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
public DefaultDnsServiceDiscovererBuilder resolutionTimeout(final @Nullable Duration resolutionTimeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the name between resolution
and query
is quite close and not directly obvious how they are different. WDYT about using "total" in the name to make it more clear - i.e. totalResolutionTimeout
in addition to the javadoc clarification you have in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated between adding "total" or not in the name as well. Decided to drop it for 2 reasons:
- Consistency with naming in
DnsServiceDiscovererObserver.DnsResolutionObserver
. This timeout starts whenonNewResolution
is called and stops whenDnsResolutionObserver
terminates. - Bcz of the SRV lookups, it's not really "total".
this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() : | ||
// Default value is chosen based on a combination of default "timeout" and "attempts" options of | ||
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html | ||
resolver.queryTimeoutMillis() * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me as well 👍
@@ -182,19 +182,46 @@ DnsServiceDiscovererBuilder dnsServerAddressStreamProvider( | |||
|
|||
/** | |||
* Set the number of dots which must appear in a name before an initial absolute query is made. | |||
* <p> | |||
* If not set, the default value is read from {@code ndots} option of {@code /etc/resolv.conf}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it is not present in the resolv.conf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for resolv.conf clarify the behavior if value is not specified in the file: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
netty follows the same behavior. I would prefer to avoid documenting the behavior ST does not control directly, bcz either linux or netty behavior can change and it's hard to track to keep docs consistent.
* <p> | ||
* Each resolution may execute one or more DNS queries, like following multiple CNAME(s) or trying different search | ||
* domains. This is the total timeout for all intermediate queries involved in a single resolution request. Note, | ||
* that <a href="https://tools.ietf.org/html/rfc2782">SRV</a> resolutions may generate independent resolutions for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a slightly different conversations, but I wonder if this is the right behavior. Even if technically it's two different resolutions, from a user perspective I still want to apply a "total" timeout, right? Since in the end they are just individual queries against a nameserver where we need a total timeout to cap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I don't like this fact either but it's what it's. the way SRV resolutions are implemented right now makes it too complicated to apply a timeout at SRV+all_A_resolutions level. requires a big refactoring.
taking into account, they are less frequent and used only for BACKGROUND
discovery strategy, the need for a proper total timeout is less.
Motivation:
The existing
DnsServiceDiscovererBuilder.queryTimeout
is applied per each UDP/TCP query. If resolution requires multiple CNAME resolutions, retries on a different NS server, or applies a list of search domains, the total time is not capped. As a result, resolution can take significantly longer than originally anticipated.Modifications:
DnsServiceDiscovererBuilder.resolutionTimeout
option that can be used to cap total resolution duration with default value set to2 x queryTimeout
;queryTimeout
andresolutionTimeout
;Result:
Users can configure the total resolution timeout.