Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DnsServiceDiscoverer: add total resolution timeout #2908

Merged
merged 3 commits into from
May 10, 2024

Conversation

idelpivnitskiy
Copy link
Member

Motivation:

The existing DnsServiceDiscovererBuilder.queryTimeout is applied per each UDP/TCP query. If resolution requires multiple CNAME resolutions, retries on a different NS server, or applies a list of search domains, the total time is not capped. As a result, resolution can take significantly longer than originally anticipated.

Modifications:

  • Add DnsServiceDiscovererBuilder.resolutionTimeout option that can be used to cap total resolution duration with default value set to 2 x queryTimeout;
  • Add tests for queryTimeout and resolutionTimeout;

Result:

Users can configure the total resolution timeout.

Motivation:

The existing `DnsServiceDiscovererBuilder.queryTimeout` is applied per
each UDP/TCP query. If resolution requires multiple CNAME resolutions,
retries on a different NS server, or applies a list of search domains,
the total time is not capped. As a result, resolution can take
significantly longer than originally anticipated.

Modifications:

- Add `DnsServiceDiscovererBuilder.resolutionTimeout` option that can be
used to cap total resolution duration with default value set to
`2 x queryTimeout`;
- Add tests for `queryTimeout` and `resolutionTimeout`;

Result:

Users can configure the total resolution timeout.
@idelpivnitskiy idelpivnitskiy self-assigned this May 6, 2024
this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() :
// Default value is chosen based on a combination of default "timeout" and "attempts" options of
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
resolver.queryTimeoutMillis() * 2;
Copy link
Member Author

@idelpivnitskiy idelpivnitskiy May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open for alternative suggestions. Originally, I planned to use resolver.queryTimeoutMillis() * resolver.maxQueriesPerResolve(), but Netty's default values are:

queryTimeout = 5 sec
maxQueriesPerResolve = 16

which results in 80 sec and seems unreasonably high.

Decided to go with x2 because because the linux default for attempts (maxQueriesPerResolve) is 2 (see https://man7.org/linux/man-pages/man5/resolv.conf.5.html).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, this seems reasonable to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me as well 👍

this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() :
// Default value is chosen based on a combination of default "timeout" and "attempts" options of
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
resolver.queryTimeoutMillis() * 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, this seems reasonable to me.

}

@Override
public DefaultDnsServiceDiscovererBuilder resolutionTimeout(final @Nullable Duration resolutionTimeout) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the name between resolution and query is quite close and not directly obvious how they are different. WDYT about using "total" in the name to make it more clear - i.e. totalResolutionTimeout in addition to the javadoc clarification you have in place?

Copy link
Member Author

@idelpivnitskiy idelpivnitskiy May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated between adding "total" or not in the name as well. Decided to drop it for 2 reasons:

  1. Consistency with naming in DnsServiceDiscovererObserver.DnsResolutionObserver. This timeout starts when onNewResolution is called and stops when DnsResolutionObserver terminates.
  2. Bcz of the SRV lookups, it's not really "total".

this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() :
// Default value is chosen based on a combination of default "timeout" and "attempts" options of
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
resolver.queryTimeoutMillis() * 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me as well 👍

@@ -182,19 +182,46 @@ DnsServiceDiscovererBuilder dnsServerAddressStreamProvider(

/**
* Set the number of dots which must appear in a name before an initial absolute query is made.
* <p>
* If not set, the default value is read from {@code ndots} option of {@code /etc/resolv.conf}).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it is not present in the resolv.conf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for resolv.conf clarify the behavior if value is not specified in the file: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
netty follows the same behavior. I would prefer to avoid documenting the behavior ST does not control directly, bcz either linux or netty behavior can change and it's hard to track to keep docs consistent.

* <p>
* Each resolution may execute one or more DNS queries, like following multiple CNAME(s) or trying different search
* domains. This is the total timeout for all intermediate queries involved in a single resolution request. Note,
* that <a href="https://tools.ietf.org/html/rfc2782">SRV</a> resolutions may generate independent resolutions for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a slightly different conversations, but I wonder if this is the right behavior. Even if technically it's two different resolutions, from a user perspective I still want to apply a "total" timeout, right? Since in the end they are just individual queries against a nameserver where we need a total timeout to cap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I don't like this fact either but it's what it's. the way SRV resolutions are implemented right now makes it too complicated to apply a timeout at SRV+all_A_resolutions level. requires a big refactoring.
taking into account, they are less frequent and used only for BACKGROUND discovery strategy, the need for a proper total timeout is less.

@idelpivnitskiy
Copy link
Member Author

@daschl I will merge to avoid conflicts with #2918. Lmk if you have more comments and I will open a follow-up

@idelpivnitskiy idelpivnitskiy merged commit c85a8f4 into apple:main May 10, 2024
15 checks passed
@idelpivnitskiy idelpivnitskiy deleted the resolutionTimeout branch May 10, 2024 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants