Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make --retries global to a name and try with other name servers in a given layer #451

Merged
merged 25 commits into from
Oct 2, 2024

Conversation

phillip-stephens
Copy link
Contributor

@phillip-stephens phillip-stephens commented Sep 19, 2024

Closes #93

Description

  • makes --retries global to a name lookup
    • bump default retries from --retries=1 to --retries=3. Since retries are no longer just for a single nameserver, we need more and found 3 offered a good balance with runtime/errors.
  • In --iterative, if ZDNS cannot reach (either due to timeout or some other error) a nameserver, a random alternative one will be used from the same layer
  • Added --network-timeout to let the user specify the connection timeout for a single on-the-wire network call
  • consolidated cachedRetryingLookup and retryingLookup into just cachedLookup since we moved the retry logic up a layer.
    • In order for ZDNS to have time to retry another nameserver in the --iteration-timeout, we have to set this smaller. I figured the user should have control over this, but I found 2 seconds worked well.
  • added retry logic to a new cyclingLookup function. cyclingLookup takes in an array of nameservers and provided there are --retries available, will attempt to connect/query a new random un-queried nameserver upon failure.
    • If all nameservers have been queried, we'll restart the process, treating all nameservers as unqueried

Performance Implications

The aspect of this change that could theoretically affect performance for some queries is prior to this change, during iterateOnAuthorities, we'd pick the first authority that we could resolve (need to perform an A lookup to get the authority's IP address if it isn't provided in Additionals) to proceed with the iterativeLookup process. In order for cyclingLookup to be able to cycle through a list of potential nameservers, iterateOnAuthorities has been changed to collect all authorities sequentially.
- So if google.com has ns1 - ns4, we'll have to send 4 sequential queries for the IP addresses.
- This means resolving a domain will take longer in some cases (especially where upstream nameservers don't provide the IP addresses in Additionals. I extended the various --timeouts to help. Usually these are provided in Additionals though and so there's no impact.

I used the top 70k CrUX domains with default flags for both branch and main
time head -n 70000 current.csv| ./zdns A --iterative

Phillip/93-domain-retries

Trial Runtime Failed to Resolve Timeouts
1 2 m 58s 1165 95
2 3 m 3s 1173 107
3 3 m 3 s 1171 97
Avg 3 m 1s 1170 99

main

Trial Runtime Failed to Resolve Timeouts
1 3 m 14s 1204 139
2 3 m 24s 1203 138
3 3 m. 27s 1203 139
Avg 3m 22s 1203 139

Conclusion

With the caveat that this is a single workflow, we can see a 10% runtime improvement and 2.7% and 29% reduction in name resolution failures and timeouts, respectively.

README.md Outdated Show resolved Hide resolved
src/zdns/lookup.go Outdated Show resolved Hide resolved
@phillip-stephens phillip-stephens marked this pull request as ready for review September 20, 2024 14:42
@phillip-stephens phillip-stephens requested a review from a team as a code owner September 20, 2024 14:42
@phillip-stephens phillip-stephens changed the title WIP - Make --retries global to a name and try with other name servers in a given layer Make --retries global to a name and try with other name servers in a given layer Sep 20, 2024
@phillip-stephens
Copy link
Contributor Author

Do not merge, #411 has cropped back up so figuring out a fix.

@phillip-stephens
Copy link
Contributor Author

irctc.co.in is a good test domain for this. It's nameserver, ns2 has two A records that point to IPs that don't respond at least on my network.

$ dig -t A ns2.rcil.gov.in @ns6.registry.in

; <<>> DiG 9.10.6 <<>> -t A ns2.rcil.gov.in @ns6.registry.in
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 515
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 7
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ns2.rcil.gov.in.		IN	A

;; AUTHORITY SECTION:
rcil.gov.in.		86400	IN	NS	ns2.rcil.gov.in.
rcil.gov.in.		86400	IN	NS	ns1.rcil.gov.in.

;; ADDITIONAL SECTION:
ns2.rcil.gov.in.	3600	IN	A	59.89.57.195
ns2.rcil.gov.in.	3600	IN	A	103.127.143.32
ns2.rcil.gov.in.	3600	IN	A	203.153.47.251
ns1.rcil.gov.in.	3600	IN	A	103.127.143.32
ns1.rcil.gov.in.	3600	IN	A	112.133.194.149
ns1.rcil.gov.in.	3600	IN	A	203.153.43.65

;; Query time: 33 msec
;; SERVER: 2001:502:ad09::20#53(2001:502:ad09::20)
;; WHEN: Fri Sep 27 16:14:38 EDT 2024
;; MSG SIZE  rcvd: 172


~ ⌚ 16:14:39
$ ping 59.89.57.195
PING 59.89.57.195 (59.89.57.195): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
^C
--- 59.89.57.195 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss

~ ⌚ 16:14:48
$ ping 103.127.143.32
PING 103.127.143.32 (103.127.143.32): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
^C
--- 103.127.143.32 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss

~ ⌚ 16:14:56
$ ping 203.153.47.251
PING 203.153.47.251 (203.153.47.251): 56 data bytes
64 bytes from 203.153.47.251: icmp_seq=0 ttl=46 time=313.520 ms
64 bytes from 203.153.47.251: icmp_seq=1 ttl=46 time=344.751 ms
^C
--- 203.153.47.251 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 313.520/329.135/344.751/15.615 ms

@zakird zakird merged commit 8f4d21c into main Oct 2, 2024
3 checks passed
@zakird zakird deleted the phillip/93-domain-retries branch October 2, 2024 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Top level nameserver failures should result in retries in --iterative or --retries
2 participants