
fix issue when OS' nameserver is loopback with --iterative #415

Merged: 15 commits merged into main from phillip/fix-nameserver-population on Aug 9, 2024

Conversation

@phillip-stephens (Contributor) commented Aug 7, 2024

When #412 was merged, it removed our assumption that in --iterative mode the root servers were hard-coded to the 13 global root servers. #412 added the ability for a user to specify --name-servers in --iterative mode to test iteration against a specific name server, so that assumption no longer held.

This meant that on main, if the OS-configured resolver was a loopback address, iterative lookups would fail, since we were setting the root servers to Cloudflare/Google here.

```
➜  zdns git:(main) ✗ head -n 1 benchmark/10k_crux_top_domains.input | ./zdns A --iterative --verbosity=5
DEBU[0000] OS external resolution nameservers are loopback and iterative mode is enabled. Using default non-loopback nameservers to prevent resolution failure edge case
INFO[0000] using local addresses: [137.184.54.104]
INFO[0000] for non-iterative lookups, using external nameservers: 8.8.8.8:53, 8.8.4.4:53, 1.1.1.1:53, 1.0.0.1:53
INFO[0000] for iterative lookups, using nameservers: 8.8.8.8:53, 8.8.4.4:53, 1.1.1.1:53, 1.0.0.1:53
...
{"data":{"additionals":[{"flags":"","type":"EDNS0","udpsize":1232,"version":0}],"protocol":"udp","resolver":"1.1.1.1:53"},"duration":0.039246057,"name":"m.fabswingers.com","status":"SERVFAIL","timestamp":"2024-08-07T22:07:28Z"}
```

Other Changes

  • Added a lock in ResolverConfig, since each Resolver attempts to PopulateAndValidate the config at once in worker_manager.go; this led to race conditions that are easily prevented by locking before validation. The lock is only acquired during Resolver setup, so there shouldn't be performance degradation after the initial setup (a rough sketch of the locking approach follows this list).
  • Removed a now-unused function
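
For reference, a minimal sketch of the locking approach. Only the embedded mutex and the PopulateAndValidate name come from the diff discussed later in this thread; the fields and defaults shown are illustrative, not the exact zdns struct:

```go
package zdns

import "sync"

// ResolverConfig is shared by every Resolver the worker manager creates.
// Embedding sync.Mutex gives the config the rc.Lock()/rc.Unlock() methods
// used to serialize concurrent setup.
type ResolverConfig struct {
	sync.Mutex

	NameServers []string // name servers to query
	LocalAddrs  []string // local addresses to bind outgoing queries to
}

// PopulateAndValidate fills in missing defaults and validates the config.
// Each Resolver may call this against the same shared config during setup,
// so the lock prevents two goroutines from populating/validating at once.
func (rc *ResolverConfig) PopulateAndValidate() error {
	rc.Lock()
	defer rc.Unlock()

	if len(rc.NameServers) == 0 {
		rc.NameServers = []string{"8.8.8.8:53", "8.8.4.4:53", "1.1.1.1:53", "1.0.0.1:53"}
	}
	// ... further population and validation under the lock ...
	return nil
}
```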

Testing

After having this issue crop up and working through edge cases with Phillip/ipv6, I tried to be more systematic with my testing for this.

Sorry for the formatting; GitHub doesn't let us use all available screen real estate here 😢

Variables

  • A - /etc/resolv.conf uses loopback nameservers (T/F)
  • B - user-provided nameservers (F if none)
    • Option 1 - user provided loopback nameservers
    • Option 2 - user provided non-loopback nameservers
  • C - --iterative mode used or not (T/F)
  • D - --local-addr specified (F if none)
    • Option 1 - user provided loopback addr
    • Option 2 - user provided non-loopback addr

2 (A) * 2 (C) * 3 (B) * 3 (D) = 36 possibilities (see the enumeration sketch just below, before the table)
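
To double-check that count, here is a small throwaway Go sketch (not part of this PR) that enumerates the same 36 command-line combinations used in the table:

```go
package main

import "fmt"

func main() {
	resolvConfLoopback := []string{"F", "T"}                 // A
	nameServers := []string{"", "127.0.0.53", "198.41.0.4"}  // B: none, loopback, non-loopback
	iterative := []bool{true, false}                         // C
	localAddrs := []string{"", "127.0.0.1", "192.168.1.243"} // D: none, loopback, non-loopback

	n := 0
	for _, a := range resolvConfLoopback {
		for _, ns := range nameServers {
			for _, iter := range iterative {
				for _, la := range localAddrs {
					cmd := "./zdns A"
					if iter {
						cmd += " --iterative"
					}
					if ns != "" {
						cmd += fmt.Sprintf(" --name-servers=%q", ns)
					}
					if la != "" {
						cmd += fmt.Sprintf(" --local-addr=%q", la)
					}
					n++
					fmt.Printf("A=%s #%02d: %s\n", a, n, cmd)
				}
			}
		}
	}
	fmt.Println("total:", n) // prints 36
}
```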
| A | B | C | D | Command | Expected Outcome | Tested Outcome |
|---|---|---|---|---------|------------------|----------------|
| F | F | T | F | `./zdns A --iterative` | Iteration should start from a default root server with the non-loopback NS, SUCCESS | PASS |
| F | F | T | Opt 1 | `./zdns A --iterative --local-addr="127.0.0.1"` | Should fail config validation, no nameservers reachable | PASS |
| F | F | T | Opt 2 | `./zdns A --iterative --local-addr="192.168.1.243"` | Iteration should start from a default root server with the non-loopback local-addr, SUCCESS | PASS |
| F | F | F | F | `./zdns A` | External query to whatever NS was set in /etc/resolv.conf | PASS |
| F | F | F | Opt 1 | `./zdns A --local-addr="127.0.0.1"` | Should fail config validation, no nameservers reachable | PASS |
| F | F | F | Opt 2 | `./zdns A --local-addr="192.168.1.243"` | External query to whatever NS was set in /etc/resolv.conf | PASS |
| F | Opt 1 | T | F | `./zdns A --iterative --name-servers="127.0.0.53"` | Pass config validation, iterative lookup attempted using a loopback local addr, to a loopback NS | PASS |
| F | Opt 1 | T | Opt 1 | `./zdns A --iterative --name-servers="127.0.0.53" --local-addr="127.0.0.1"` | Pass config validation, iterative lookup attempted using a loopback local addr, to a loopback NS | PASS |
| F | Opt 1 | T | Opt 2 | `./zdns A --iterative --name-servers="127.0.0.53" --local-addr="192.168.1.243"` | Config does not pass validation | PASS |
| F | Opt 1 | F | F | `./zdns A --name-servers="127.0.0.53"` | Pass config validation, external lookup attempted using a loopback local addr, to a loopback NS | PASS |
| F | Opt 1 | F | Opt 1 | `./zdns A --name-servers="127.0.0.53" --local-addr="127.0.0.1"` | Pass config validation, external lookup attempted using a loopback local addr, to a loopback NS | PASS |
| F | Opt 1 | F | Opt 2 | `./zdns A --name-servers="127.0.0.53" --local-addr="192.168.1.243"` | Config does not pass validation | PASS |
| F | Opt 2 | T | F | `./zdns A --iterative --name-servers="198.41.0.4"` | Attempt iterative lookup to a.root-servers.net using a non-loopback local addr | PASS |
| F | Opt 2 | T | Opt 1 | `./zdns A --iterative --name-servers="198.41.0.4" --local-addr="127.0.0.1"` | Fail config validation, loopback mismatch | PASS |
| F | Opt 2 | T | Opt 2 | `./zdns A --iterative --name-servers="198.41.0.4" --local-addr="192.168.1.243"` | Attempt iterative lookup to a.root-servers.net using a non-loopback local addr | PASS |
| F | Opt 2 | F | F | `./zdns A --name-servers="198.41.0.4"` | Attempt external lookup to a.root-servers.net using a non-loopback local addr | PASS |
| F | Opt 2 | F | Opt 1 | `./zdns A --name-servers="198.41.0.4" --local-addr="127.0.0.1"` | Fail config validation, loopback mismatch | PASS |
| F | Opt 2 | F | Opt 2 | `./zdns A --name-servers="198.41.0.4" --local-addr="192.168.1.243"` | Attempt external lookup to a.root-servers.net using a non-loopback local addr | PASS |
| T | F | T | F | `./zdns A --iterative` | Iteration should start from a default root server with a non-loopback local addr, SUCCESS | PASS |
| T | F | T | Opt 1 | `./zdns A --iterative --local-addr="127.0.0.1"` | Fail config validation | PASS |
| T | F | T | Opt 2 | `./zdns A --iterative --local-addr="192.168.1.243"` | Iteration should start from a default root server with the non-loopback local addr, SUCCESS | PASS |
| T | F | F | F | `./zdns A` | External query to whatever NS was set in /etc/resolv.conf using a loopback local addr | PASS |
| T | F | F | Opt 1 | `./zdns A --local-addr="127.0.0.1"` | Attempt external lookup against local NS with a loopback local addr | PASS |
| T | F | F | Opt 2 | `./zdns A --local-addr="192.168.1.243"` | Fail config validation | PASS |
| T | Opt 1 | T | F | `./zdns A --iterative --name-servers="127.0.0.53"` | Pass config validation, iterative lookup attempted using a loopback local addr, to a loopback NS | PASS |
| T | Opt 1 | T | Opt 1 | `./zdns A --iterative --name-servers="127.0.0.53" --local-addr="127.0.0.1"` | Pass config validation, iterative lookup attempted using a loopback local addr, to a loopback NS | PASS |
| T | Opt 1 | T | Opt 2 | `./zdns A --iterative --name-servers="127.0.0.53" --local-addr="192.168.1.243"` | Fail config validation | PASS |
| T | Opt 1 | F | F | `./zdns A --name-servers="127.0.0.53"` | Pass config validation, external lookup attempted using a loopback local addr, to a loopback NS | PASS |
| T | Opt 1 | F | Opt 1 | `./zdns A --name-servers="127.0.0.53" --local-addr="127.0.0.1"` | Pass config validation, external lookup attempted using a loopback local addr, to a loopback NS | PASS |
| T | Opt 1 | F | Opt 2 | `./zdns A --name-servers="127.0.0.53" --local-addr="192.168.1.243"` | Fail config validation | PASS |
| T | Opt 2 | T | F | `./zdns A --iterative --name-servers="198.41.0.4"` | Attempt iterative lookup to a.root-servers.net using a non-loopback local addr | PASS |
| T | Opt 2 | T | Opt 1 | `./zdns A --iterative --name-servers="198.41.0.4" --local-addr="127.0.0.1"` | Fail config validation, loopback mismatch | PASS |
| T | Opt 2 | T | Opt 2 | `./zdns A --iterative --name-servers="198.41.0.4" --local-addr="192.168.1.243"` | Attempt iterative lookup to a.root-servers.net using a non-loopback local addr | PASS |
| T | Opt 2 | F | F | `./zdns A --name-servers="198.41.0.4"` | Attempt external lookup to a.root-servers.net using a non-loopback local addr | PASS |
| T | Opt 2 | F | Opt 1 | `./zdns A --name-servers="198.41.0.4" --local-addr="127.0.0.1"` | Fail config validation, loopback mismatch | PASS |
| T | Opt 2 | F | Opt 2 | `./zdns A --name-servers="198.41.0.4" --local-addr="192.168.1.243"` | Attempt external lookup to a.root-servers.net using a non-loopback local addr | PASS |

@phillip-stephens phillip-stephens marked this pull request as ready for review August 8, 2024 14:25
@phillip-stephens phillip-stephens requested a review from a team as a code owner August 8, 2024 14:25
@phillip-stephens phillip-stephens requested a review from zakird August 8, 2024 14:25
@@ -86,6 +88,10 @@ type ResolverConfig struct {

```go
// PopulateAndValidate checks if the ResolverConfig is valid and populates any missing fields with default values.
func (rc *ResolverConfig) PopulateAndValidate() error {
	// when creating multiple resolvers, each will call this function against the same config.
	rc.Lock()
```
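
The hunk above only shows the acquire; presumably the method releases the lock on every return path, e.g. with the usual defer pattern (a sketch, not the actual diff):

```go
func (rc *ResolverConfig) PopulateAndValidate() error {
	// when creating multiple resolvers, each will call this function against the same config.
	rc.Lock()
	defer rc.Unlock() // released on every return path, including early error returns

	// ... populate defaults, then validate ...
	return nil
}
```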
@zakird (Member) commented Aug 8, 2024

I don't understand why this needs to be locked. Shouldn't any work on the shared config be done once, ahead of time? Or simply copy it and use the copy. It feels weird to me that a downstream component is changing the config.

@phillip-stephens (Contributor, Author) commented Aug 8, 2024

So it is. The CLI calls config.PopulateAndValidate once, and no lock is required from the CLI's point of view.

However, since someone can use the library without the CLI, InitResolver also checks that the config is populated with defaults and is valid here. When used with the CLI, this is a redundant check. Worker initialization happens in a multi-threaded fashion here, so the lock is needed for the case where many threads check the config at once (sketch below).
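
A minimal sketch of that concurrent call site, assuming a simplified worker manager that builds one Resolver per goroutine against the same shared config (names are illustrative, continuing the ResolverConfig sketch above):

```go
package zdns

import "sync"

// startWorkers mimics worker_manager.go: every worker goroutine re-validates
// the one shared config while setting up its Resolver. Without the config
// lock, these concurrent PopulateAndValidate calls race on shared fields
// such as LocalAddrs.
func startWorkers(config *ResolverConfig, numWorkers int) error {
	var wg sync.WaitGroup
	errs := make(chan error, numWorkers)
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// This is the redundant-but-concurrent check: each worker
			// validates the same config during its Resolver setup.
			if err := config.PopulateAndValidate(); err != nil {
				errs <- err
			}
		}()
	}
	wg.Wait()
	close(errs)
	return <-errs // nil if no worker reported an error
}
```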

@zakird (Member)

Even if multiple threads are looking at the config at once, I'm not sure I understand why they should have any need to modify the config.

@phillip-stephens (Contributor, Author)

That's fair; ideally, running it again on an already-populated, valid config should be idempotent.

The issue I saw was around LocalAddrs: a single worker would complain that, after validation, localAddrs was empty, and it would panic.

I think the cause is here, where we check for IPv6 local addresses (since we don't support IPv6 on main) and replace the IPv4 local addresses with all local addresses that pass validation. When called on a valid config, this just replaces the localAddrs array with an equivalent one, but it's still a write to a shared data structure.

My thinking was that I could either ensure a valid config is never modified at all (even to replace a field with an equal value, as I think is happening here), to guarantee idempotency in multi-threaded scenarios, or just add a lock.

Figured the lock was easier.

@zakird (Member) commented Aug 8, 2024

I think structurally, though, this isn't really what we want. If the receiver gets a bad config, then it should error. We should probably validate the config before we pass it in if we're the ones who populated it. I don't think it should try to modify the config downstream; this just seems like it's going to set us up for confusion.

The specific IPv4/IPv6 error described here sounds more like an architectural issue we need to address.

@phillip-stephens (Contributor, Author)

That's very fair. It's been rather non-trivial to do both

  • populate missing values
  • verify the config is valid

in the same function correctly, without causing issues in some edge case.

Splitting the concerns, so that the CLI (or an outside caller) populates the ResolverConfig and InitResolver uses a ValidateConfig function to be sure the config is valid, seems cleaner, and we don't have this weirdness.

I'll make the change.
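
A sketch of what that split might look like, with population on the caller side and a read-only validation step inside InitResolver (function and type names here are placeholders, not necessarily what landed in the follow-up commits):

```go
package zdns

import "errors"

// Resolver is elided to its config for this sketch.
type Resolver struct {
	config *ResolverConfig
}

// PopulateDefaults fills in missing fields. Only the CLI (or an outside
// library caller) calls this, once, before any Resolver is created.
func (rc *ResolverConfig) PopulateDefaults() {
	if len(rc.NameServers) == 0 {
		rc.NameServers = []string{"8.8.8.8:53", "8.8.4.4:53", "1.1.1.1:53", "1.0.0.1:53"}
	}
	// ... other defaults ...
}

// Validate is read-only: it never mutates the config, so many Resolvers can
// safely call it concurrently with no lock.
func (rc *ResolverConfig) Validate() error {
	if len(rc.NameServers) == 0 {
		return errors.New("config has no name servers; populate it before InitResolver")
	}
	return nil
}

// InitResolver only checks validity and errors out on a bad config, rather
// than trying to fix it downstream.
func InitResolver(config *ResolverConfig) (*Resolver, error) {
	if err := config.Validate(); err != nil {
		return nil, err
	}
	return &Resolver{config: config}, nil
}
```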

@phillip-stephens phillip-stephens requested a review from zakird August 9, 2024 15:13
@zakird zakird merged commit 6758e08 into main Aug 9, 2024
3 checks passed
@zakird zakird deleted the phillip/fix-nameserver-population branch August 9, 2024 15:45