fix issue when OS' nameserver is loopback with --iterative #415
Conversation
src/zdns/resolver.go (Outdated)

```
@@ -86,6 +88,10 @@ type ResolverConfig struct {

// PopulateAndValidate checks if the ResolverConfig is valid and populates any missing fields with default values.
func (rc *ResolverConfig) PopulateAndValidate() error {
	// when creating multiple resolvers, each will call this function against the same config.
	rc.Lock()
```
I don't understand why this needs to be locked. Shouldn't anything for the shared thing be done once ahead of time? Or simply, copy and use. It feels weird to me that a downstream thing is changing the config.
So it is. The CLI calls config.PopulateAndValidate once, and no lock is required from the CLI's point of view. However, since someone can use the library without the CLI, inside InitResolver we also check that the config is populated with defaults and is valid. When used with the CLI, this is a redundant check. The initialization of workers happens in a multi-threaded fashion, so the lock is needed for the case when many threads are checking the config at once.
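To make the scenario concrete, here's a minimal, hypothetical sketch (not the actual zdns code; the `ResolverConfig` fields and default value are made up) of many worker goroutines sharing one config, each calling PopulateAndValidate during its own setup. The embedded mutex mirrors the `rc.Lock()` call in the diff above:

```go
package main

import "sync"

// ResolverConfig is a hypothetical stand-in for the real zdns config; the
// embedded mutex mirrors the rc.Lock() call in the diff above.
type ResolverConfig struct {
	sync.Mutex
	LocalAddrs []string // illustrative field only
}

// PopulateAndValidate fills defaults and checks the config; every worker
// calls it against the same shared instance.
func (rc *ResolverConfig) PopulateAndValidate() error {
	rc.Lock()
	defer rc.Unlock()
	if len(rc.LocalAddrs) == 0 {
		rc.LocalAddrs = []string{"192.0.2.1"} // populate a default (documentation IP)
	}
	return nil
}

func main() {
	cfg := &ResolverConfig{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // worker initialization is multi-threaded
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = cfg.PopulateAndValidate() // concurrent re-checks of the shared config
		}()
	}
	wg.Wait()
}
```

Without the mutex, the concurrent calls race on the config's fields even though each call would leave the config in the same state.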
Even if multiple threads are looking at the config at once, I'm not sure I understand why they should have any need to modify the config.
That's fair; ideally, running it again on an already populated/valid config should be idempotent.

The issue I saw was around LocalAddrs: a single worker would complain that, after validation, localAddrs was empty, and it would panic.

I think the issue is here, where we checked for IPv6 local addresses (since we don't support IPv6 in main) and replaced the IPv4 local addresses with all local addresses that pass validation. When called on a valid config, this just replaces the localAddrs array with an equivalent one, but it's still a write to a shared data structure.

My thinking was that I could either try to ensure that a valid config is never modified at all (even to replace a config field with an equal value, as I think is happening here), to guarantee idempotency in multi-threaded scenarios, or just add a lock. Figured the lock was easier.
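For illustration, a hedged reconstruction of the pattern being described, reusing the hypothetical `ResolverConfig` from the sketch above (`filterLocalAddrs` is a made-up name, not the real zdns function): even on an already-valid config, validation rebuilds the LocalAddrs slice, and the write-back races with concurrent readers whether or not the value actually changes:

```go
import (
	"errors"
	"net"
)

// filterLocalAddrs is a hypothetical stand-in for the validation step described
// above: it keeps only IPv4 addresses (main doesn't support IPv6) and writes the
// result back, replacing localAddrs with an equivalent slice on a valid config.
func (rc *ResolverConfig) filterLocalAddrs() error {
	var validated []string
	for _, addr := range rc.LocalAddrs {
		ip := net.ParseIP(addr)
		if ip == nil || ip.To4() == nil {
			continue // drop IPv6 and unparseable entries
		}
		validated = append(validated, addr)
	}
	if len(validated) == 0 {
		return errors.New("no valid IPv4 local addresses")
	}
	rc.LocalAddrs = validated // a write to shared state, even when value-equal
	return nil
}
```

A worker that reads `rc.LocalAddrs` between the reset and the write-back can observe an empty (or half-built) slice, which matches the panic described.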
I think structurally, though, this isn't really what we want. If the receiver gets a bad config, then it should error. We should probably validate the config before we pass it in if we're the ones who populated it. I don't think it should try to modify the config downstream; this just seems like it's going to set us up for confusion.
The specific IPv4/IPv6 error described here sounds more like an architectural issue we need to address.
That's very fair. It's been rather non-trivial to do both
- populate missing values
- verify the config is valid

in the same function correctly, without causing issues in some edge case.

Splitting the concerns, so that the CLI (or an outside caller) populates the ResolverConfig and InitResolver uses a ValidateConfig function to be sure the config is valid, seems cleaner, and we don't have this weirdness. I'll make the change.
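A rough sketch of what that split could look like, again extending the hypothetical `ResolverConfig` above (names, defaults, and checks are illustrative, not the final API): the caller populates once, and InitResolver only validates, never writing to the shared config:

```go
// PopulateDefaults fills in missing fields. The CLI (or an outside library
// caller) would invoke this exactly once, before any Resolver is created.
func (rc *ResolverConfig) PopulateDefaults() {
	if len(rc.LocalAddrs) == 0 {
		rc.LocalAddrs = []string{"192.0.2.1"} // illustrative default
	}
}

// ValidateConfig is read-only, so InitResolver can call it from many
// workers concurrently without taking a lock.
func (rc *ResolverConfig) ValidateConfig() error {
	if len(rc.LocalAddrs) == 0 {
		return errors.New("no local addresses set; populate the config first")
	}
	for _, addr := range rc.LocalAddrs {
		ip := net.ParseIP(addr)
		if ip == nil || ip.To4() == nil {
			return errors.New("invalid or IPv6 local address: " + addr)
		}
	}
	return nil
}
```

Because validation never mutates the config, idempotency and thread-safety fall out for free, and a bad config surfaces as an error at InitResolver rather than being silently patched downstream.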
…le separation of concerns
When #412 was merged, it removed the assumption we had that in --iterative mode the root servers were hard-coded as the 13 global root servers. #412 added the ability for a user to specify --name-servers in --iterative mode, to test iteration against a specified name server. Thus, the assumption was broken.

This meant that with main, if the OS-configured resolver was loopback, iterative lookups would fail, since we were setting the root-servers as Cloudflare/Google here.
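To sketch the intent of the fix (a hedged illustration, not the actual zdns implementation; `chooseNameServers`, `rootServers`, and `publicRecursives` are made-up names): user-supplied --name-servers take precedence per #412, iterative mode always starts from the root servers, and the loopback substitution only applies on the recursive path:

```go
package nameservers

import "net"

// Hypothetical defaults for illustration only.
var (
	rootServers      = []string{"198.41.0.4:53", "199.9.14.201:53"} // a/b root (subset)
	publicRecursives = []string{"1.1.1.1:53", "8.8.8.8:53"}         // Cloudflare/Google
)

// allLoopback reports whether every OS-configured nameserver is a loopback address.
func allLoopback(nameServers []string) bool {
	for _, ns := range nameServers {
		host, _, err := net.SplitHostPort(ns)
		if err != nil {
			host = ns // entry had no port
		}
		ip := net.ParseIP(host)
		if ip == nil || !ip.IsLoopback() {
			return false
		}
	}
	return len(nameServers) > 0
}

// chooseNameServers captures the behavior this PR is after: user-supplied
// --name-servers win (#412); iterative lookups start at the roots even when
// /etc/resolv.conf points at loopback; only the recursive path swaps a
// loopback OS resolver for public recursives.
func chooseNameServers(userNS, osNS []string, iterative bool) []string {
	if len(userNS) > 0 {
		return userNS
	}
	if iterative {
		return rootServers // previously a loopback OS resolver wrongly triggered Cloudflare/Google here
	}
	if allLoopback(osNS) {
		return publicRecursives // loopback can't serve as an external recursive target
	}
	return osNS
}
```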
Other Changes

Added a lock to ResolverConfig: since each Resolver attempts to populateAndValidate the config at once in worker_manager.go, this led to some race conditions, easily prevented by locking before validation. This lock is only acquired during Resolver setup, so there shouldn't be performance degradation after initial setup.

Testing
After having this issue crop up, and after working through edge cases with Phillip/ipv6, I tried to be more systematic with my testing for this. Sorry for the formatting; GitHub is not letting us use all available screen real estate here 😢
Variables
- /etc/resolv.conf using loopback nameservers, or not
- --iterative mode used, or not
- --name-servers specified: unset, loopback, or non-loopback
- --localAddr specified: unset, loopback, or non-loopback

2 * 2 * 3 * 3 = 36 possibilities
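For completeness, a tiny sketch of enumerating that matrix mechanically; the value sets are assumptions inferred from the list above (in particular, treating --name-servers and --localAddr as the two three-way variables):

```go
package main

import "fmt"

func main() {
	resolvConfLoopback := []bool{true, false}                     // /etc/resolv.conf loopback or not
	iterative := []bool{true, false}                              // --iterative or not
	nameServers := []string{"unset", "loopback", "non-loopback"}  // assumed values
	localAddrs := []string{"unset", "loopback", "non-loopback"}   // assumed values

	n := 0
	for _, rc := range resolvConfLoopback {
		for _, it := range iterative {
			for _, ns := range nameServers {
				for _, la := range localAddrs {
					n++
					fmt.Printf("case %2d: resolv.conf loopback=%-5v iterative=%-5v name-servers=%-12s local-addr=%s\n",
						n, rc, it, ns, la)
				}
			}
		}
	}
	fmt.Println("total:", n) // 2 * 2 * 3 * 3 = 36
}
```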