go 1.4.2 cannot parse some dns responses from consul #854
This is a problem for people compiling with `netgo`. I've created a gist to show the exact code we're running in the output below (a rough sketch of that code is included after the workaround further down). In Go 1.3.3, the code outputs the following. Note that the 401 response is correct; I just wanted to show that the DNS resolves.
2015/04/10 17:08:56 (go1.3.3)
2015/04/10 17:08:56 CNAME: metrics-prod-1711139259.us-east-1.elb.amazonaws.com.
2015/04/10 17:08:56 Hosts: [23.23.110.30 54.225.85.31 23.21.91.141 54.235.144.213 50.19.125.98 54.235.110.130 23.23.89.169 54.243.232.250]
2015/04/10 17:08:56 Response: &{401 Unauthorized 401 HTTP/1.1 1 1 map[Content-Type:[application/json;charset=utf-8] Date:[Fri, 10 Apr 2015 17:08:56 GMT] Server:[nginx] Status:[401 Unauthorized] Www-Authenticate:[Basic realm="Librato API"] Content-Length:[49] Connection:[keep-alive]] 0xc2081693c0 49 [] false map[] 0xc20803a4e0 0xc208071f80}
In Go 1.4.2, the exact same code outputs the following:
2015/04/10 17:09:11 (go1.4.2)
2015/04/10 17:09:11 CNAME: metrics-prod-1711139259.us-east-1.elb.amazonaws.com.
2015/04/10 17:09:11 Host Error: lookup metrics-api.librato.com on 10.1.42.1:53: cannot unmarshal DNS message
2015/04/10 17:09:11 Hosts: []
2015/04/10 17:09:11 HTTP Error: Get https://metrics-api.librato.com/v1/snapshots/1: dial tcp: lookup metrics-api.librato.com on 10.1.42.1:53: cannot unmarshal DNS message
2015/04/10 17:09:11 Response: <nil>
Note the error: `cannot unmarshal DNS message`. The same DNS response parses fine under Go 1.3.3; we're using Consul as the DNS server for our containers. For now, we've found a workaround. It involves setting the Consul config to not recurse (before, the recursor was set to an upstream resolver):
{
  "recursor": ""
}
And then, when running the Docker container, setting the first DNS server to Consul (which is advertised on the docker0 bridge address) and the second to a public resolver:
docker run --dns $(ip -o -4 addr show docker0 | awk -F '[ /]+' '{print $4}') --dns 8.8.8.8
Edit: For anyone who is using the workaround, note that it's only good if you don't actually need Consul to recurse through DNS. I would look at #971 if you do need it to recurse (though, while using #971, Consul seems to panic from #1023).
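For reference, since the gist itself isn't shown above, here is a minimal sketch of the kind of code that produces those log lines. The hostname and URL come from the output; everything else is a reconstruction, not the actual gist:

```go
package main

import (
	"log"
	"net"
	"net/http"
	"runtime"
)

func main() {
	log.Printf("(%s)", runtime.Version())

	const host = "metrics-api.librato.com"

	// CNAME and A-record lookups, both of which go through the resolver
	// that fails under Go 1.4.2 when Consul is the DNS server.
	cname, err := net.LookupCNAME(host)
	if err != nil {
		log.Printf("CNAME Error: %v", err)
	}
	log.Printf("CNAME: %s", cname)

	addrs, err := net.LookupHost(host)
	if err != nil {
		log.Printf("Host Error: %v", err)
	}
	log.Printf("Hosts: %v", addrs)

	// The HTTP request exercises the same lookup path; a 401 here is expected
	// and just shows that name resolution succeeded.
	resp, err := http.Get("https://" + host + "/v1/snapshots/1")
	if err != nil {
		log.Printf("HTTP Error: %v", err)
	} else {
		defer resp.Body.Close()
	}
	log.Printf("Response: %v", resp)
}
```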
Tagging as a bug; it may just have to do with us sending back a non-compressed DNS response.
@armon Just FYI, we (Docker) just changed the registry DNS name to a CNAME, which means this bug (or a netgo bug, see my linked issue) is breaking everyone using Docker and Consul with a recursor right now.
Okay, I think I found the actual problem: miekg/dns#216
Realized that this issue is now being discussed over on #971. I'll link this for context but close this in favor of that issue, which has the latest discussion.
Actually, since that's a PR, I'll leave this open until we merge a fix.
Having this problem too. Using a recursor and trying to ping a host whose DNS record is ~2500 bytes, I get the same `cannot unmarshal DNS message` error. Setting `dns_config.enable_truncate` doesn't help. I think properly truncating oversized responses (so clients retry over TCP) would work; see the sketch after this comment. It seems development on #971 has stopped, and it's probably not the right solution anyway, as very large responses (such as the 2.5 kB one I have) will still potentially go over the 512-byte limit.
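For context on the truncation point: a UDP responder whose answer doesn't fit in the client's buffer is supposed to cut the message down and set the TC bit so the client retries over TCP. A sketch using miekg/dns (the library referenced in miekg/dns#216 above); the record set below is a placeholder, not the real 2.5 kB record:

```go
package main

import (
	"fmt"
	"log"

	"github.com/miekg/dns"
)

func main() {
	// Build an artificial reply that is well over the classic 512-byte UDP limit.
	reply := new(dns.Msg)
	reply.SetQuestion("example.service.consul.", dns.TypeA)
	for i := 0; i < 100; i++ {
		rr, err := dns.NewRR(fmt.Sprintf("example.service.consul. 0 IN A 10.0.1.%d", i))
		if err != nil {
			log.Fatal(err)
		}
		reply.Answer = append(reply.Answer, rr)
	}

	fmt.Println("before:", reply.Len(), "bytes, TC =", reply.Truncated)

	// Truncate to 512 bytes (dns.MinMsgSize) and set the TC bit, which is what
	// a strict 512-byte client like Go 1.4.2's resolver needs in order to fall
	// back to TCP instead of erroring out.
	reply.Truncate(dns.MinMsgSize)
	fmt.Println("after: ", reply.Len(), "bytes, TC =", reply.Truncated)
}
```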
Is there a workaround?
@thebenwaters It's a horrible one, but you can run your container in host network mode, I think.
@hermansc I don't think this makes any difference. But there might be unrelated UDP issues with Docker:
@slackpad Hi. Is there a resolution for this one in the works?
I just spent the last 8 hours trying to track this crap down, then finally found this. Dammit.
@slackpad Are these fixes in the master branch?
@tkambler Yes, they are in master; please let me know if you still see any issues. DNS compression is opt-out, so you shouldn't need to do anything special to enable it with a later build. Earlier today we announced a release candidate build of 0.7 that's not ready for production yet but has these fixes, if you'd like to test with that: https://groups.google.com/d/msg/consul-tool/7KDuvdwNpi0/LSY5LiPnCwAJ
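(For anyone who wants the old uncompressed behaviour back, my understanding is that 0.7 exposes an opt-out under `dns_config`; treat the exact key below as an assumption and check the 0.7 docs before relying on it.)

```json
{
  "dns_config": {
    "disable_compression": true
  }
}
```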
Sure enough, swapping out Consul for v0.7rc immediately resolves the problem.
This can be reproduced by calling `net.LookupHost("metrics-api.librato.com")` with Go 1.4.2, using Consul as the DNS server.
After some investigation, we believe the problem is that the UDP DNS response exceeds the 512-byte limit specified in RFC 1035 and does not set the truncate flag. The Go 1.4.2 implementation of the DNS client cannot parse such responses. This can be seen by looking at `readDNSResponse`, starting on line 41 of net/dnsclient_unix.go: the function uses a fixed-size buffer and will only pass part of the message to the deserialization function. Prior versions of Go used a 2000-byte buffer, as seen here.
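To confirm this, one can query Consul directly with a client that accepts large UDP messages and inspect the reply's size and TC bit. A sketch using miekg/dns; the server address is the one from the logs above and is an assumption about your setup:

```go
package main

import (
	"log"

	"github.com/miekg/dns"
)

func main() {
	// Consul's DNS endpoint as seen from our containers (assumption; adjust as needed).
	const server = "10.1.42.1:53"

	m := new(dns.Msg)
	m.SetQuestion("metrics-api.librato.com.", dns.TypeA)

	// Use a large receive buffer so we can see the whole oversized reply.
	c := &dns.Client{Net: "udp", UDPSize: 4096}
	in, _, err := c.Exchange(m, server)
	if err != nil {
		log.Fatal(err)
	}

	// If the packed size is over 512 bytes and Truncated is false, a plain
	// 512-byte client such as Go 1.4.2's resolver fails with
	// "cannot unmarshal DNS message".
	log.Printf("answers=%d size=%d bytes truncated=%v", len(in.Answer), in.Len(), in.Truncated)
}
```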
Setting the `dns_config.enable_truncate` flag in the Consul configuration did not resolve this problem. It appears that the flag is only used when sending SRV record responses.
The frequency of this problem could be lowered by using DNS message compression (described in section 4.1.4 of RFC 1035). The response from 8.8.8.8 for this lookup is close to 500 bytes smaller than the one from Consul, and well under the 512-byte limit.
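To illustrate the effect compression has on this particular answer shape (a CNAME plus several A records), here is a small sketch using miekg/dns. The records are stand-ins based on the output earlier in this issue, not a capture of the real response:

```go
package main

import (
	"fmt"
	"log"

	"github.com/miekg/dns"
)

func main() {
	reply := new(dns.Msg)
	reply.SetQuestion("metrics-api.librato.com.", dns.TypeA)

	// A CNAME to a long ELB name, then several A records owned by that name,
	// mirroring the answer shown earlier.
	cname := "metrics-prod-1711139259.us-east-1.elb.amazonaws.com."
	rr, err := dns.NewRR("metrics-api.librato.com. 60 IN CNAME " + cname)
	if err != nil {
		log.Fatal(err)
	}
	reply.Answer = append(reply.Answer, rr)
	for _, ip := range []string{"23.23.110.30", "54.225.85.31", "23.21.91.141", "54.235.144.213"} {
		a, err := dns.NewRR(fmt.Sprintf("%s 60 IN A %s", cname, ip))
		if err != nil {
			log.Fatal(err)
		}
		reply.Answer = append(reply.Answer, a)
	}

	reply.Compress = false
	plain, _ := reply.Pack()
	reply.Compress = true
	packed, _ := reply.Pack()

	// Compression collapses the repeated owner names, which is where most of
	// the size difference between Consul's reply and 8.8.8.8's comes from.
	fmt.Printf("uncompressed=%d bytes, compressed=%d bytes\n", len(plain), len(packed))
}
```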