-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow DNS interface to use agent cache #5300
Conversation
6c7766b
to
78b9350
Compare
@banks I told you about this one :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aestek documentation mainly, but not clear to end user how max_stale VS max_age VS enable cache do work together
I will work the documentation so it's clearer what setting corresponds to what. Maybe the |
@Aestek One thing about the current Agent Caching documentation is that most of it is specific to HTTP including all the various headers that can be passed with the requests. It would however be worthwhile to document the agent cache usage on this page: https://www.consul.io/docs/agent/dns.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. There are a few questions inline.
One other prerequisite to getting this merged would be for all of the RPCs in use to be cacheable. Presumably the reason you didn't already include it was because the Catalog.NodeServices
RPC was not already an implemented cache-type. It should be fairly straight forward implementation that is mostly copy pasted from the existing Catalog.ServiceNodes
cache type.
Lastly after my other comment on the PR about all the configurations I found out that the cache will only perform stale queries so max_stale is superceded by using the cache and our docs should reflect that.
agent/dns.go
Outdated
reply, ok := raw.(*structs.PreparedQueryExecuteResponse) | ||
if !ok { | ||
// This should never happen, but we want to protect against panics | ||
d.logger.Printf("dns: internal error: response type not correct") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should add a "[ERR]" to the front of the message.
agent/dns.go
Outdated
@@ -1201,6 +1256,12 @@ RPC: | |||
ttl, _ = d.GetTTLForService(out.Service) | |||
} | |||
|
|||
// if the result came from cache, substract its age from the ttl | |||
ttl -= age |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aestek & @pierresouchay
I am not sure that the TTL calculations here are what is best/least-surprising.
One thing that initially caught me off guard was that this could potentially set a TTL lower than the remaining cache lifetime of the data.
For example, if you have a service ttl of 30s, a cache max age of 60s and your cached data is 20s old. This will set the TTL to 10s. At first glance you could ask why should a DNS client ask for the same data before the cache would have expired. If the cache hasn't expired and the request is made again it will just use the cached value the same as before. Therefore the TTL could have been something like min(<remaining cache lifetime>, <service ttl - age>)
which would have resulted in a 40s TTL.
There are two reasons why that might not be best.
- Automatic cache refreshing
- The data might be fetched on behalf of the HTTP API with a different max age or requiring revalidation.
For prepared queries, there is no automatic cache refreshing so that doesn't matter right here.
So the question is whether it is better to have the service and node ttls be a max TTL for those resources or whether we should allow the cache lifetime to increase it. There are advantages and disadvantages to both. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting max(<remaining cache lifetime>, <service ttl - age>)
? In your example min(<remaining cache lifetime>, <service ttl - age>)
would still have resulted in a 20s TTL.
While I agree that its useless to have a TTL lower than the next cache refresh duration it seems weird to me to set a TTL higher than the one configured by the user.
When agent DNS cache the TTL settings can be lowered a lot since the requests would not hit so hard on the servers anymore. In our case we might juste use them for rate limiting with values between 1s and 5s.
Maybe it's best not to change the TTL depending on the cache age at all ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Sorry, max(<remaining cache lifetime>, <service ttl - age>
However thinking about it some more it might be best the way you originally did it. It may result in more requests but they should be very easy to respond to from the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aestek @mkeeler As an DNS admin, I would like to enjoy the feature without having to handle the complexity of implementation. I have the feeling it should be simpler than that.
My feeling is:
TTL
=> stay unchangedmax-age
=max_stale
duration ideally (so, I don't care what your guys are doing of kind optimizations)
Thus, enabling cache would simply be:
dns_config.enable_cache=true
bdcac9b
to
393b268
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more changes if you would. In particular I liked how you split out the cache get or RPC for NodeServices into its own function. It would be great if you could do the same for the other similar code.
Adds to new configuration parameters "dns_config.use_cache" and "dns_config.cache_max_age" controlling how DNS requests use the agent cache when quering servers.
393b268
to
2f8d5cb
Compare
<3 |
Yeah! We are gonna test this at scale next week in our DCs |
Sounds great :) |
Adds to new configuration parameters "dns_config.use_cache" and
"dns_config.cache_max_age" controlling how DNS requests use the agent
cache when quering servers.