Add random replacement mapper cache #281
Conversation
My understanding of different caching strategies is weak – when would I use the random replacement cache? Would it be enough to sufficiently size the LRU cache instead, or are there workloads where an RR cache is always better?
The mapper package is accumulating a lot of responsibilities, and consequently a lot of mode switching. I would prefer to break out all the caches into a separate "mapping cache" package, to be composed by the user / in main.go. This could be either by injecting the dependency later:
c := cache.NewLRUCache(*cacheSize)
mapper.InitFromFile(*mappingConfig, c)
or (this feels cleaner to me) we define an interface for metric mappers, have the various caches implement it, and wrap the pure mapper:
metricMapper := mapper.InitFromFile(*mappingConfig)
cachedMetricMapper := cache.NewLRUCache(metricMapper, *cacheSize)
What do you think about breaking up the package in this way?
cacheType string
}

type CacheOption func(*cacheOptions)
what is the benefit of this construction over, say, passing explicit separate parameters to the constructor?
By passing these as optional arguments, callers do not need to change their calls. Within statsd_exporter it's not a big deal to change all the call sites, but I know the mapper package is used by a few other projects, so they can continue to work when they update their dependency.
If memory is limited for the statsd exporter, the LRU cache will provide a higher hit rate, since it keeps the hottest mappings. However, with high throughput and lots of metrics, it can become a bottleneck because every operation needs the write lock. If memory is not limited and throughput is very high, the random replacement cache will outperform the LRU cache when it is sized to hold all the mappings, since a write lock is only needed when adding new mappings.
thank you for the explanation! it makes sense to have both options then.
@matthiasr I'll fix the conflicts. I'm okay with breaking up the packages. Are we worried about compatibility for others who may have imported the mapper package? I could add a shim so the old calls into mapper keep working by delegating to mapper/cache, or something like that.
No, I'm not very worried about it. We have versions, and as long as the upgrade path is clear and easy I would rather have a clean API.
@bakins are you still willing to work on this? If the desire to break out caching from mapping is holding this up, I'm also willing to merge this as is (after resolving the conflicts) and defer the refactoring.
@matthiasr I'm okay either way. I'll fix the conflicts and then we can decide. Whichever way is easier for you.
Let's get it in like this and do the refactoring separately. Thanks a lot!
explain the tradeoffs for the cache strategies based on this comment: #281 (comment) Signed-off-by: Matthias Rampke <[email protected]>
Add a simple random replacement cache based on a map.
Add a command line flag to choose the mapper cache type, defaulting to the existing LRU cache.
When all entries fit into the cache, this new cache is faster: it only takes a read lock for gets and has less bookkeeping than the LRU cache.
When the entries do not all fit into the cache, the random replacement cache can avoid thrashing. In some cases, the LRU cache will have a very low hit ratio because it is constantly evicting entries. Our use case is a huge number of unique metric names in "old" statsd format that we map to relatively few Prometheus metrics.
Note: I realize this is probably a much larger change than we may want all at once, but wanted to get feedback on the approach, etc.
Benchmark results: