
Add random replacement mapper cache #281

Merged (1 commit) Mar 5, 2020

Conversation

bakins
Contributor

@bakins bakins commented Jan 2, 2020

Add a simple random replacement cache based on a map.

Add command line flag to choose mapper cache type, defaulting to existing lru cache.

When all entries fit into the cache, this new cache is faster: it uses a read lock for gets and has less bookkeeping than the LRU cache.

When not all entries fit into the cache, the random replacement cache can avoid thrashing. In some cases the LRU cache has a very low hit ratio because it is constantly evicting entries. Our use case is a huge number of unique metric names in "old" statsd format that we map to relatively few Prometheus metrics.

Note: I realize this is probably a much larger change than we may want all at once, but wanted to get feedback on the approach, etc.

Benchmark results:

go test -bench='.*Cache.*' -benchtime=10s -benchmem ./pkg/mapper/

pkg: github.com/prometheus/statsd_exporter/pkg/mapper
BenchmarkGlob10RulesCached/lru-12                           	76762423	       155 ns/op	      48 B/op	       2 allocs/op
BenchmarkGlob10RulesCached/random-12                        	153020101	        78.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkRegex10RulesAverageCached/lru-12                   	78939664	       150 ns/op	      48 B/op	       2 allocs/op
BenchmarkRegex10RulesAverageCached/random-12                	153228613	        78.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGlob100RulesCached/lru-12                          	78150949	       153 ns/op	      48 B/op	       2 allocs/op
BenchmarkGlob100RulesCached/random-12                       	151522714	        78.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkGlob100RulesMultipleCapturesCached/lru-12          	76434423	       158 ns/op	      64 B/op	       2 allocs/op
BenchmarkGlob100RulesMultipleCapturesCached/random-12       	100000000	       105 ns/op	      48 B/op	       1 allocs/op
BenchmarkRegex100RulesMultipleCapturesWorstCached/lru-12    	75379135	       159 ns/op	      64 B/op	       2 allocs/op
BenchmarkRegex100RulesMultipleCapturesWorstCached/random-12 	100000000	       109 ns/op	      48 B/op	       1 allocs/op
BenchmarkGlob100RulesCached100Metrics/lru-12                	  643710	     18510 ns/op	    6240 B/op	     200 allocs/op
BenchmarkGlob100RulesCached100Metrics/random-12             	 1000000	     10976 ns/op	    4320 B/op	      90 allocs/op
BenchmarkGlob100RulesCached100MetricsSmallCache/lru-12      	   76018	    167613 ns/op	   28502 B/op	    1000 allocs/op
BenchmarkGlob100RulesCached100MetricsSmallCache/random-12   	  128126	     96533 ns/op	   13483 B/op	     446 allocs/op
BenchmarkGlob100RulesCached100MetricsRandomSmallCache/lru-12         	   10000	   1154307 ns/op	  183750 B/op	    6361 allocs/op
BenchmarkGlob100RulesCached100MetricsRandomSmallCache/random-12      	   16196	    731579 ns/op	  108832 B/op	    3478 allocs/op
PASS
ok  	github.com/prometheus/statsd_exporter/pkg/mapper	224.196s

@bakins bakins force-pushed the random-replacement branch from 07b8584 to c6dc60b Compare January 2, 2020 17:53

@matthiasr matthiasr left a comment


My understanding of different caching strategies is weak: when would I use the random replacement cache? Would it be enough to size the LRU cache sufficiently instead, or are there workloads where an RR cache is always better?

The mapper package is accumulating a lot of responsibilities, and consequently a lot of mode switching. I would prefer to break out all the caches into a separate "mapping cache" package, to be composed by the user / in main.go. This could be either by injecting the dependency later:

c := cache.NewLRUCache(*cacheSize)
mapper.InitFromFile(*mappingConfig, c)

or (this feels cleaner to me) we define an interface for metric mappers, have the various caches implement it, and wrap the pure mapper:

metricMapper := mapper.InitFromFile(*mappingConfig)
cachedMetricMapper := cache.NewLRUCache(metricMapper, *cacheSize)

What do you think about breaking up the package in this way?

cacheType string
}

type CacheOption func(*cacheOptions)
Contributor


what is the benefit of this construction over, say, passing explicit separate parameters to the constructor?

Contributor Author


By passing these as optional arguments, callers do not need to change their calls. Within statsd_exporter it's not a big deal to change all the call sites, but I know the mapper package is used by a few other projects, so they can continue to work when they update their dependencies.

@glightfoot
Contributor

If memory resources are limited for the statsd exporter, the LRU cache will provide a higher hit rate because it keeps the hottest mappings. However, with high throughput and lots of metrics, it can become a bottleneck due to the write lock needed for all operations.

If memory resources are not limited and throughput is very high, the random replacement cache will outperform the LRU cache if it is sized large enough to hold all the mappings, since a write lock is only needed when adding new mappings.

@matthiasr
Contributor

thank you for the explanation! it makes sense to have both options then.

@bakins
Contributor Author

bakins commented Jan 11, 2020

@matthiasr I'll fix the conflicts. I'm okay to break up the packages. Are we worried about compatibility for others who may have imported the mapper package? I could add a shim to ensure the old calls in mapper worked by calling into mapper/cache or something like that.

@matthiasr
Contributor

matthiasr commented Jan 11, 2020 via email

@matthiasr
Contributor

@bakins are you still willing to work on this? If the desire to break out caching from mapping is holding this up, I'm also willing to merge this as is (after resolving the conflicts) and defer the refactoring.

@bakins
Contributor Author

bakins commented Mar 4, 2020

@matthiasr I'm okay either way. I'll fix the conflicts and then we can decide. Whichever way is easier for you.

@bakins bakins force-pushed the random-replacement branch from c6dc60b to 90e247b Compare March 4, 2020 17:28
@matthiasr
Contributor

Let's get it in like this and do the refactoring separately. Thanks a lot!

@matthiasr matthiasr merged commit 60fbaf5 into prometheus:master Mar 5, 2020
matthiasr pushed a commit that referenced this pull request Mar 5, 2020
explain the tradeoffs for the cache strategies based on this comment:

#281 (comment)

Signed-off-by: Matthias Rampke <[email protected]>
@bakins bakins deleted the random-replacement branch March 18, 2020 11:52
3 participants