feat: Update in-memory trace cache to use LRU instead of ring buffer #1359
Conversation
I haven't even looked at the implementation, but why do we even want an LRU any more? There's no reason to limit the number of traces in the cache; the size of the trace cache is basically a rounding error in the total memory usage. You weren't in the conversation, but when I talked to Yingrong about this, we talked about simply letting the cache size adjust as necessary. It's the total memory usage that actually matters.
Force-pushed from f113a49 to f856177
```diff
@@ -194,7 +194,7 @@ func TestOriginalSampleRateIsNotedInMetaField(t *testing.T) {
 	GetTracesConfigVal: config.TracesConfig{
 		SendTicker:   config.Duration(2 * time.Millisecond),
 		SendDelay:    config.Duration(1 * time.Millisecond),
-		TraceTimeout: config.Duration(60 * time.Second),
+		TraceTimeout: config.Duration(1 * time.Second),
```
Why do we change this to 1 second?
The non-sampled event is added, and the test only waits 5 seconds for it to appear in the transmission queue. The cache will only expire and send the events once the trace timeout is reached.
I'm not sure how / why it currently works in main.
Oh, it's because the cache size is set to 3 in `cache.NewInMemCache()`. Once the cache is full, traces will be ejected from the cache when a new span comes in and put into the transmission queue.
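To make the pre-LRU behavior concrete, here is a minimal, hypothetical sketch (names like `ringCache` are invented for illustration, not Refinery's actual code) of a fixed-size ring buffer where a full buffer ejects its oldest trace when a new one arrives:

```go
package main

import "fmt"

// ringCache is a hypothetical sketch of a fixed-size trace buffer.
// When full, adding a new trace ejects the oldest occupant of the slot,
// which would then be handed to the transmission queue.
type ringCache struct {
	buf  []string // trace IDs; "" means an empty slot
	next int      // next insertion index
}

func newRingCache(capacity int) *ringCache {
	return &ringCache{buf: make([]string, capacity)}
}

// add inserts a trace and returns the ejected trace ID, if any.
func (c *ringCache) add(traceID string) (ejected string) {
	ejected = c.buf[c.next]
	c.buf[c.next] = traceID
	c.next = (c.next + 1) % len(c.buf)
	return ejected
}

func main() {
	c := newRingCache(3) // mirrors the cache size of 3 in the test
	for _, id := range []string{"t1", "t2", "t3", "t4"} {
		if out := c.add(id); out != "" {
			fmt.Printf("ejected %s to transmission queue\n", out)
		}
	}
	// prints: ejected t1 to transmission queue
}
```

This is why the test passes on main even without hitting the trace timeout: the fourth span forces the oldest trace out of the full cache.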
Which problem is this PR solving?
Updates the InMemCache implementation to use a hashicorp/golang-lru cache to order traces. If the configured capacity equals the default cache size (10000), the capacity is replaced with
math.MaxInt32
instead. This means the cache will only contain active traces and consume only the bytes it needs, because expired traces are removed instead of lingering in the buffer. The collector regularly checks the Refinery process's memory usage, sheds old traces if it exceeds the configured allocation, and then manually calls GC to reclaim used resources.
Because Refinery applies a consistent trace timeout, the LRU enables us to efficiently retrieve the oldest trace when evicting expired traces, unlike the ring buffer, which required us to loop over the entire buffer when searching for expired traces or removing sent traces.
Below is a comparison of the cache before and after the update
Short description of the changes
Before
After
Note the updated cache is considerably faster at removing traces (~10x) and at removing expired traces (>1000x).