Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update in-memory trace cache to use LRU instead of ring buffer #1359

Merged
merged 17 commits into from
Oct 22, 2024

Conversation

MikeGoldsmith
Copy link
Contributor

@MikeGoldsmith MikeGoldsmith commented Oct 3, 2024

Which problem is this PR solving?

Updates the InMemCache cache implementation to use uses a hasicorp/golang-lru cache to order traces. The cache checks if the capacity is set to the default cache size (10000), and replaces the capacity with math.MaxInt32 instead.

This means the cache will only contain active traces and consume only the required bytes as expired because old traces are removed instead of lingering in the buffer. The collector regularly checks the Refinery process's memory usage, and will shed old traces if over the configured allocation then manually called GC to clean up used resources.

Because Refinery applies a consistent trace timeout, the LRU enables us to efficiently retrieve the oldest trace when evicting expired traces unlike the ring buffer than required us to loop over the entire buffer when searching for expired traces or removing sent traces.

Below is a comparison of the cache before and after the update

Short description of the changes

  • Add new trace cache, backed by an LRU cache
  • Update benchmarks to compare current and new cache implementations for various actions (Set, Get, Remove, etc)

Before

BenchmarkCache_Set/InMemCache-10           	   	1000000000	         0.0000008 ns/op	       0 B/op	       0 allocs/op
BenchmarkCache_Get/InMemCache-10          	    	1000000000	         0.0000004 ns/op	       0 B/op	       0 allocs/op
BenchmarkCache_RemoveTraces/InMemCache-10              	1000000000	         0.0000335 ns/op	       0 B/op	       0 allocs/op
BenchmarkCache_TakeExpiredTraces/InMemCache-10         	   18817	     63078 ns/op	      50 B/op	       1 allocs/op

After

BenchmarkCache_Set/TraceCache-10           	   	1000000000	         0.0000004 ns/op	       0 B/op	       0 allocs/op
BenchmarkCache_Get/TraceCache-10          	    	1000000000	         0.0000003 ns/op	       0 B/op	       0 allocs/op
BenchmarkCache_RemoveTraces/TraceCache-10              	1000000000	         0.0000015 ns/op	       0 B/op	       0 allocs/op
BenchmarkCache_TakeExpiredTraces/TraceCache-10         	27758473	        45.23 ns/op	      41 B/op	       0 allocs/op

Note the updated cache is considerable faster removing traces (x10 faster) and removing expired traces (> x1000).

@MikeGoldsmith MikeGoldsmith added the type: enhancement New feature or request label Oct 3, 2024
@MikeGoldsmith MikeGoldsmith self-assigned this Oct 3, 2024
@MikeGoldsmith MikeGoldsmith requested a review from a team as a code owner October 3, 2024 12:56
@MikeGoldsmith MikeGoldsmith added this to the v2.9 milestone Oct 3, 2024
@kentquirk
Copy link
Contributor

I haven't even looked at the implementation, but why do we even want an LRU any more? There's no reason to limit the number of traces in the cache; the size of the trace cache is basically a rounding error in the total memory usage.

You weren't in the conversation, but when I talked to Yingrong about this, we talked about simply letting the cache size adjust as necessary. It's the total memory usage that actually matters.

@MikeGoldsmith MikeGoldsmith marked this pull request as draft October 3, 2024 15:47
@MikeGoldsmith MikeGoldsmith changed the title feat: Add alternative LRU trace cache feat: Add alternative trace cache implementation Oct 3, 2024
collect/cache/cache.go Outdated Show resolved Hide resolved
collect/cache/cache.go Outdated Show resolved Hide resolved
collect/cache/cache.go Outdated Show resolved Hide resolved
@MikeGoldsmith MikeGoldsmith changed the title feat: Add alternative trace cache implementation feat: Update in-memory trace cache to use LRU instead of ring buffer Oct 16, 2024
@MikeGoldsmith MikeGoldsmith marked this pull request as ready for review October 18, 2024 12:08
collect/cache/cache.go Outdated Show resolved Hide resolved
@@ -194,7 +194,7 @@ func TestOriginalSampleRateIsNotedInMetaField(t *testing.T) {
GetTracesConfigVal: config.TracesConfig{
SendTicker: config.Duration(2 * time.Millisecond),
SendDelay: config.Duration(1 * time.Millisecond),
TraceTimeout: config.Duration(60 * time.Second),
TraceTimeout: config.Duration(1 * time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we change this to 1 second?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-sampled event is added and we only waits 5 seconds for it to appear in the transmission queue. The cache will only expire and send the events once the trace timeout is reached.

I'm not sure how / why it currently works in main.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's because the cache size is set to 3 in the cache.NewInMemCache(). Once the cache is full, traces will be ejected from the cache when a new span comes in and put into the transmission queue.

@MikeGoldsmith MikeGoldsmith merged commit 8829dec into main Oct 22, 2024
5 checks passed
@MikeGoldsmith MikeGoldsmith deleted the mike/cache-rework branch October 22, 2024 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisiting trace cache strategy
3 participants