feat: Update in-memory trace cache to use LRU instead of ring buffer #1359
Conversation
I haven't even looked at the implementation, but why do we even want an LRU any more? There's no reason to limit the number of traces in the cache; the size of the trace cache is basically a rounding error in the total memory usage. You weren't in the conversation, but when I talked to Yingrong about this, we talked about simply letting the cache size adjust as necessary. It's the total memory usage that actually matters.
Force-pushed from f113a49 to f856177
```diff
@@ -194,7 +194,7 @@ func TestOriginalSampleRateIsNotedInMetaField(t *testing.T) {
 	GetTracesConfigVal: config.TracesConfig{
 		SendTicker:   config.Duration(2 * time.Millisecond),
 		SendDelay:    config.Duration(1 * time.Millisecond),
-		TraceTimeout: config.Duration(60 * time.Second),
+		TraceTimeout: config.Duration(1 * time.Second),
```
Why do we change this to 1 second?
The non-sampled event is added, and the test only waits 5 seconds for it to appear in the transmission queue. The cache will only expire and send the events once the trace timeout is reached.
I'm not sure how / why it currently works in main.
Oh, it's because the cache size is set to 3 in `cache.NewInMemCache()`. Once the cache is full, traces will be ejected from the cache when a new span comes in and put into the transmission queue.
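To make the pre-LRU behavior concrete, here is a minimal, hypothetical sketch (names like `ringCache` are invented for illustration, not Refinery's actual code) of a fixed-size ring buffer where a full buffer ejects its oldest trace when a new one arrives:

```go
package main

import "fmt"

// ringCache is a hypothetical sketch of a fixed-size trace buffer.
// When full, adding a new trace ejects the oldest occupant of the slot,
// which would then be handed to the transmission queue.
type ringCache struct {
	buf  []string // trace IDs; "" means an empty slot
	next int      // next insertion index
}

func newRingCache(capacity int) *ringCache {
	return &ringCache{buf: make([]string, capacity)}
}

// add inserts a trace and returns the ejected trace ID, if any.
func (c *ringCache) add(traceID string) (ejected string) {
	ejected = c.buf[c.next]
	c.buf[c.next] = traceID
	c.next = (c.next + 1) % len(c.buf)
	return ejected
}

func main() {
	c := newRingCache(3) // mirrors the cache size of 3 in the test
	for _, id := range []string{"t1", "t2", "t3", "t4"} {
		if out := c.add(id); out != "" {
			fmt.Printf("ejected %s to transmission queue\n", out)
		}
	}
	// prints: ejected t1 to transmission queue
}
```

This is why the test passes on main even without hitting the trace timeout: the fourth span forces the oldest trace out of the full cache.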
Which problem is this PR solving?
Updates the InMemCache implementation to use a hashicorp/golang-lru cache to order traces. If the configured capacity equals the default cache size (10000), the capacity is replaced with
math.MaxInt32
instead. This means the cache will only contain active traces and consume only the bytes it needs, because expired traces are removed instead of lingering in the buffer. The collector regularly checks the Refinery process's memory usage, sheds old traces if it exceeds the configured allocation, and then manually calls GC to reclaim used resources.
Because Refinery applies a consistent trace timeout, the LRU enables us to efficiently retrieve the oldest trace when evicting expired traces, unlike the ring buffer, which required us to loop over the entire buffer when searching for expired traces or removing sent traces.
Below is a comparison of the cache before and after the update
Short description of the changes
Before
After
Note the updated cache is considerably faster at removing traces (~10x) and at removing expired traces (>1000x).