
feat: Add a cache to the cache #1296

Merged
merged 5 commits into main on Aug 22, 2024

Conversation

kentquirk
Contributor

Which problem is this PR solving?

We have a customer whose traffic is sometimes so high that the cuckoo drop cache can't record the dropped IDs as fast as they're being dropped, so it overruns its input queue.

This is a frustrating single point of failure: a single goroutine is responsible for filling this cache, so adding CPUs won't help, and because of trace locality, adding more nodes won't help either when a burst of spans comes from a single giant trace. Making the queue larger just means it takes a little longer to fill up.

The contention comes from the access pattern: we write to the cache when we drop a trace, but we read from it for every span that arrives. So with a single huge trace, you might fairly quickly decide to drop it, but still have to query the cache tens of thousands of times as new spans arrive. The cuckoo cache is pretty fast, but we can make it faster.
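To make the bottleneck concrete, here is a schematic of the pattern described above (the names and structure are illustrative, not Refinery's actual code): drops go through a buffered channel drained by a single goroutine, while every arriving span does a locked lookup. Under a burst from one giant trace, lookups dominate, the writer falls behind, and a full queue loses drop records.

```go
package main

import "sync"

// dropCache is a hypothetical sketch of the single-writer pattern:
// one goroutine drains the queue into the set, so extra CPUs can't
// speed up writes, while reads happen once per arriving span.
type dropCache struct {
	mut     sync.RWMutex
	dropped map[string]struct{}
	queue   chan string
	done    chan struct{}
}

func newDropCache(queueSize int) *dropCache {
	c := &dropCache{
		dropped: make(map[string]struct{}),
		queue:   make(chan string, queueSize),
		done:    make(chan struct{}),
	}
	go c.drain() // the single writer goroutine
	return c
}

func (c *dropCache) drain() {
	for id := range c.queue {
		c.mut.Lock()
		c.dropped[id] = struct{}{}
		c.mut.Unlock()
	}
	close(c.done)
}

// Drop enqueues a just-dropped trace ID; if the queue is full, the
// record is lost -- the failure mode this PR addresses.
func (c *dropCache) Drop(id string) bool {
	select {
	case c.queue <- id:
		return true
	default:
		return false // queue overrun
	}
}

// Check is called for every arriving span, contending with drain's writes.
func (c *dropCache) Check(id string) bool {
	c.mut.RLock()
	defer c.mut.RUnlock()
	_, ok := c.dropped[id]
	return ok
}

// Close stops accepting drops and waits for the drain goroutine to finish.
func (c *dropCache) Close() {
	close(c.queue)
	<-c.done
}
```

Note that making `queue` larger only delays the overrun; it doesn't remove the single-writer ceiling.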

Short description of the changes

  • Add a cache in front of the cache (a Set with a TTL of 3 seconds) that buffers only the most recently dropped trace IDs; we check it before we check the cuckoo cache. This set responds quite a bit faster (at least 4x) than the cuckoo cache, and, importantly, it also prevents lock contention on the cuckoo cache, speeding up the cache writes.
  • Tweak the logic for draining the write queue; it benchmarks faster this way.
  • Move the metrics timer inside the lock so we're not measuring the waiting time.
  • The function genID, used by another benchmark and which I also wanted to use here, was broken, so I fixed it.
  • Added a couple of benchmarks I used to prove to myself that the Set was fast enough.

[YO DAWG meme image]

@kentquirk kentquirk requested a review from a team as a code owner August 22, 2024 18:07
@kentquirk kentquirk self-assigned this Aug 22, 2024
@kentquirk kentquirk added this to the v2.8 milestone Aug 22, 2024
Contributor

@VinozzZ VinozzZ left a comment


I also noticed a little bug in generics.SetWithTTL where it mixes time.Now and clockwork.Now. Do you mind fixing it in this PR, since we're using SetWithTTL?

generics/setttl_test.go
collect/cache/cuckooSentCache.go
@kentquirk kentquirk requested a review from VinozzZ August 22, 2024 19:44
@kentquirk kentquirk merged commit 9f5ea80 into main Aug 22, 2024
5 checks passed
@kentquirk kentquirk deleted the kent.cache_cache branch August 22, 2024 19:49
@cartermp
Member

This is a great PR.
