Prevent NONEEDEDGLUE errors and add cache statistics #456

phillip-stephens · 2024-09-27T20:24:44Z

Description

Prior to this change, we were adding Authorities and Additionals as separate records in the cache. The Authorities were inserted as NS records, while the Additionals were inserted as separate A/AAAA records. Separately, our cache is an LRU cache with 4096 shards and 2 entries per shard by default. This means we have no guarantee that all of a domain's Authorities and Additionals are in the cache together, since part of the set can be evicted.

This loss of accuracy is what caused the NONEEDEDGLUE errors to pop up, especially as more domains were resolved in the same run.
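To make the failure mode concrete, here is a minimal sketch of a single LRU shard with 2 entries. It is a toy model, not the zdns cache: the `shard` type, its methods, and the string keys are all hypothetical, and it assumes the NS record and its glue record land in the same shard.

```go
package main

import "fmt"

// shard is a toy LRU (hypothetical, not the zdns type).
// keys[0] is the least recently used entry.
type shard struct {
	capacity int
	keys     []string
}

// add inserts key, evicting the least recently used key if the shard is full.
// It returns the evicted key, or "" if nothing was evicted.
func (s *shard) add(key string) string {
	evicted := ""
	if len(s.keys) == s.capacity {
		evicted = s.keys[0]
		s.keys = s.keys[1:]
	}
	s.keys = append(s.keys, key)
	return evicted
}

// has reports whether key is currently cached.
func (s *shard) has(key string) bool {
	for _, k := range s.keys {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	s := &shard{capacity: 2}
	s.add("A ns1.example.com") // glue (Additional), cached on its own
	s.add("NS example.com")    // delegation (Authority), cached on its own
	evicted := s.add("A other.org")

	fmt.Println("evicted:", evicted)
	fmt.Println("has NS:", s.has("NS example.com"))
	fmt.Println("has glue:", s.has("A ns1.example.com"))
}
```

After the third insert the NS record is still cached but its glue address is gone, which is the kind of partial set described above.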

Changes

- Add CacheStats that capture cache hits, misses, and evictions with --verbosity=5. No performance impact if --verbosity < 5.
  - Ex: time="2024-09-27T20:28:52Z" level=debug msg="Cache statistics: hits=29870 misses=85581 adds=30790 ejects=21604 hitRate=25.872448% missRate=74.127552%"
- Switched to using a single SingleQueryResult pointer instead of returning the struct, since each function would otherwise have to create a new structure and copy the internals. The pointer avoids this.
- Re-wrote the cache so that an entire Authority record or Answer record is stored atomically.
  - Authority records store the authorities for a certain domain, e.g. .com
  - Answer records store the authoritative answers for a certain query, e.g. A google.com
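The changes above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `cacheStats`, `authorityEntry`, `cache`, and `get` are hypothetical names, the counters are not goroutine-safe, and the addresses come from the documentation range. It shows one cache value holding a zone's nameservers together with their glue, lookups returning a pointer rather than a copy, and a hit rate computed the same way as in the example log line (hits divided by hits plus misses).

```go
package main

import "fmt"

// cacheStats is a hypothetical counter set loosely modeled on the log line.
// Plain counters are not goroutine-safe; a concurrent cache would need
// atomics or a lock.
type cacheStats struct {
	hits, misses uint64
}

// hitRate returns the percentage of lookups served from the cache.
func (s *cacheStats) hitRate() float64 {
	total := s.hits + s.misses
	if total == 0 {
		return 0
	}
	return 100 * float64(s.hits) / float64(total)
}

// authorityEntry bundles a zone's nameservers with their glue addresses,
// so one cache slot holds (and one eviction removes) the whole set.
type authorityEntry struct {
	nameservers []string
	glue        map[string][]string // nameserver name -> A/AAAA addresses
}

// cache is a plain map here; sharding and LRU eviction are omitted.
type cache struct {
	entries map[string]*authorityEntry
	stats   cacheStats
}

// get returns a pointer to the cached entry (no struct copy) and
// records a hit or a miss.
func (c *cache) get(zone string) (*authorityEntry, bool) {
	e, ok := c.entries[zone]
	if ok {
		c.stats.hits++
	} else {
		c.stats.misses++
	}
	return e, ok
}

func main() {
	c := &cache{entries: map[string]*authorityEntry{}}
	c.entries["example."] = &authorityEntry{
		nameservers: []string{"ns1.example."},
		glue:        map[string][]string{"ns1.example.": {"192.0.2.1"}},
	}

	if e, ok := c.get("example."); ok {
		fmt.Println(e.nameservers, e.glue[e.nameservers[0]])
	}
	c.get("missing.") // counted as a miss
	fmt.Printf("hitRate=%.1f%%\n", c.stats.hitRate())
}
```

Because the nameservers and glue live in one value, an eviction either removes both or neither, which is the property the rewrite is after.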

Performance

Tested with 10000 top domains, A --iterative --threads=100 --verbosity=5

main

Runtime - 37.2 s
Allocs - 2199 MB total allocations (this is not peak memory use, since the garbage collector reclaims unused memory; it is the total size of all allocated objects)
NOERROR - 9919
TIMEOUT/ITERATIVE_TIMEOUT - 5
NXDOMAIN - 54

Branch

Runtime - 33.9 s
Allocs - 1216 MB
NOERROR - 9916
TIMEOUT/ITERATIVE_TIMEOUT - 9
NXDOMAIN - 55

The main takeaway is a sharp reduction in allocations. The runtime reduction wasn't that large, however, and is likely within the margin of error (other A/B tests didn't show such a large delta), so it seems garbage collection is not our bottleneck.

Either way, I think this is a needed change for cache use.
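For reference, the relative deltas behind the numbers above can be worked out with a few lines of Go. The `reduction` helper is introduced here purely for illustration; the inputs are the main and branch figures reported in this description.

```go
package main

import "fmt"

// reduction returns the relative decrease from before to after, in percent.
func reduction(before, after float64) float64 {
	return 100 * (before - after) / before
}

func main() {
	// Allocations in MB, main vs. branch.
	fmt.Printf("allocations: %.1f%% lower\n", reduction(2199, 1216))
	// Runtime in seconds, main vs. branch.
	fmt.Printf("runtime:     %.1f%% lower\n", reduction(37.2, 33.9))
}
```

That works out to roughly 44.7% fewer allocated bytes against roughly 8.9% less runtime.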


zakird · 2024-09-28T01:40:32Z

src/cli/worker_manager.go

@@ -530,6 +533,7 @@ func Run(gc CLIConf) {
 	close(outChan)
 	close(metaChan)
 	routineWG.Wait()
+	resolverConfig.Cache.Stats.PrintStatistics()


We should move this into metadata


phillip-stephens added 21 commits September 24, 2024 10:08

improved logs for debugging

b2fac6d

add top layer caching

b9b0432

added a dideject bool to the cachehash so we know when ejections take place

22376df

added cache stats

52bbfd3

fix poor ptr receiver on cache stats

97b3836

return didEject with Add in shardedcachehash

75db398

re-wrote zdns pkg cache logic to support caching entire results as atomic units

e85cc11

changes are at least accurate

2a4da0c

google/myactivity.google iterative example working

e1d2a4c

working on failed domains!

dcc60be

fingers crossed, it's looking good. benchmark ran very fast with < 80 failures, which is par for the course

a8c7401

add cache info

b1e1439

log with log pkg in cache stats

4a224d4

only log debug if we're in debug mode, saves expensive lookups

fb6510c

use SingleQueryResult ptrs for memory reduction

2935f2c

Merge branch 'main' into phillip/fix-cache-lookups-iterative

85b6de2

clean up log msg

1f7e0f2

remove unneeded log

cac3038

fix up cache tests, handle retv nil bug

0e3ea2a

lint issue

f0e0790

lint

777c5a1

phillip-stephens marked this pull request as ready for review September 27, 2024 21:21

phillip-stephens requested a review from a team as a code owner September 27, 2024 21:21

zakird reviewed Sep 28, 2024


zakird approved these changes Sep 28, 2024


insert cache stats into metadata rather than print in logs, only print cache stats at verbosty=5

5d30da8

phillip-stephens merged commit c9f92e1 into main Sep 30, 2024
3 checks passed

phillip-stephens deleted the phillip/fix-cache-lookups-iterative branch September 30, 2024 15:28
