
Txhandler: Improve dedupe hit rate #5022

Merged
merged 4 commits into algorand:master on Jan 23, 2023
Conversation

iansuvak
Contributor

Summary

Add a check in the cache, without rehashing, before writing and returning, to avoid the race condition that allows nearly simultaneous received transactions to miss the cache.

Currently this checks both caches, but it could reasonably check only the current cache if we wanted to trade a slightly lower hit rate for less time cost.

Test Plan

Existing tests should pass.

Benchmark results:

Old:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-10         	 1620752	       773.7 ns/op	     200 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-10         	 1000000	      1030 ns/op	     220 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-10         	 2157205	       591.4 ns/op	     165 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-10         	  994120	      1312 ns/op	     221 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-10        	 1668930	       782.4 ns/op	     195 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-10        	  991202	      1313 ns/op	     221 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-10       	 1564032	       811.0 ns/op	     205 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10       	  989119	      1320 ns/op	     222 B/op	       4 allocs/op

New:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-10         	 1630840	       770.9 ns/op	     199 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-10         	 1000000	      1121 ns/op	     220 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-10         	 2179699	       586.2 ns/op	     164 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-10         	  929139	      1438 ns/op	     229 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-10        	 1500031	       813.2 ns/op	     212 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-10        	  925924	      1429 ns/op	     229 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-10       	 1600089	       808.9 ns/op	     202 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10       	  891339	      1475 ns/op	     234 B/op	       4 allocs/op

@iansuvak iansuvak requested a review from algorandskiy January 18, 2023 04:49
@codecov

codecov bot commented Jan 18, 2023

Codecov Report

Merging #5022 (94a6f5b) into master (eddf773) will increase coverage by 0.00%.
The diff coverage is 33.33%.

@@           Coverage Diff           @@
##           master    #5022   +/-   ##
=======================================
  Coverage   53.63%   53.64%           
=======================================
  Files         432      432           
  Lines       54057    54060    +3     
=======================================
+ Hits        28996    28999    +3     
- Misses      22812    22816    +4     
+ Partials     2249     2245    -4     
Impacted Files Coverage Δ
data/txDupCache.go 95.14% <33.33%> (-1.86%) ⬇️
ledger/blockqueue.go 85.63% <0.00%> (-2.88%) ⬇️
ledger/tracker.go 74.26% <0.00%> (-0.85%) ⬇️
cmd/tealdbg/debugger.go 72.69% <0.00%> (-0.81%) ⬇️
catchup/service.go 69.32% <0.00%> (-0.49%) ⬇️
ledger/acctupdates.go 69.60% <0.00%> (+0.12%) ⬆️
network/wsNetwork.go 64.92% <0.00%> (+0.17%) ⬆️
catchup/peerSelector.go 100.00% <0.00%> (+1.04%) ⬆️
network/wsPeer.go 68.32% <0.00%> (+1.89%) ⬆️


@iansuvak iansuvak changed the title Improve TransactionHandler dedupe hit rate Txhandler: Improve dedupe hit rate Jan 18, 2023
```go
if _, found := c.cur[*d]; found {
	return d, true
}
if _, found := c.prev[*d]; found {
```
@cce
Contributor

cce commented Jan 18, 2023
the prev map is using a different salt. I think you would need to have innerCheck return (d *crypto.Digest, pd *crypto.Digest, hit bool) instead of just the one digest for cur to avoid rehash here.

@cce
Contributor

cce commented Jan 18, 2023

I guess we would need to step back and look at how often these rare duplicates get through in practice, and whether it is more performant overall to let that small fraction through than to pay a penalty of what looks like 4-8% slower performance overall from your benchmarks?

@cce
Contributor

cce commented Jan 18, 2023

11.7% slower for the BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10 high-thread case, but we should only have 20 tx handlers I think, which is closer to threads=16, which is 8.8% slower in your benchmarks?

@iansuvak
Contributor Author

About 2.5-3% of the transactions that we see are duplicates; we could reasonably check only cur to avoid having to rehash with the old salt.

Benchmarks for that are below, which show about a 4% slowdown in the threads=16 case.

The real cost to weigh is the cost of this slowdown vs. the cost of re-processing these transactions through the TxHandler, right?
Either way, happy to close this if the consensus is that we don't want it.

Benchmarks:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-10         	 1558552	       787.0 ns/op	     206 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-10         	 1000000	      1076 ns/op	     220 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-10         	 2173646	       569.7 ns/op	     164 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-10         	  958526	      1368 ns/op	     225 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-10        	 1612461	       794.1 ns/op	     201 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-10        	  956253	      1371 ns/op	     225 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-10       	 1465798	       820.3 ns/op	     216 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10       	  942253	      1399 ns/op	     227 B/op	       4 allocs/op

@cce
Contributor

cce commented Jan 18, 2023

That's right, the cost of the slowdown for all cache checks vs the reprocessing cost of the ones that got through. Do you know how many total transactions were processed during the period you were looking at? Not just the ones that made it through the filter but the raw TX message count that hit this code — that would push the duplicate rate you're seeing of 2.5%-3% a lot further down I assume?

(Also these changes are only affecting saltedCacheMaker not the digestCacheMaker so the benchmarks are only relevant for half the results you're posting)

@iansuvak
Contributor Author

iansuvak commented Jan 18, 2023

> That's right, the cost of the slowdown for all cache checks vs the reprocessing cost of the ones that got through. Do you know how many total transactions were processed during the period you were looking at? Not just the ones that made it through the filter but the raw TX message count that hit this code — that would push the duplicate rate you're seeing of 2.5%-3% a lot further down I assume?

The TX messages that didn't make it through the filter by definition wouldn't hit this code no? They got filtered out using RLocks only right? The raw number is a lot larger of course but it still seems to me that this would be worth it iff:

2.5% * t_handleTxn >= ~120ns
t_handleTxn >= ~4800ns

Do we know the average or worst case t_handleTxn above?

> (Also these changes are only affecting saltedCacheMaker not the digestCacheMaker so the benchmarks are only relevant for half the results you're posting)

Yup, just posted the whole output of the single benchmark function without editing it.

@cce
Contributor

cce commented Jan 18, 2023

Your node should have been constantly increasing its tx counter during this time, so you should have access to the value of the tx counter before and after your data collection.

@algonautshant
Contributor

algonautshant left a comment

Looks correct.
I wish a test were possible, but it is impossible to determine whether the new branch is exercised.

@algorandskiy algorandskiy merged commit a6eece5 into algorand:master Jan 23, 2023