
Txhandler: Improve dedupe hit rate #5022

Merged
merged 4 commits into algorand:master on Jan 23, 2023
Conversation

iansuvak
Contributor

Summary

Add a check in the cache, without rehashing, before writing and returning, to avoid the race condition that allows nearly simultaneous received transactions to miss the cache.

Currently this checks both caches, but it could reasonably check only the current cache if we wanted to trade a slightly lower hit rate for less time cost.

Test Plan

Existing tests should pass.

Benchmark results:

Old:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-10         	 1620752	       773.7 ns/op	     200 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-10         	 1000000	      1030 ns/op	     220 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-10         	 2157205	       591.4 ns/op	     165 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-10         	  994120	      1312 ns/op	     221 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-10        	 1668930	       782.4 ns/op	     195 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-10        	  991202	      1313 ns/op	     221 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-10       	 1564032	       811.0 ns/op	     205 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10       	  989119	      1320 ns/op	     222 B/op	       4 allocs/op

New:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-10         	 1630840	       770.9 ns/op	     199 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-10         	 1000000	      1121 ns/op	     220 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-10         	 2179699	       586.2 ns/op	     164 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-10         	  929139	      1438 ns/op	     229 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-10        	 1500031	       813.2 ns/op	     212 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-10        	  925924	      1429 ns/op	     229 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-10       	 1600089	       808.9 ns/op	     202 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10       	  891339	      1475 ns/op	     234 B/op	       4 allocs/op

@iansuvak iansuvak requested a review from algorandskiy January 18, 2023 04:49
@codecov

codecov bot commented Jan 18, 2023

Codecov Report

Merging #5022 (94a6f5b) into master (eddf773) will increase coverage by 0.00%.
The diff coverage is 33.33%.

@@           Coverage Diff           @@
##           master    #5022   +/-   ##
=======================================
  Coverage   53.63%   53.64%           
=======================================
  Files         432      432           
  Lines       54057    54060    +3     
=======================================
+ Hits        28996    28999    +3     
- Misses      22812    22816    +4     
+ Partials     2249     2245    -4     
Impacted Files Coverage Δ
data/txDupCache.go 95.14% <33.33%> (-1.86%) ⬇️
ledger/blockqueue.go 85.63% <0.00%> (-2.88%) ⬇️
ledger/tracker.go 74.26% <0.00%> (-0.85%) ⬇️
cmd/tealdbg/debugger.go 72.69% <0.00%> (-0.81%) ⬇️
catchup/service.go 69.32% <0.00%> (-0.49%) ⬇️
ledger/acctupdates.go 69.60% <0.00%> (+0.12%) ⬆️
network/wsNetwork.go 64.92% <0.00%> (+0.17%) ⬆️
catchup/peerSelector.go 100.00% <0.00%> (+1.04%) ⬆️
network/wsPeer.go 68.32% <0.00%> (+1.89%) ⬆️


@iansuvak iansuvak changed the title Improve TransactionHandler dedupe hit rate Txhandler: Improve dedupe hit rate Jan 18, 2023
```go
if _, found := c.cur[*d]; found {
	return d, true
}
if _, found := c.prev[*d]; found {
```
@cce
Contributor

cce commented Jan 18, 2023
the prev map is using a different salt. I think you would need to have innerCheck return (d *crypto.Digest, pd *crypto.Digest, hit bool) instead of just the one digest for cur to avoid rehash here.

@cce
Contributor

cce commented Jan 18, 2023

I guess we would need to step back and look at how often these rare duplicates get through in practice, and whether it is more performant overall to let that small fraction through than to pay a penalty of what looks like 4-8% slower performance overall from your benchmarks?

@cce
Contributor

cce commented Jan 18, 2023

11.7% slower for the BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10 high-thread case, but we should only have 20 tx handlers I think, which is closer to threads=16, which is 8.8% slower in your benchmarks?

@iansuvak
Contributor Author

About 2.5-3% of the transactions that we see are duplicates; we could reasonably check only cur to avoid having to rehash with the old salt.

Benchmarks for that are below, which show about a 4% slowdown in the threads=16 case.

The real cost to weigh is the cost of this slowdown vs. the cost of re-processing these transactions through the TxHandler, right?
Either way, happy to close this if the consensus is that we don't want it.

Benchmarks:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-10         	 1558552	       787.0 ns/op	     206 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-10         	 1000000	      1076 ns/op	     220 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-10         	 2173646	       569.7 ns/op	     164 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-10         	  958526	      1368 ns/op	     225 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-10        	 1612461	       794.1 ns/op	     201 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-10        	  956253	      1371 ns/op	     225 B/op	       4 allocs/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-10       	 1465798	       820.3 ns/op	     216 B/op	       2 allocs/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-10       	  942253	      1399 ns/op	     227 B/op	       4 allocs/op

@cce
Contributor

cce commented Jan 18, 2023

That's right, the cost of the slowdown for all cache checks vs the reprocessing cost of the ones that got through. Do you know how many total transactions were processed during the period you were looking at? Not just the ones that made it through the filter but the raw TX message count that hit this code — that would push the duplicate rate you're seeing of 2.5%-3% a lot further down I assume?

(Also these changes are only affecting saltedCacheMaker not the digestCacheMaker so the benchmarks are only relevant for half the results you're posting)

@iansuvak
Contributor Author

iansuvak commented Jan 18, 2023

> That's right, the cost of the slowdown for all cache checks vs the reprocessing cost of the ones that got through. Do you know how many total transactions were processed during the period you were looking at? Not just the ones that made it through the filter but the raw TX message count that hit this code — that would push the duplicate rate you're seeing of 2.5%-3% a lot further down I assume?

The TX messages that didn't make it through the filter by definition wouldn't hit this code no? They got filtered out using RLocks only right? The raw number is a lot larger of course but it still seems to me that this would be worth it iff:

2.5% * t_handleTxn >= ~120ns
t_handleTxn >= ~4800ns

Do we know the average or worst case t_handleTxn above?

> (Also these changes are only affecting saltedCacheMaker not the digestCacheMaker so the benchmarks are only relevant for half the results you're posting)

Yup, just posted the whole output of the single benchmark function without editing it.

@cce
Contributor

cce commented Jan 18, 2023

Your node should have been constantly increasing its tx counter during this time, so you should have access to the value of the tx counter before and after your data collection.

@algonautshant
Contributor

algonautshant left a comment

Looks correct.
I wish a test were possible, but it is impossible to determine whether the new branch is exercised.

@algorandskiy algorandskiy merged commit a6eece5 into algorand:master Jan 23, 2023