Tune performance of fetch pending hashes #13219
Comments
We don't need ordering; we only need to evict older hashes.
awesome
That makes sense, no need to buffer them.
That's very cool, I think this really improves perf. I will propose a different data structure - need a bit more time for this.
A circular buffer will work here; it will just evict older hashes as new ones enter.
@Elvis339 correct me if I'm wrong, but in the optimized version the most resource-intensive part seems to be the multithreading.
Circular buffer definitely seems interesting, I did a quick check on the most used methods of
Depends tbh. A hash set stores pointers to data on the heap: to remove an entry you take a pointer from the stack and then use it to remove data at a precise location on the heap. The catch is that the CPU can never optimize this; it always has to fetch the data from main memory. A circular buffer is interesting because, given the size of the data we're storing (a hash, 33 bytes at most), it is more optimizable by the CPU than the hash set. Here's why: the CPU doesn't fetch data in the exact size you ask for; it fetches a whole cache line, typically 64 bytes. So when it fetches the data at address 1, it automatically pulls in the data at address 2 as well, hands you address 1, and keeps address 2 in cache, expecting that your next request can be served straight from cache. The hash set can't benefit from this because its entries aren't contiguous, and the best perf optimization is just using the cache effectively. So while the hash set gives you lots of cache misses, the circular buffer won't. Of course this needs a benchmark; all of this is theory tbh. @mattsse what do you think?
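To make the layout argument concrete, here's a minimal sketch of the kind of fixed-capacity ring being described, assuming 32-byte tx hashes; `HashRing` and everything in it are illustrative names, not types from the codebase:

```rust
/// Illustrative sketch: hashes live in one flat allocation, so scans and
/// evictions walk consecutive cache lines instead of chasing pointers.
struct HashRing {
    buf: Vec<[u8; 32]>, // contiguous storage; one 64-byte cache line holds two hashes
    head: usize,        // slot the next insert overwrites
    len: usize,
    cap: usize,
}

impl HashRing {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        Self { buf: vec![[0u8; 32]; cap], head: 0, len: 0, cap }
    }

    /// Inserts a hash, silently evicting the oldest entry once full (FIFO).
    fn push(&mut self, hash: [u8; 32]) {
        self.buf[self.head] = hash;
        self.head = (self.head + 1) % self.cap;
        self.len = (self.len + 1).min(self.cap);
    }

    /// Membership is a linear scan, but over contiguous memory the
    /// prefetcher streams it; for small capacities this can beat hashing.
    fn contains(&self, hash: &[u8; 32]) -> bool {
        self.buf[..self.len].iter().any(|h| h == hash)
    }
}
```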
Interesting points about memory locality. I believe some of the statements about CPU optimization need clarification:
The core question isn't about picking one over the other universally, but rather identifying the most appropriate and simplest data structure for our specific use case. Looking at the current implementation, the LRU cache is performing redundant operations - maintaining ordering when we might not need it. I agree with most of the statements you wrote but disagree with some; I think you're on a good path, and I like the idea of having CPU cache-friendly data structures! You're absolutely right that benchmarking would tell us the real story. Would be cool to test with:
Got any specific benchmark scenarios in mind? Would be interesting to see how different patterns affect cache behavior.
I disagree with both 2 and 1. A hash set is just a data structure with binding logic, nothing more than that, and the circular buffer will always have a higher locality of reference than a HashMap, all things being equal. The CPU optimization technique in this case is the prefetcher, and it works on contiguous data structures; that's why an array can sometimes be better than a HashMap when indexing, depending on the size. The circular buffer solves the LRU problem: it behaves like an LRU, only it uses a FIFO mechanism. Any benchmark should do tbh.
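As a first pass, a minimal criterion harness along these lines could put numbers on it; this is a sketch assuming the `criterion` and `lru` crates as dev-dependencies, and it reuses the illustrative `HashRing` from the earlier comment:

```rust
use std::num::NonZeroUsize;

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_seen_hashes(c: &mut Criterion) {
    // Synthetic hashes: the benchmark only cares about byte size, not content.
    let hashes: Vec<[u8; 32]> = (0u64..10_000)
        .map(|i| {
            let mut h = [0u8; 32];
            h[..8].copy_from_slice(&i.to_le_bytes());
            h
        })
        .collect();

    c.bench_function("lru_insert", |b| {
        let mut cache = lru::LruCache::new(NonZeroUsize::new(1024).unwrap());
        b.iter(|| {
            for h in &hashes {
                cache.put(black_box(*h), ());
            }
        });
    });

    c.bench_function("ring_insert", |b| {
        let mut ring = HashRing::new(1024); // sketch from the earlier comment
        b.iter(|| {
            for h in &hashes {
                ring.push(black_box(*h));
            }
        });
    });
}

criterion_group!(benches, bench_seen_hashes);
criterion_main!(benches);
```

A mixed insert/lookup pattern that mirrors how `seen_hashes` is actually hit would be the more telling scenario, but even this would show whether FIFO eviction buys anything over the LRU bookkeeping.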
This issue is stale because it has been open for 21 days with no activity. |
This issue was closed because it has been inactive for 7 days since being marked as stale. |
Describe the feature
This issue builds on Profile tx fetching to improve transaction fetching performance.
Optimizations Applied
These changes resulted in a ~51% performance improvement in transaction fetching operations.
Additional Consideration: LRU Usage
Currently, some fields like `seen_hashes` are using an LRU cache, but this might not be optimal.
Why LRU might not be ideal:
Would love feedback on whether ordering is actually important, as moving to a simpler data structure could yield sweet performance juice.
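If ordering turns out not to matter, here's a hedged sketch of what a simpler `seen_hashes` could look like: FIFO eviction plus a side index so membership checks stay O(1). All names here are hypothetical, not the actual reth types:

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical FIFO replacement for an LRU-backed `seen_hashes`:
/// no ordering updates on reads, constant-time membership checks.
struct FifoSet {
    order: VecDeque<[u8; 32]>, // insertion order, oldest at the front
    index: HashSet<[u8; 32]>,  // fast `contains` lookups
    cap: usize,
}

impl FifoSet {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        Self {
            order: VecDeque::with_capacity(cap),
            index: HashSet::with_capacity(cap),
            cap,
        }
    }

    /// Records a hash, evicting the oldest entry at capacity.
    /// Returns `false` if the hash was already seen.
    fn insert(&mut self, hash: [u8; 32]) -> bool {
        if !self.index.insert(hash) {
            return false;
        }
        if self.order.len() == self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.index.remove(&oldest);
            }
        }
        self.order.push_back(hash);
        true
    }
}
```

The trade-off versus a bare ring buffer is the extra `HashSet`, but it avoids the LRU's reorder-on-access work while keeping lookups O(1).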
Additional context
Flamegraphs
Not Optimized
Optimized
Benchmark
Not Optimized
Optimized