Speed up finding jumpdests #80

Merged
merged 3 commits into from
Aug 3, 2019

Conversation

chfast
Member

@chfast chfast commented Jul 2, 2019

This replaces the linear search for the jumpdest with a binary search. It also applies a data-driven approach where the jumpdest "map" is not a vector of pairs, but two vectors: one of offsets and one of targets. We also shrank the element size from int to int16_t.
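To illustrate the layout described above, here is a minimal sketch of a binary search over two parallel vectors. The names `JumpdestMap` and `find_jumpdest` are hypothetical and the code is not taken from the evmone sources; it only assumes that offsets are stored in ascending order and that `targets[i]` belongs to `offsets[i]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container, for illustration only (not evmone's actual type).
struct JumpdestMap
{
    std::vector<std::int16_t> offsets;  // code offsets of jumpdests, ascending
    std::vector<std::int16_t> targets;  // targets[i] corresponds to offsets[i]
};

// Returns the target stored for `offset`, or -1 if `offset` is not a jumpdest.
// Only `offsets` is traversed; `targets` is read once, after a hit.
inline int find_jumpdest(const JumpdestMap& map, int offset) noexcept
{
    const auto begin = map.offsets.begin();
    const auto end = map.offsets.end();
    const auto it = std::lower_bound(begin, end, offset);
    if (it == end || *it != offset)
        return -1;
    return map.targets[static_cast<std::size_t>(it - begin)];
}
```

Keeping the searched keys in their own vector means the comparisons never load the targets, which is the point of splitting the pairs.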

There is a small trade-off here. The analysis takes longer because it requires 2x more vector resizes. For contracts with a small number of jumpdests (like blake2b_huff, which has only 3 of them) the increase in analysis time might hide the gain in execution. Still, I believe it's worth it for the 33% speed increase in blake2b_shifts, which has a lot of jumpdests.

Comparing bin/evmone-bench-master to bin/evmone-bench
Benchmark                            Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------
sha1_shifts/analysis              +0.0501         +0.0501             5             5             5             5
sha1_shifts/empty                 -0.0379         -0.0379            50            48            50            48
sha1_shifts/1351                  -0.0488         -0.0488           938           893           938           893
sha1_shifts/2737                  -0.0496         -0.0497          1825          1734          1825          1734
sha1_shifts/5311                  -0.0486         -0.0486          3554          3381          3554          3381
sha1_shifts/65536                 -0.0494         -0.0495         43275         41135         43274         41132
stop/analysis                     +0.0131         +0.0131             0             0             0             0
stop                              +0.0007         +0.0007             1             1             1             1
blake2b_huff/analysis             +0.0354         +0.0355            52            54            52            54
blake2b_huff/empty                +0.0291         +0.0291            73            75            73            75
blake2b_huff/abc                  +0.0307         +0.0307            72            75            72            75
blake2b_huff/2805nulls            -0.0033         -0.0033           485           483           485           483
blake2b_huff/2805aa               +0.0025         +0.0025           482           483           482           483
blake2b_huff/5610nulls            -0.0065         -0.0065           896           890           896           890
blake2b_huff/8415nulls            -0.0075         -0.0075          1285          1276          1285          1276
blake2b_huff/65536nulls           -0.0094         -0.0094          9609          9519          9609          9519
sha1_divs/analysis                +0.0529         +0.0529             5             5             5             5
sha1_divs/empty                   -0.0164         -0.0164            98            96            98            96
sha1_divs/1351                    -0.0209         -0.0209          1913          1873          1913          1873
sha1_divs/2737                    -0.0207         -0.0207          3728          3651          3728          3651
sha1_divs/5311                    -0.0211         -0.0211          7272          7118          7272          7118
sha1_divs/65536                   -0.0196         -0.0196         88532         86797         88531         86798
weierstrudel/analysis             +0.0591         +0.0591            65            69            65            69
weierstrudel/0                    +0.0005         +0.0005           337           337           337           337
weierstrudel/1                    -0.0192         -0.0192           660           648           660           648
weierstrudel/2                    -0.0192         -0.0192           824           809           824           809
weierstrudel/3                    -0.0194         -0.0194           989           970           989           970
weierstrudel/8                    -0.0231         -0.0231          1806          1764          1806          1764
weierstrudel/9                    -0.0227         -0.0227          1971          1926          1971          1926
weierstrudel/14                   -0.0240         -0.0240          2790          2723          2790          2723
blake2b_shifts/analysis           +0.1496         +0.1496            26            30            26            30
blake2b_shifts/empty              +0.0000         +0.0000             0             0             0             0
blake2b_shifts/2805nulls          -0.3398         -0.3398          6568          4336          6568          4336
blake2b_shifts/5610nulls          -0.3313         -0.3313         12930          8646         12929          8646
blake2b_shifts/8415nulls          -0.3276         -0.3276         19254         12947         19254         12947
blake2b_shifts/65536nulls         -0.2887         -0.2887        143245        101887        143243        101885

@chfast chfast force-pushed the jump branch 2 times, most recently from a199687 to 18f421e Compare July 26, 2019 10:38
@codecov-io

codecov-io commented Jul 26, 2019

Codecov Report

Merging #80 into master will increase coverage by 4.38%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #80      +/-   ##
==========================================
+ Coverage   83.27%   87.66%   +4.38%     
==========================================
  Files          20       20              
  Lines        1985     1986       +1     
  Branches      218      216       -2     
==========================================
+ Hits         1653     1741      +88     
+ Misses        307      220      -87     
  Partials       25       25

@chfast chfast force-pushed the jump branch 3 times, most recently from 110f662 to 9ca8739 Compare July 26, 2019 11:03
@chfast chfast requested a review from gumb0 July 26, 2019 11:09
@gumb0
Member

gumb0 commented Jul 26, 2019

How does having two vectors instead of a vector of pairs help? Shorter data better fits into the cache line?

@gumb0
Member

gumb0 commented Jul 26, 2019

Does this beat unordered_map?

@chfast
Member Author

chfast commented Jul 26, 2019

> How does having two vectors instead of a vector of pairs help? Shorter data better fits into the cache line?

Only the first vector is traversed. In this PR we reduced the memory the search runs over by a factor of 4. In the worst case of 2 * 0x6000 bytes this is still 1.5x the size of the L1 cache. Also, the last checks in the binary search are close to one another, so we might hit the same cache line (64 bytes = 32 items).
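The figures in this reply can be checked with a few constants. This is only a sketch under assumptions that the comment does not state: a 4-byte `int`, a 32 KiB L1 data cache (consistent with the 1.5x figure), and 0x6000 read as the worst-case number of entries. All names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Assumptions (not from the PR): 32 KiB L1 data cache, 64-byte cache lines.
constexpr std::size_t l1_cache_bytes = 32 * 1024;
constexpr std::size_t cache_line_bytes = 64;

// Worst-case number of entries used in the comment.
constexpr std::size_t max_entries = 0x6000;

// Bytes the search runs over: whole int pairs before, only int16_t offsets now.
constexpr std::size_t old_search_bytes = max_entries * sizeof(std::pair<int, int>);
constexpr std::size_t new_search_bytes = max_entries * sizeof(std::int16_t);

// How many int16_t offsets share one cache line.
constexpr std::size_t items_per_cache_line = cache_line_bytes / sizeof(std::int16_t);
```

With these assumptions the searched data shrinks from 192 KiB to 48 KiB, which is where the factor of 4 and the 1.5x of L1 come from.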

Anyway, different variants were benchmarked in https://github.com/ethereum/evmone/tree/internal_benchmarks (to be merged independently).

One "easy" missing optimization is to pack both vectors into a single memory allocation and not to over-allocate. I was thinking about first using stack space of 0x6000 * 2 * 2 bytes and then copying it to a single heap-allocated memory buffer.
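A rough sketch of that idea, with hypothetical names (`PackedJumpdests`, `pack`) that do not come from the PR: the analysis would fill two fixed-size scratch arrays (for example on the stack) and then copy exactly `n` offsets followed by `n` targets into one exactly-sized heap buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical packed layout: [offset_0 .. offset_n-1 | target_0 .. target_n-1].
struct PackedJumpdests
{
    std::unique_ptr<std::int16_t[]> data;
    std::size_t size = 0;

    const std::int16_t* offsets() const noexcept { return data.get(); }
    const std::int16_t* targets() const noexcept { return data.get() + size; }
};

// Copies n offsets and n targets from scratch buffers into one allocation
// of exactly 2 * n elements, so nothing is over-allocated.
inline PackedJumpdests pack(
    const std::int16_t* offsets, const std::int16_t* targets, std::size_t n)
{
    PackedJumpdests p;
    p.data = std::make_unique<std::int16_t[]>(2 * n);
    p.size = n;
    std::copy_n(offsets, n, p.data.get());
    std::copy_n(targets, n, p.data.get() + n);
    return p;
}
```

Because the targets start right after the offsets, a hit at index `i` in the first half is resolved by reading `targets()[i]` from the same buffer.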

There is also the possibility of using SIMD to compare 16 or 32 items at a time, or even of using "k-Ary Search": https://event.cwi.nl/damon2009/DaMoN09-KarySearch.pdf

> Does this beat unordered_map?

I will have to check. I hope it does, because unordered_map is not ideal: there is creation overhead (many allocations, which is not friendly if we'd like to cache the analysis results), and the default hash function is the identity, so you can easily create a contract that makes the search linear. We would also be ignoring the fact that the data is already sorted.
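The collision concern can be sketched as follows. The example uses an explicit identity hasher (`IdentityHash`, a hypothetical name) rather than relying on `std::hash<int>`, whose behavior is implementation-specific, and it assumes the common bucket mapping where the bucket index is the hash reduced by the bucket count. Under that assumption, keys that are all multiples of the bucket count end up in a single bucket and lookups degrade to a linear scan of that chain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Explicit identity hash, standing in for what common standard libraries
// do for integers (an assumption, not something the C++ standard requires).
struct IdentityHash
{
    std::size_t operator()(int x) const noexcept { return static_cast<std::size_t>(x); }
};

using Map = std::unordered_map<int, int, IdentityHash>;

// Builds a map with n keys that are all multiples of the bucket count.
inline Map make_colliding_map(int n)
{
    Map m;
    m.reserve(static_cast<std::size_t>(n));  // avoid rehashing during inserts
    const int stride = static_cast<int>(m.bucket_count());
    for (int i = 0; i < n; ++i)
        m.emplace(i * stride, i);
    return m;
}

// Length of the longest bucket chain, i.e. the worst-case lookup cost.
inline std::size_t longest_chain(const Map& m)
{
    std::size_t longest = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b)
        longest = std::max(longest, m.bucket_size(b));
    return longest;
}
```

Comparing `longest_chain(m)` with `m.size()` for a map built this way shows how far the lookup cost has drifted from constant time on a given standard library.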

@chfast
Member Author

chfast commented Aug 2, 2019

Using unordered_map makes it ~4x faster. Benchmark added in cea415e. I have to rerun it later because I'm currently testing that with a laptop running on battery.

I think we will have to revisit using a hash map here later on. I've seen some interesting recent work on the subject, including hash maps that use contiguous memory.

@chfast chfast force-pushed the jump branch 2 times, most recently from a058908 to d3cfba8 Compare August 3, 2019 15:48
@chfast chfast merged commit 597e658 into master Aug 3, 2019
@chfast chfast deleted the jump branch August 3, 2019 16:11
3 participants