Faster AVX2 encoding #153

Merged on Nov 10, 2020 (1 commit)

klauspost (Owner)

* Remove 50% of bounds checks when copying.
* Use RIP-only addressing, which frees one register.
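
To illustrate the first bullet, here is a minimal sketch of one common Go idiom for cutting bounds checks in a copy loop. It is not the code changed in this PR: `copyCoeffs` is a hypothetical helper, and the assumption is only that the copy in question is an indexed loop over byte slices. Reslicing the destination once up front usually lets the compiler drop the per-iteration check.

```go
package main

import "fmt"

// copyCoeffs is a hypothetical helper, not taken from this PR.
// Reslicing dst to len(src) does a single check up front; after that
// the compiler can usually prove dst[i] is in range for every i that
// ranges over src, so the check inside the loop can be eliminated.
func copyCoeffs(dst, src []byte) {
	dst = dst[:len(src)] // panics here, once, if cap(dst) < len(src)
	for i, v := range src {
		dst[i] = v
	}
}

func main() {
	src := []byte{1, 2, 3, 4}
	dst := make([]byte, 8)
	copyCoeffs(dst, src)
	fmt.Println(dst[:len(src)])
}
```

Building with `go build -gcflags="-d=ssa/check_bce/debug=1"` lists the bounds checks the compiler still emits, which is one way to confirm whether a change like this had any effect.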

```
benchmark                                 old MB/s      new MB/s      speedup
BenchmarkGalois128K-32                    57663.49      58005.87      1.01x
BenchmarkGalois1M-32                      49479.31      49848.29      1.01x
BenchmarkGaloisXor128K-32                 46310.69      46501.88      1.00x
BenchmarkGaloisXor1M-32                   43804.86      43984.39      1.00x
BenchmarkEncode10x2x10000-32              25926.93      27457.75      1.06x
BenchmarkEncode100x20x10000-32            2635.82       2818.95       1.07x
BenchmarkEncode17x3x1M-32                 63215.11      61576.76      0.97x
BenchmarkEncode10x4x16M-32                19551.54      19505.07      1.00x
BenchmarkEncode5x2x1M-32                  79612.06      81985.14      1.03x
BenchmarkEncode10x2x1M-32                 121478.29     127739.41     1.05x
BenchmarkEncode10x4x1M-32                 70757.61      74423.67      1.05x
BenchmarkEncode50x20x1M-32                19811.96      20103.32      1.01x
BenchmarkEncode17x3x16M-32                27202.10      27825.34      1.02x
BenchmarkEncode_8x4x8M-32                 19029.04      19701.31      1.04x
BenchmarkEncode_12x4x12M-32               22449.87      22480.51      1.00x
BenchmarkEncode_16x4x16M-32               24536.74      24672.24      1.01x
BenchmarkEncode_16x4x32M-32               24381.34      24981.99      1.02x
BenchmarkEncode_16x4x64M-32               24717.69      25086.94      1.01x
BenchmarkEncode_8x5x8M-32                 16763.51      17154.04      1.02x
BenchmarkEncode_8x6x8M-32                 15067.22      15205.87      1.01x
BenchmarkEncode_8x7x8M-32                 13156.38      13589.40      1.03x
BenchmarkEncode_8x9x8M-32                 11363.74      11523.70      1.01x
BenchmarkEncode_8x10x8M-32                10359.37      10474.91      1.01x
BenchmarkEncode_8x11x8M-32                9627.07       9463.24       0.98x
BenchmarkEncode_8x8x05M-32                30104.80      32634.89      1.08x
BenchmarkEncode_8x8x1M-32                 36497.28      36425.88      1.00x
BenchmarkEncode_8x8x8M-32                 12186.19      11602.41      0.95x
BenchmarkEncode_8x8x32M-32                11670.72      11413.71      0.98x
BenchmarkEncode_24x8x24M-32               21709.83      21652.50      1.00x
BenchmarkEncode_24x8x48M-32               22494.40      22280.59      0.99x
BenchmarkVerify10x2x10000-32              10567.56      10483.91      0.99x
BenchmarkVerify50x5x50000-32              28102.84      27923.63      0.99x
BenchmarkVerify10x2x1M-32                 30298.33      30106.18      0.99x
BenchmarkVerify5x2x1M-32                  16115.91      15847.03      0.98x
BenchmarkVerify10x4x1M-32                 15382.13      14852.68      0.97x
BenchmarkVerify50x20x1M-32                8476.02       8466.24       1.00x
BenchmarkVerify10x4x16M-32                15101.03      15434.71      1.02x
BenchmarkReconstruct10x2x10000-32         26228.18      26960.19      1.03x
BenchmarkReconstruct50x5x50000-32         31091.42      30975.82      1.00x
BenchmarkReconstruct10x2x1M-32            58548.87      60281.92      1.03x
BenchmarkReconstruct5x2x1M-32             39499.23      41791.80      1.06x
BenchmarkReconstruct10x4x1M-32            41448.60      43053.15      1.04x
BenchmarkReconstruct50x20x1M-32           17185.99      17354.67      1.01x
BenchmarkReconstruct10x4x16M-32           18798.60      18847.43      1.00x
BenchmarkReconstructData10x2x10000-32     27208.48      27538.38      1.01x
BenchmarkReconstructData50x5x50000-32     32135.65      32078.91      1.00x
BenchmarkReconstructData10x2x1M-32        63180.19      67332.17      1.07x
BenchmarkReconstructData5x2x1M-32         47532.85      49932.17      1.05x
BenchmarkReconstructData10x4x1M-32        50059.14      52323.15      1.05x
BenchmarkReconstructData50x20x1M-32       26679.75      26714.11      1.00x
BenchmarkReconstructData10x4x16M-32       24854.99      24527.23      0.99x
BenchmarkReconstructP10x2x10000-32        115089.87     113229.75     0.98x
BenchmarkReconstructP10x5x20000-32        129838.75     132871.10     1.02x
BenchmarkParallel_8x8x64K-32              69951.43      69980.44      1.00x
BenchmarkParallel_8x8x05M-32              11752.94      11724.35      1.00x
BenchmarkParallel_20x10x05M-32            18553.93      18613.33      1.00x
BenchmarkParallel_8x8x1M-32               11639.19      11746.86      1.01x
BenchmarkParallel_8x8x8M-32               11799.36      11685.63      0.99x
BenchmarkParallel_8x8x32M-32              11510.94      11791.72      1.02x
BenchmarkParallel_8x3x1M-32               20268.92      20678.21      1.02x
BenchmarkParallel_8x4x1M-32               17616.05      17856.17      1.01x
BenchmarkParallel_8x5x1M-32               15590.87      15872.42      1.02x
BenchmarkStreamEncode10x2x10000-32        14917.08      15408.39      1.03x
BenchmarkStreamEncode100x20x10000-32      2014.81       2077.31       1.03x
BenchmarkStreamEncode17x3x1M-32           11839.37      12434.80      1.05x
BenchmarkStreamEncode10x4x16M-32          9151.14       9206.98       1.01x
BenchmarkStreamEncode5x2x1M-32            13598.55      13663.56      1.00x
BenchmarkStreamEncode10x2x1M-32           13192.91      13453.41      1.02x
BenchmarkStreamEncode10x4x1M-32           12109.90      12050.68      1.00x
BenchmarkStreamEncode50x20x1M-32          8640.73       8370.10       0.97x
BenchmarkStreamEncode17x3x16M-32          10473.17      10527.04      1.01x
BenchmarkStreamVerify10x2x10000-32        7032.23       7128.82       1.01x
BenchmarkStreamVerify50x5x50000-32        13023.46      13109.31      1.01x
BenchmarkStreamVerify10x2x1M-32           11941.63      11949.91      1.00x
BenchmarkStreamVerify5x2x1M-32            8029.93       8263.39       1.03x
BenchmarkStreamVerify10x4x1M-32           8137.82       8271.11       1.02x
BenchmarkStreamVerify50x20x1M-32          7378.87       7708.81       1.04x
BenchmarkStreamVerify10x4x16M-32          8973.18       8955.29       1.00x
```
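
For reading the table: the `speedup` column appears to be the new throughput divided by the old, rounded to two decimals, so values below 1.00x are regressions. A small sketch that reproduces two rows under that assumption (the `speedup` function name is illustrative, not part of the repository):

```go
package main

import "fmt"

// speedup formats new/old throughput the way the table's last column
// appears to: a ratio rounded to two decimals with an "x" suffix.
func speedup(oldMBs, newMBs float64) string {
	return fmt.Sprintf("%.2fx", newMBs/oldMBs)
}

func main() {
	// BenchmarkEncode10x2x10000-32
	fmt.Println(speedup(25926.93, 27457.75)) // 1.06x
	// BenchmarkEncode_8x8x8M-32
	fmt.Println(speedup(12186.19, 11602.41)) // 0.95x
}
```
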
klauspost merged commit 653e76a into master on Nov 10, 2020
klauspost deleted the simplify-matrix-copy branch on November 10, 2020 13:39