-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster AVX2 encoding #153
Merged
Merged
Faster AVX2 encoding #153
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Owner
klauspost
commented
Nov 8, 2020
- Remove 50% of bounds checks when copying.
- Use RIP only addressing, free one register.
* Remove 50% of bounds checks when copying. * Use RIP only addressing, free one register. ``` benchmark old MB/s new MB/s speedup BenchmarkGalois128K-32 57663.49 58005.87 1.01x BenchmarkGalois1M-32 49479.31 49848.29 1.01x BenchmarkGaloisXor128K-32 46310.69 46501.88 1.00x BenchmarkGaloisXor1M-32 43804.86 43984.39 1.00x BenchmarkEncode10x2x10000-32 25926.93 27457.75 1.06x BenchmarkEncode100x20x10000-32 2635.82 2818.95 1.07x BenchmarkEncode17x3x1M-32 63215.11 61576.76 0.97x BenchmarkEncode10x4x16M-32 19551.54 19505.07 1.00x BenchmarkEncode5x2x1M-32 79612.06 81985.14 1.03x BenchmarkEncode10x2x1M-32 121478.29 127739.41 1.05x BenchmarkEncode10x4x1M-32 70757.61 74423.67 1.05x BenchmarkEncode50x20x1M-32 19811.96 20103.32 1.01x BenchmarkEncode17x3x16M-32 27202.10 27825.34 1.02x BenchmarkEncode_8x4x8M-32 19029.04 19701.31 1.04x BenchmarkEncode_12x4x12M-32 22449.87 22480.51 1.00x BenchmarkEncode_16x4x16M-32 24536.74 24672.24 1.01x BenchmarkEncode_16x4x32M-32 24381.34 24981.99 1.02x BenchmarkEncode_16x4x64M-32 24717.69 25086.94 1.01x BenchmarkEncode_8x5x8M-32 16763.51 17154.04 1.02x BenchmarkEncode_8x6x8M-32 15067.22 15205.87 1.01x BenchmarkEncode_8x7x8M-32 13156.38 13589.40 1.03x BenchmarkEncode_8x9x8M-32 11363.74 11523.70 1.01x BenchmarkEncode_8x10x8M-32 10359.37 10474.91 1.01x BenchmarkEncode_8x11x8M-32 9627.07 9463.24 0.98x BenchmarkEncode_8x8x05M-32 30104.80 32634.89 1.08x BenchmarkEncode_8x8x1M-32 36497.28 36425.88 1.00x BenchmarkEncode_8x8x8M-32 12186.19 11602.41 0.95x BenchmarkEncode_8x8x32M-32 11670.72 11413.71 0.98x BenchmarkEncode_24x8x24M-32 21709.83 21652.50 1.00x BenchmarkEncode_24x8x48M-32 22494.40 22280.59 0.99x BenchmarkVerify10x2x10000-32 10567.56 10483.91 0.99x BenchmarkVerify50x5x50000-32 28102.84 27923.63 0.99x BenchmarkVerify10x2x1M-32 30298.33 30106.18 0.99x BenchmarkVerify5x2x1M-32 16115.91 15847.03 0.98x BenchmarkVerify10x4x1M-32 15382.13 14852.68 0.97x BenchmarkVerify50x20x1M-32 8476.02 8466.24 1.00x BenchmarkVerify10x4x16M-32 15101.03 15434.71 1.02x BenchmarkReconstruct10x2x10000-32 26228.18 26960.19 1.03x BenchmarkReconstruct50x5x50000-32 31091.42 30975.82 1.00x BenchmarkReconstruct10x2x1M-32 58548.87 60281.92 1.03x BenchmarkReconstruct5x2x1M-32 39499.23 41791.80 1.06x BenchmarkReconstruct10x4x1M-32 41448.60 43053.15 1.04x BenchmarkReconstruct50x20x1M-32 17185.99 17354.67 1.01x BenchmarkReconstruct10x4x16M-32 18798.60 18847.43 1.00x BenchmarkReconstructData10x2x10000-32 27208.48 27538.38 1.01x BenchmarkReconstructData50x5x50000-32 32135.65 32078.91 1.00x BenchmarkReconstructData10x2x1M-32 63180.19 67332.17 1.07x BenchmarkReconstructData5x2x1M-32 47532.85 49932.17 1.05x BenchmarkReconstructData10x4x1M-32 50059.14 52323.15 1.05x BenchmarkReconstructData50x20x1M-32 26679.75 26714.11 1.00x BenchmarkReconstructData10x4x16M-32 24854.99 24527.23 0.99x BenchmarkReconstructP10x2x10000-32 115089.87 113229.75 0.98x BenchmarkReconstructP10x5x20000-32 129838.75 132871.10 1.02x BenchmarkParallel_8x8x64K-32 69951.43 69980.44 1.00x BenchmarkParallel_8x8x05M-32 11752.94 11724.35 1.00x BenchmarkParallel_20x10x05M-32 18553.93 18613.33 1.00x BenchmarkParallel_8x8x1M-32 11639.19 11746.86 1.01x BenchmarkParallel_8x8x8M-32 11799.36 11685.63 0.99x BenchmarkParallel_8x8x32M-32 11510.94 11791.72 1.02x BenchmarkParallel_8x3x1M-32 20268.92 20678.21 1.02x BenchmarkParallel_8x4x1M-32 17616.05 17856.17 1.01x BenchmarkParallel_8x5x1M-32 15590.87 15872.42 1.02x BenchmarkStreamEncode10x2x10000-32 14917.08 15408.39 1.03x BenchmarkStreamEncode100x20x10000-32 2014.81 2077.31 1.03x BenchmarkStreamEncode17x3x1M-32 11839.37 12434.80 1.05x BenchmarkStreamEncode10x4x16M-32 9151.14 9206.98 1.01x BenchmarkStreamEncode5x2x1M-32 13598.55 13663.56 1.00x BenchmarkStreamEncode10x2x1M-32 13192.91 13453.41 1.02x BenchmarkStreamEncode10x4x1M-32 12109.90 12050.68 1.00x BenchmarkStreamEncode50x20x1M-32 8640.73 8370.10 0.97x BenchmarkStreamEncode17x3x16M-32 10473.17 10527.04 1.01x BenchmarkStreamVerify10x2x10000-32 7032.23 7128.82 1.01x BenchmarkStreamVerify50x5x50000-32 13023.46 13109.31 1.01x BenchmarkStreamVerify10x2x1M-32 11941.63 11949.91 1.00x BenchmarkStreamVerify5x2x1M-32 8029.93 8263.39 1.03x BenchmarkStreamVerify10x4x1M-32 8137.82 8271.11 1.02x BenchmarkStreamVerify50x20x1M-32 7378.87 7708.81 1.04x BenchmarkStreamVerify10x4x16M-32 8973.18 8955.29 1.00x ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.