Commit
* [BEAM-13015, #21250] Optimize encoding to a ByteString

This leverages the fact that all encoding is done in a thread-safe manner, which allows us to drop the synchronization that ByteString.Output adds. It also tunes the max chunk size, based on performance measurements, and the ratio for how full a byte[] should be in the final copy vs concatenate decision.

Below are the results of several scenarios comparing the protobuf implementation with the new solution:

```
Benchmark                                                                                       Mode  Cnt         Score        Error  Units
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewLargeWrites               thrpt   25   1149267.797 ±  15366.677  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewMixedWritesWithReuse      thrpt   25    832816.697 ±   4236.341  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewMixedWritesWithoutReuse   thrpt   25    916629.194 ±   5669.323  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewSmallWrites               thrpt   25  14175167.566 ±  88540.030  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamFewTinyWrites                thrpt   25  22471597.238 ± 186098.311  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyLargeWrites              thrpt   25       610.218 ±      5.019  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyMixedWritesWithReuse     thrpt   25       484.413 ±     35.194  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyMixedWritesWithoutReuse  thrpt   25       559.983 ±      6.228  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManySmallWrites              thrpt   25     10969.839 ±     88.199  ops/s
ByteStringOutputStreamBenchmark.testProtobufByteStringOutputStreamManyTinyWrites               thrpt   25     40822.925 ±    191.402  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewLargeWrites                thrpt   25   1167673.532 ±   9747.507  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewMixedWritesWithReuse       thrpt   25   1576528.242 ±  15883.083  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewMixedWritesWithoutReuse    thrpt   25   1009766.655 ±   8700.273  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewSmallWrites                thrpt   25  33293140.679 ± 233693.771  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamFewTinyWrites                 thrpt   25  86841328.763 ± 729741.769  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyLargeWrites               thrpt   25      1058.150 ±     15.192  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyMixedWritesWithReuse      thrpt   25       937.249 ±      9.264  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyMixedWritesWithoutReuse   thrpt   25       959.671 ±     13.989  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManySmallWrites               thrpt   25     12601.065 ±     92.375  ops/s
ByteStringOutputStreamBenchmark.testSdkCoreByteStringOutputStreamManyTinyWrites                thrpt   25     65277.229 ±   3795.676  ops/s
```

The copy vs concatenate numbers come from the results below. They show that 256k seems to be a pretty good chunk size, since larger chunks seem to cost more per byte to allocate. They also show the threshold at which we should currently copy the bytes instead of concatenating a partially full buffer and allocating a new one:

```
Benchmark                                                newSize       copyVsNew   Mode  Cnt         Score        Error  Units
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        512/1024  thrpt   25  19744209.563 ± 148287.185  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        640/1024  thrpt   25  15738981.338 ± 103684.000  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        768/1024  thrpt   25  12778194.652 ± 202212.679  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A        896/1024  thrpt   25  11053602.109 ± 103120.446  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       4096/8192  thrpt   25   2961435.128 ±  25895.802  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       5120/8192  thrpt   25   2498594.030 ±  26051.674  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       6144/8192  thrpt   25   2173161.031 ±  20014.569  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A       7168/8192  thrpt   25   1917545.913 ±  21470.719  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     20480/65536  thrpt   25    537872.049 ±   5525.024  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     24576/65536  thrpt   25    371312.042 ±   4450.715  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     28672/65536  thrpt   25    306027.442 ±   2830.503  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A     32768/65536  thrpt   25    263933.096 ±   1833.603  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   131072/262144  thrpt   25     80224.558 ±   1192.994  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   163840/262144  thrpt   25     65311.283 ±    775.920  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   196608/262144  thrpt   25     54510.877 ±    797.775  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A   229376/262144  thrpt   25     46808.185 ±    515.039  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  524288/1048576  thrpt   25     17729.937 ±    301.199  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  655360/1048576  thrpt   25     12996.953 ±    228.552  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  786432/1048576  thrpt   25     11383.122 ±    384.086  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testCopyArray      N/A  917504/1048576  thrpt   25      9915.318 ±    285.995  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray      1024             N/A  thrpt   25  10023631.563 ±  61317.055  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray      8192             N/A  thrpt   25   2109120.041 ±  17482.682  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray     65536             N/A  thrpt   25    318492.630 ±   3006.827  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray    262144             N/A  thrpt   25     79228.892 ±    725.230  ops/s
ByteStringOutputStreamBenchmark.NewVsCopy.testNewArray   1048576             N/A  thrpt   25     13089.221 ±     73.535  ops/s
```

The difference in the `ProcessBundleBenchmark` is minor, as there is not enough data being passed around for the change to matter much:

```
Before
Benchmark                                Mode  Cnt      Score     Error  Units
ProcessBundleBenchmark.testLargeBundle  thrpt   25   1156.159 ±   9.001  ops/s
ProcessBundleBenchmark.testTinyBundle   thrpt   25  29641.444 ± 125.041  ops/s

After
Benchmark                                Mode  Cnt      Score     Error  Units
ProcessBundleBenchmark.testLargeBundle  thrpt   25   1168.977 ±  25.848  ops/s
ProcessBundleBenchmark.testTinyBundle   thrpt   25  29664.783 ±  99.791  ops/s
```

* fixup comment and address analyzeClassDependencies failures
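
For readers unfamiliar with the copy vs concatenate decision mentioned in the first commit message, here is a minimal sketch of the idea. It is not the code from this commit: the class name `CopyVsConcatSketch`, the single constant `COPY_BELOW_FILL_RATIO`, and the use of `java.nio.ByteBuffer` as a stand-in for a protobuf `ByteString` chunk are all assumptions made for illustration. The `NewVsCopy` numbers suggest the break-even fill ratio varies with buffer size, whereas the sketch uses one hypothetical ratio for simplicity.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the Beam or protobuf implementation. */
final class CopyVsConcatSketch {
  /** Hypothetical fill ratio below which copying is assumed to be cheaper. */
  static final double COPY_BELOW_FILL_RATIO = 0.5;

  private final List<ByteBuffer> chunks = new ArrayList<>(); // finished chunks, in order
  private byte[] buffer;
  private int pos;

  CopyVsConcatSketch(int capacity) {
    buffer = new byte[capacity];
  }

  /** Deliberately unsynchronized: the sketch assumes a single writing thread. */
  void write(byte[] src) {
    int offset = 0;
    while (offset < src.length) {
      if (pos == buffer.length) {
        // Buffer is full: hand it off as a chunk and start a fresh one.
        chunks.add(ByteBuffer.wrap(buffer));
        buffer = new byte[buffer.length];
        pos = 0;
      }
      int n = Math.min(src.length - offset, buffer.length - pos);
      System.arraycopy(src, offset, buffer, pos, n);
      pos += n;
      offset += n;
    }
  }

  /** True if the used bytes should be copied rather than handing off the buffer. */
  static boolean shouldCopy(int bufferLength, int bytesUsed) {
    return bytesUsed < bufferLength * COPY_BELOW_FILL_RATIO;
  }

  /** Returns all chunks written so far and resets the stream for reuse. */
  List<ByteBuffer> finishAndReset() {
    if (pos > 0) {
      if (shouldCopy(buffer.length, pos)) {
        // Mostly empty buffer: copy the used bytes and keep the buffer for reuse.
        chunks.add(ByteBuffer.wrap(Arrays.copyOf(buffer, pos)));
      } else {
        // Mostly full buffer: hand it off without copying and allocate a new one.
        chunks.add(ByteBuffer.wrap(buffer, 0, pos));
        buffer = new byte[buffer.length];
      }
      pos = 0;
    }
    List<ByteBuffer> result = new ArrayList<>(chunks);
    chunks.clear();
    return result;
  }
}
```

The trade-off is that copying costs time proportional to the bytes used but lets the existing buffer be reused, while handing off the buffer avoids the copy at the price of a new allocation of the full buffer size.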