Investigate benchmark changes between 1.16.7 and 1.17 #2116

Closed
mstoykov opened this issue Aug 19, 2021 · 3 comments
Labels
evaluation needed proposal needs to be validated or tested before fully implementing it in k6 help wanted performance
Milestone

Comments

@mstoykov
Contributor

As usual, when the final Go version is released I run all the benchmarks. In this case the results changed (as expected, due to the compiler changes), but unfortunately in some cases not in the direction we would want.

Gist with the results and a benchstat comparison.
Notes:

  • Both benchmark runs were done on my laptop during the evening/night, but there is still a chance that something interfered with them. That said, some results are stable between the two versions, while others changed drastically (and fairly consistently) in one direction or the other.
  • It is entirely possible that one or more of the benchmarks aren't well written and the new version is just more sensitive to something. In other words, the benchmark may be at fault, not the code it is benchmarking.
  • The "Proxy" tests are about common.Bind, which is now slower in a lot of places, in some cases by a lot. But I would argue we care less about that, as we are moving away from it, and the differences are still in the ns range, so 🤷
  • The ExecutionSegmentScale results are strange: some of the prefilled cases are a lot better and some are worse (the worst case, both usually and in these benchmarks, by 13x+). This is worrying because it is used a lot, but it is probably not a deal breaker, as it's mostly used during initialization and most of the cases are still pretty fast.
  • On the one hand, Cal (the calculation for the ramping arrival rate) has improved; on the other, the RampingArrivalRateRun iterations are down by quite a lot, which is likely the bigger problem so far.
  • VUHandleIterations are up 🤷
  • The marshaling seems slightly better. MetricMarshalGzipAll is the benchmark closest to what happens inside the actual code, so the others are mostly for reference in case they become a lot better someday. Maybe we should remove those benchmarks, though, as I doubt that will change before we move away from JSON.
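
To make the "something interfered" concern from the first note concrete, here is a minimal sketch of repeating one benchmark several times and looking at the spread, which is roughly what running with a repeat count and feeding the output to benchstat gives you. The `work` function and the run count of 5 are placeholders I made up, not k6 code, and the spread calculation is only in the spirit of the `±` column, not benchstat's exact algorithm.

```go
package main

import (
	"fmt"
	"testing"
)

// work is a hypothetical stand-in for the code being benchmarked.
func work(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i * i
	}
	return sum
}

// sink keeps the result alive so the compiler cannot drop the call.
var sink int

// nsPerOp runs the benchmark once and returns its ns/op as a float.
func nsPerOp() float64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = work(1000)
		}
	})
	return float64(r.T.Nanoseconds()) / float64(r.N)
}

// summarize returns the mean of the samples and the largest relative
// deviation from that mean, in percent.
func summarize(samples []float64) (mean, spreadPct float64) {
	if len(samples) == 0 {
		return 0, 0
	}
	for _, s := range samples {
		mean += s
	}
	mean /= float64(len(samples))
	if mean == 0 {
		return 0, 0
	}
	for _, s := range samples {
		dev := (s - mean) / mean * 100
		if dev < 0 {
			dev = -dev
		}
		if dev > spreadPct {
			spreadPct = dev
		}
	}
	return mean, spreadPct
}

func main() {
	samples := make([]float64, 0, 5)
	for i := 0; i < 5; i++ {
		samples = append(samples, nsPerOp())
	}
	mean, spread := summarize(samples)
	fmt.Printf("%.1f ns/op ± %.0f%% (n=%d)\n", mean, spread, len(samples))
}
```

A large spread on an otherwise idle machine is a hint that the benchmark itself (or the environment) is noisy, and that a delta of similar size between two Go versions should not be trusted yet.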

Given the above, I think we first need to confirm the current results and see whether we can improve the performance of the problematic benchmarks before releasing with go1.17, so I am against this being done for 0.34.0.

@mstoykov mstoykov added help wanted performance evaluation needed proposal needs to be validated or tested before fully implementing it in k6 labels Aug 19, 2021
@na-- na-- added this to the v0.35.0 milestone Aug 24, 2021
@vearutop

vearutop commented Aug 27, 2021

For additional context, here is a benchstat comparison on arm64, which supposedly does not have the new regabi.

ARM64 result
name                                                                                                      1.16.7 time/op        1.17 time/op     delta
pkg:go.k6.io/k6/js goos:linux goarch:arm64
EmptyIteration-4                                                                                                1.65µs ± 2%        1.63µs ± 2%     ~     (p=0.257 n=9+9)
pkg:go.k6.io/k6/js/common goos:linux goarch:arm64
Proxy/Value/Fields/ToValue-4                                                                                    2.02µs ± 1%        2.11µs ± 1%   +4.89%  (p=0.000 n=10+9)
Proxy/Value/Fields/Bridge-4                                                                                      839ns ± 0%         926ns ± 0%  +10.35%  (p=0.000 n=10+10)
Proxy/Value/Methods/ToValue-4                                                                                   2.03µs ± 0%        2.13µs ± 0%   +4.94%  (p=0.000 n=10+8)
Proxy/Value/Methods/Bridge-4                                                                                     967ns ± 3%        1034ns ± 1%   +7.00%  (p=0.000 n=10+10)
Proxy/Value/Methods/Call-4                                                                                       183ns ± 1%         311ns ± 1%  +69.49%  (p=0.000 n=10+10)
Proxy/Value/Error/ToValue-4                                                                                     2.04µs ± 0%        2.13µs ± 1%   +4.58%  (p=0.000 n=9+10)
Proxy/Value/Error/Bridge-4                                                                                      1.02µs ± 1%        1.10µs ± 1%   +7.87%  (p=0.000 n=10+10)
Proxy/Value/Add/ToValue-4                                                                                       2.04µs ± 0%        2.14µs ± 0%   +4.92%  (p=0.000 n=10+9)
Proxy/Value/Add/Bridge-4                                                                                        1.11µs ± 1%        1.22µs ± 1%   +9.99%  (p=0.000 n=10+10)
Proxy/Value/Add/Call-4                                                                                           197ns ± 0%         356ns ± 1%  +80.25%  (p=0.000 n=10+10)
Proxy/Value/AddError/ToValue-4                                                                                  2.04µs ± 0%        2.14µs ± 0%   +4.63%  (p=0.000 n=10+8)
Proxy/Value/AddError/Bridge-4                                                                                   1.03µs ± 0%        1.08µs ± 0%   +5.51%  (p=0.000 n=10+10)
Proxy/Value/Context/ToValue-4                                                                                   2.05µs ± 0%        2.14µs ± 0%   +4.54%  (p=0.000 n=10+10)
Proxy/Value/Context/Bridge-4                                                                                     875ns ± 2%         923ns ± 1%   +5.51%  (p=0.000 n=10+10)
Proxy/Value/ContextAdd/ToValue-4                                                                                2.05µs ± 0%        2.14µs ± 1%   +4.82%  (p=0.000 n=10+10)
Proxy/Value/ContextAdd/Bridge-4                                                                                 1.01µs ± 1%        1.06µs ± 0%   +4.86%  (p=0.000 n=10+10)
Proxy/Value/ContextAddError/ToValue-4                                                                           2.05µs ± 0%        2.14µs ± 0%   +4.35%  (p=0.000 n=8+10)
Proxy/Value/ContextAddError/Bridge-4                                                                            1.07µs ± 0%        1.12µs ± 0%   +4.23%  (p=0.000 n=8+10)
Proxy/Value/Sum/ToValue-4                                                                                       2.05µs ± 0%        2.14µs ± 0%   +4.53%  (p=0.000 n=10+10)
Proxy/Value/Sum/Bridge-4                                                                                        1.10µs ± 1%        1.20µs ± 1%   +9.04%  (p=0.000 n=10+10)
Proxy/Value/Sum/Call-4                                                                                           240ns ± 0%         381ns ± 0%  +58.56%  (p=0.000 n=9+9)
Proxy/Value/SumContext/ToValue-4                                                                                2.05µs ± 0%        2.15µs ± 0%   +4.62%  (p=0.000 n=9+10)
Proxy/Value/SumContext/Bridge-4                                                                                 1.00µs ± 0%        1.05µs ± 1%   +4.27%  (p=0.000 n=8+10)
Proxy/Value/SumError/ToValue-4                                                                                  2.04µs ± 0%        2.14µs ± 0%   +4.86%  (p=0.000 n=10+10)
Proxy/Value/SumError/Bridge-4                                                                                   1.02µs ± 1%        1.06µs ± 1%   +4.19%  (p=0.000 n=10+10)
Proxy/Value/SumContextError/ToValue-4                                                                           2.05µs ± 0%        2.14µs ± 1%   +4.41%  (p=0.000 n=10+10)
Proxy/Value/SumContextError/Bridge-4                                                                            1.03µs ± 0%        1.09µs ± 0%   +5.75%  (p=0.000 n=10+9)
Proxy/Value/Constructor/ToValue-4                                                                               2.05µs ± 1%        2.14µs ± 0%   +4.25%  (p=0.000 n=10+10)
Proxy/Value/Constructor/Bridge-4                                                                                3.68µs ± 0%        3.90µs ± 0%   +5.80%  (p=0.000 n=9+10)
Proxy/Value/Constructor/Call-4                                                                                  4.55µs ± 0%        4.89µs ± 0%   +7.39%  (p=0.000 n=9+8)
Proxy/Pointer/Fields/ToValue-4                                                                                  2.02µs ± 0%        2.17µs ± 1%   +7.30%  (p=0.000 n=10+10)
Proxy/Pointer/Fields/Bridge-4                                                                                    936ns ± 0%        1038ns ± 0%  +10.85%  (p=0.000 n=10+9)
Proxy/Pointer/Methods/ToValue-4                                                                                 2.06µs ± 0%        2.19µs ± 3%   +6.20%  (p=0.000 n=8+8)
Proxy/Pointer/Methods/Bridge-4                                                                                  1.75µs ± 1%        1.94µs ± 2%  +10.90%  (p=0.000 n=10+10)
Proxy/Pointer/Methods/Call-4                                                                                     183ns ± 0%         313ns ± 1%  +70.72%  (p=0.000 n=10+10)
Proxy/Pointer/Error/ToValue-4                                                                                   2.05µs ± 0%        2.15µs ± 1%   +4.81%  (p=0.000 n=10+10)
Proxy/Pointer/Error/Bridge-4                                                                                    1.06µs ± 2%        1.14µs ± 1%   +7.89%  (p=0.000 n=10+10)
Proxy/Pointer/Add/ToValue-4                                                                                     2.06µs ± 0%        2.15µs ± 0%   +4.32%  (p=0.000 n=10+10)
Proxy/Pointer/Add/Bridge-4                                                                                      1.14µs ± 1%        1.26µs ± 1%  +10.20%  (p=0.000 n=10+10)
Proxy/Pointer/Add/Call-4                                                                                         198ns ± 0%         356ns ± 1%  +79.71%  (p=0.000 n=10+10)
Proxy/Pointer/AddError/ToValue-4                                                                                2.08µs ± 2%        2.15µs ± 0%   +3.44%  (p=0.000 n=10+9)
Proxy/Pointer/AddError/Bridge-4                                                                                 1.06µs ± 1%        1.11µs ± 1%   +4.59%  (p=0.000 n=10+10)
Proxy/Pointer/Context/ToValue-4                                                                                 2.11µs ± 1%        2.15µs ± 1%   +2.09%  (p=0.000 n=10+10)
Proxy/Pointer/Context/Bridge-4                                                                                  1.02µs ±15%        0.95µs ± 0%     ~     (p=1.000 n=10+9)
Proxy/Pointer/ContextAdd/ToValue-4                                                                              2.62µs ± 4%        2.15µs ± 1%  -17.98%  (p=0.000 n=8+10)
Proxy/Pointer/ContextAdd/Bridge-4                                                                               1.05µs ± 1%        1.08µs ± 1%   +3.13%  (p=0.000 n=8+10)
Proxy/Pointer/ContextAddError/ToValue-4                                                                         2.26µs ±21%        2.15µs ± 0%     ~     (p=0.470 n=10+10)
Proxy/Pointer/ContextAddError/Bridge-4                                                                          1.08µs ± 0%        1.14µs ± 0%   +5.32%  (p=0.000 n=10+9)
Proxy/Pointer/Sum/ToValue-4                                                                                     2.06µs ± 0%        2.16µs ± 1%   +4.75%  (p=0.000 n=10+9)
Proxy/Pointer/Sum/Bridge-4                                                                                      1.12µs ± 1%        1.23µs ± 1%   +9.15%  (p=0.000 n=9+10)
Proxy/Pointer/Sum/Call-4                                                                                         239ns ± 0%         382ns ± 1%  +59.49%  (p=0.001 n=6+8)
pkg:go.k6.io/k6/js/modules/k6/http goos:linux goarch:arm64
HandlingOfResponseBodies/text-4                                                                                 6.08ms ± 2%        6.24ms ± 2%   +2.56%  (p=0.000 n=10+10)
HandlingOfResponseBodies/binary-4                                                                               6.16ms ± 1%        6.31ms ± 1%   +2.39%  (p=0.000 n=10+10)
HandlingOfResponseBodies/none-4                                                                                 5.87ms ± 3%        6.03ms ± 2%   +2.58%  (p=0.001 n=9+10)
pkg:go.k6.io/k6/lib goos:linux goarch:arm64
GetStripedOffsets/length10,seed777-4                                                                            10.5µs ±20%        11.4µs ±50%     ~     (p=0.905 n=9+10)
GetStripedOffsets/length100,seed777-4                                                                            567µs ±16%         554µs ±15%     ~     (p=0.853 n=10+10)
GetStripedOffsetsEven/length10-4                                                                                2.49µs ± 1%        2.48µs ± 1%   -0.50%  (p=0.017 n=10+10)
GetStripedOffsetsEven/length100-4                                                                               17.0µs ± 0%        17.1µs ± 0%   +0.33%  (p=0.001 n=7+9)
GetStripedOffsetsEven/length1000-4                                                                               925µs ± 0%         930µs ± 0%   +0.47%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5)-4                                                          3.53ns ± 0%        3.54ns ± 0%   +0.52%  (p=0.000 n=9+8)
ExecutionSegmentScale/seq:;segment:/et.Scale(5)-4                                                               1.62µs ± 0%        1.60µs ± 0%   -1.15%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5)_prefilled-4                                                     3.60ns ± 1%        0.84ns ± 0%  -76.81%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5523)-4                                                       3.53ns ± 0%        3.52ns ± 1%   -0.39%  (p=0.012 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5523)-4                                                            1.62µs ± 0%        1.60µs ± 0%   -1.49%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:/et.Scale(5523)_prefilled-4                                                  3.60ns ± 1%        0.84ns ± 0%  -76.80%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:/segment.Scale(5000000)-4                                                    3.53ns ± 0%        3.55ns ± 0%   +0.37%  (p=0.001 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5000000)-4                                                         1.62µs ± 0%        1.60µs ± 0%   -1.06%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:/et.Scale(5000000)_prefilled-4                                               3.60ns ± 1%        0.84ns ± 0%  -76.80%  (p=0.000 n=10+8)
ExecutionSegmentScale/seq:;segment:/segment.Scale(67280421310721)-4                                             3.53ns ± 0%        3.55ns ± 1%   +0.57%  (p=0.001 n=9+9)
ExecutionSegmentScale/seq:;segment:/et.Scale(67280421310721)-4                                                  1.62µs ± 0%        1.60µs ± 0%   -1.26%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:/et.Scale(67280421310721)_prefilled-4                                        3.62ns ± 1%        0.84ns ± 0%  -76.89%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5)-4                                                        992ns ± 0%        1015ns ± 1%   +2.34%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5)-4                                                            1.75µs ± 0%        1.74µs ± 0%   -0.48%  (p=0.000 n=10+8)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5)_prefilled-4                                                  3.57ns ± 1%        0.84ns ± 0%  -76.59%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5523)-4                                                     994ns ± 1%        1017ns ± 1%   +2.34%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5523)-4                                                         1.75µs ± 0%        1.74µs ± 0%   -0.58%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5523)_prefilled-4                                               3.61ns ± 1%        0.84ns ± 0%  -76.84%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(5000000)-4                                                 1.00µs ± 0%        1.02µs ± 1%   +2.09%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5000000)-4                                                      1.75µs ± 0%        1.74µs ± 0%   -0.51%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(5000000)_prefilled-4                                            3.60ns ± 1%        0.84ns ± 0%  -76.76%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/segment.Scale(67280421310721)-4                                           994ns ± 0%        1014ns ± 0%   +2.07%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(67280421310721)-4                                               1.75µs ± 0%        1.74µs ± 0%   -0.78%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:;segment:0:1/et.Scale(67280421310721)_prefilled-4                                     3.59ns ± 1%        0.84ns ± 0%  -76.71%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5)-4                          1.30µs ± 0%        1.33µs ± 0%   +1.89%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5)-4                               2.09µs ± 0%        2.08µs ± 0%     ~     (p=0.147 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5)_prefilled-4                     8.77ns ± 0%        6.08ns ± 0%  -30.69%  (p=0.000 n=8+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5523)-4                       1.19µs ± 0%        1.21µs ± 0%   +1.49%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5523)-4                            2.09µs ± 0%        2.08µs ± 0%   -0.31%  (p=0.023 n=9+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5523)_prefilled-4                  8.27ns ± 0%        5.37ns ± 0%  -35.02%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(5000000)-4                    1.12µs ± 0%        1.14µs ± 0%   +1.74%  (p=0.000 n=10+8)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5000000)-4                         2.09µs ± 0%        2.08µs ± 1%   -0.38%  (p=0.012 n=10+8)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(5000000)_prefilled-4               6.89ns ± 0%        3.58ns ± 0%  -48.09%  (p=0.000 n=10+8)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/segment.Scale(67280421310721)-4             1.14µs ± 0%        1.16µs ± 0%   +1.57%  (p=0.000 n=8+10)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(67280421310721)-4                  2.10µs ± 0%        2.09µs ± 0%   -0.55%  (p=0.000 n=10+9)
ExecutionSegmentScale/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/et.Scale(67280421310721)_prefilled-4        16.0ns ± 0%        10.7ns ± 0%  -33.26%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5)-4              1.26µs ± 0%        1.27µs ± 0%   +1.04%  (p=0.000 n=10+8)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5)-4                   2.33µs ± 0%        2.28µs ± 0%   -1.89%  (p=0.000 n=10+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/et.Scale(5)_prefilled-4         7.56ns ± 0%        4.40ns ± 0%  -41.74%  (p=0.000 n=9+10)
ExecutionSegmentScale/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/segment.Scale(5523)-4           1.13µs ± 1%        1.15µs ± 0%   +1.41%  (p=0.009 n=3+9)
pkg:go.k6.io/k6/lib/executor goos:linux goarch:arm64
Cal/1s-4                                                                                                        2.61µs ± 2%        2.62µs ± 1%     ~     (p=0.591 n=10+10)
Cal/1m0s-4                                                                                                       138µs ± 3%         138µs ± 0%     ~     (p=0.278 n=9+10)
CalRat/1s-4                                                                                                     20.0ms ± 2%        20.1ms ± 3%     ~     (p=0.393 n=10+10)
CalRat/1m0s-4                                                                                                    11.1s ± 0%         11.1s ± 0%   +0.23%  (p=0.000 n=9+8)
RampingVUsGetRawExecutionSteps/seq:;segment:/normal-4                                                            145µs ± 2%         139µs ± 3%   -4.53%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:/rollercoaster-4                                                    1.60ms ± 1%        1.61ms ± 1%   +0.83%  (p=0.011 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:0:1/normal-4                                                         146µs ± 2%         138µs ± 2%   -5.85%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:0:1/rollercoaster-4                                                 1.45ms ± 2%        1.63ms ± 1%  +12.99%  (p=0.000 n=9+10)
RampingVUsGetRawExecutionSteps/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/normal-4                           44.3µs ± 3%        41.7µs ± 4%   -5.87%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.3,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.3/rollercoaster-4                     484µs ± 2%         472µs ± 3%   -2.57%  (p=0.002 n=10+10)
RampingVUsGetRawExecutionSteps/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/normal-4               15.9µs ± 1%        15.2µs ± 1%   -3.87%  (p=0.000 n=10+9)
RampingVUsGetRawExecutionSteps/seq:0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1;segment:0:0.1/rollercoaster-4         162µs ± 1%         157µs ± 2%   -3.36%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2/5:4/5/normal-4                                                    59.4µs ± 1%        56.7µs ± 3%   -4.65%  (p=0.000 n=9+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2/5:4/5/rollercoaster-4                                              634µs ± 1%         613µs ± 2%   -3.25%  (p=0.000 n=9+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2235/5213:4/5/normal-4                                              79.9µs ± 1%        76.8µs ± 1%   -3.94%  (p=0.000 n=10+10)
RampingVUsGetRawExecutionSteps/seq:;segment:2235/5213:4/5/rollercoaster-4                                        634µs ± 1%         615µs ± 3%   -3.04%  (p=0.000 n=9+10)
VUHandleIterations-4                                                                                             1.00s ± 0%         1.00s ± 0%     ~     (p=0.971 n=10+10)
pkg:go.k6.io/k6/lib/netext/httpext goos:linux goarch:arm64
WrapDecompressionError-4                                                                                        81.9ns ± 0%        76.8ns ± 0%   -6.15%  (p=0.000 n=10+10)
MeasureAndEmitMetrics/no_responseCallback-4                                                                     1.29µs ± 0%        1.33µs ± 0%   +2.65%  (p=0.000 n=8+10)
MeasureAndEmitMetrics/responseCallback-4                                                                        1.34µs ± 0%        1.39µs ± 0%   +3.28%  (p=0.000 n=10+9)
pkg:go.k6.io/k6/output/cloud goos:linux goarch:arm64
AggregateHTTP/tags:1-4                                                                                           253ms ± 1%         257ms ± 1%   +1.70%  (p=0.000 n=9+9)
AggregateHTTP/tags:5-4                                                                                           294ms ± 4%         293ms ± 4%     ~     (p=0.853 n=10+10)
AggregateHTTP/tags:35-4                                                                                          332ms ± 8%         341ms ± 9%     ~     (p=0.095 n=10+9)
AggregateHTTP/tags:315-4                                                                                         473ms ± 6%         490ms ±22%     ~     (p=0.720 n=9+10)
AggregateHTTP/tags:3645-4                                                                                        839ms ± 6%         834ms ± 2%     ~     (p=0.796 n=9+9)
MetricMarshal/10000-4                                                                                           19.9ms ± 1%        21.1ms ± 1%   +6.39%  (p=0.000 n=10+9)
MetricMarshal/100000-4                                                                                           200ms ± 1%         213ms ± 1%   +6.83%  (p=0.000 n=10+10)
MetricMarshal/500000-4                                                                                           1.01s ± 2%         1.10s ± 8%   +8.08%  (p=0.000 n=9+10)
MetricMarshalWriter/10000-4                                                                                     19.2ms ± 1%        21.3ms ± 1%  +10.89%  (p=0.000 n=10+10)
MetricMarshalWriter/100000-4                                                                                     195ms ± 1%         215ms ± 1%  +10.58%  (p=0.000 n=10+10)
MetricMarshalWriter/500000-4                                                                                     962ms ± 2%        1084ms ± 1%  +12.65%  (p=0.000 n=8+9)
pkg:go.k6.io/k6/output/influxdb goos:linux goarch:arm64
Influxdb1Second-4                                                                                                2.00s ± 0%         2.00s ± 0%     ~     (p=0.912 n=10+10)
Influxdb2Second-4                                                                                                4.00s ± 0%         4.00s ± 0%     ~     (p=0.353 n=10+10)
Influxdb100Milliseconds-4                                                                                        364µs ± 0%         366µs ± 0%   +0.37%  (p=0.015 n=9+8)

name                                                                                                      old iterations/s   new iterations/s   delta
pkg:go.k6.io/k6/lib/executor goos:linux goarch:arm64
RampingArrivalRateRun/VUs10-4                                                                                     773k ± 2%          653k ± 1%  -15.62%  (p=0.000 n=9+9)
RampingArrivalRateRun/VUs100-4                                                                                    885k ± 2%          787k ± 1%  -11.10%  (p=0.000 n=10+9)
RampingArrivalRateRun/VUs1000-4                                                                                   869k ± 1%          788k ± 2%   -9.28%  (p=0.000 n=9+10)
RampingArrivalRateRun/VUs10000-4                                                                                  808k ± 1%          766k ± 4%   -5.19%  (p=0.000 n=10+10)

name                                                                                                      old iterations/ns  new iterations/ns  delta
pkg:go.k6.io/k6/lib/executor goos:linux goarch:arm64
VUHandleIterations-4                                                                                              0.03 ± 0%          0.04 ± 0%  +33.00%  (p=0.000 n=10+10)

name                                                                                                      old alloc/op       new alloc/op       delta
pkg:go.k6.io/k6/lib/netext/httpext goos:linux goarch:arm64
WrapDecompressionError-4                                                                                         0.00B              0.00B          ~     (all equal)

name                                                                                                      old allocs/op      new allocs/op      delta
pkg:go.k6.io/k6/lib/netext/httpext goos:linux goarch:arm64
WrapDecompressionError-4                                                                                          0.00               0.00          ~     (all equal)

name                                                                                                      old speed          new speed          delta
pkg:go.k6.io/k6/output/cloud goos:linux goarch:arm64
MetricMarshal/10000-4                                                                                          208MB/s ± 1%       195MB/s ± 1%   -6.11%  (p=0.000 n=10+10)
MetricMarshal/100000-4                                                                                         207MB/s ± 1%       194MB/s ± 1%   -6.40%  (p=0.000 n=10+10)
MetricMarshal/500000-4                                                                                         204MB/s ± 3%       189MB/s ± 8%   -7.35%  (p=0.000 n=9+10)
MetricMarshalWriter/10000-4                                                                                    216MB/s ± 1%       194MB/s ± 1%   -9.82%  (p=0.000 n=10+10)
MetricMarshalWriter/100000-4                                                                                   212MB/s ± 1%       192MB/s ± 1%   -9.56%  (p=0.000 n=10+10)
MetricMarshalWriter/500000-4                                                                                   215MB/s ± 2%       191MB/s ± 1%  -11.24%  (p=0.000 n=8+9)
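
For anyone reading the delta column, a small sketch of the arithmetic may help: the delta is the relative change from the old mean to the new mean. `deltaPct` is a name I made up for illustration. Because the table prints rounded means, recomputing from the printed values lands close to, but not exactly on, the deltas shown above.

```go
package main

import "fmt"

// deltaPct returns the relative change from oldVal to newVal, in percent.
func deltaPct(oldVal, newVal float64) float64 {
	return (newVal - oldVal) / oldVal * 100
}

func main() {
	// Rounded means copied from the ARM64 table above.
	// For time/op a positive delta means slower.
	fmt.Printf("Proxy/Value/Methods/Call time/op:         %+.2f%%\n", deltaPct(183, 311))
	// For iterations/s a negative delta means slower.
	fmt.Printf("RampingArrivalRateRun/VUs10 iterations/s: %+.2f%%\n", deltaPct(773, 653))
}
```

Note that the sign has the opposite meaning depending on the unit: a positive time/op delta and a negative iterations/s or MB/s delta both indicate a regression.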

@mstoykov
Contributor Author

I spent some time on Friday and over the weekend bisecting Go, and:

for RampingArrivalRate

In a real example this seems to not matter at all, or at least I can't find it mattering when I do around 10 runs. The runs do have some variation, but in general there is no trend of 1.17 being slower (or faster, for that matter).

for ExecutionSegmentScale

  • Go commit adb467ffd2d82b796de12bdd8effa2cfefe01f29 does 80% better for some benchmarks and 90% worse for others
  • Adding //go:noinline before the function on 706 "fixes it", but it also removes the majority of the gains, and the differences become more like +/- 10%
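
For readers who haven't used the directive, here is a minimal sketch of the kind of comparison described above: the same body once left to the compiler's inlining decisions and once pinned with `//go:noinline`. The two `scale*` functions are hypothetical and much simpler than the real k6 code; they only illustrate where the directive goes and how to compare the two variants.

```go
package main

import (
	"fmt"
	"testing"
)

// scaleInlinable is small enough that the compiler may inline it into callers.
func scaleInlinable(value, numerator, denominator int64) int64 {
	return value * numerator / denominator
}

// scaleNoInline has the same body, but the directive forbids inlining it.
//
//go:noinline
func scaleNoInline(value, numerator, denominator int64) int64 {
	return value * numerator / denominator
}

// sink keeps results alive so the calls are not discarded.
var sink int64

func main() {
	inlinable := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = scaleInlinable(int64(i), 3, 10)
		}
	})
	noInline := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = scaleNoInline(int64(i), 3, 10)
		}
	})
	fmt.Println("inlinable:", inlinable)
	fmt.Println("noinline: ", noInline)
}
```

The directive has to sit directly above the function declaration, with no space after `//`. If the two timings differ a lot, the difference comes from what the compiler does after inlining into the benchmark loop, not from the function body itself.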

I will try to make a simpler example reproducing both and open Go issues, so we might get better results either way :)

From looking at the ARM benchmarks (thanks @vearutop 🙇 ), which don't have the ABI changes, it looks like the ABI changes also play a role in combination with the above changes.

@mstoykov
Contributor Author

mstoykov commented Oct 5, 2021

After some more internal tests, the ExecutionSegmentScale result seems to be a particularly unfortunate "optimization" in the benchmark code now that it gets inlined. Some internal tests show, though, that code relying on it in a real example hasn't decreased in performance, and there is even a sub-10% improvement, which is likely because something else got faster, or because this isn't getting inlined there, or at least isn't getting "optimized" badly after that.

All in all, this seems like a red herring: the compiler does some optimizations in benchmark/test code that it won't do in real code, and/or that plus other optimizations nullify any regressions we actually see here.
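
As a generic illustration of how benchmark code can get optimized differently from real code (not a reproduction of the ExecutionSegmentScale case), here is a sketch with a made-up pure function `mix`. In the first loop the argument is a constant and the result is thrown away, so after inlining the compiler is free to remove most or all of the work; in the second, the input changes per iteration and the result is kept.

```go
package main

import (
	"fmt"
	"testing"
)

// mix is a hypothetical cheap, pure function that is likely to be inlined.
func mix(x uint64) uint64 {
	x ^= x >> 33
	x *= 0xff51afd7ed558ccd
	x ^= x >> 33
	return x
}

// sink receives the accumulated result so the work stays observable.
var sink uint64

func main() {
	discarded := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			mix(42) // constant input, result unused
		}
	})
	kept := testing.Benchmark(func(b *testing.B) {
		var acc uint64
		for i := 0; i < b.N; i++ {
			acc += mix(uint64(i)) // varying input, result used
		}
		sink = acc
	})
	fmt.Println("discarded:", discarded)
	fmt.Println("kept:     ", kept)
}
```

Timings well under a nanosecond per operation are usually a sign that the loop body was mostly optimized away, so numbers like that say little about how the same function behaves when called from real code.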

So we can update to 1.17 hopefully without any problems 🤞

I will close this and reopen it if we need to investigate after some 1.17 usage.

@mstoykov mstoykov closed this as completed Oct 5, 2021