-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
testing: add testing.B.Loop for iteration #61515
Comments
It's not clear to me which optimizations those would be. Certainly an unused return value should not be eliminated, but should function arguments be inlined? I have seen benchmarks that explicitly want to check inlining behavior, and also benchmarks that are accidentally-inlinable. I expect that as inlining improves (#61502), the problem of accidentally-inlined arguments will only get worse — but if the (I assume the user would have to create a closure, and then call that closure within the |
I think we would implicitly apply Keep to every function argument and result in the closure passed directly to Loop. I'm pretty sure that's what @rsc was thinking, too.
I definitely agree that this problem is only going to get worse. I'm not sure we need to inhibit inlining, but we need to inhibit optimizations that propagate information across this inlining boundary. That's why I think it works to think of this as applying an implicit Keep to every function argument and result in the closure, and to think of Keep as having a used and non-constant result. (Obviously that's not a very precise specification, though!)
Right. I think if the user specifically wants to benchmark the effect of constant propagation into an inlined function, they would add another layer of function call. We'd only apply the implicit Keep to the direct argument to Loop. That's fairly subtle, but I think such benchmarks are also rare. My other concern with the deoptimization aspect of this is what to do if Loop is called with something other than a function literal. We could say the implicit Keep only applies if it's called directly with a function literal, but that feels.. even weirder. 😅 It may be that |
When b.Loop inlines, |
This proposal has been added to the active column of the proposals project |
The only real question about this was the auto-Keep. All the other benefits listed in the top comment are clear wins. Given that Keep is now likely accept, it seems like this one can be likely accept too. |
Note that the current proposal is used as |
Based on the discussion above, this proposal seems like a likely accept. |
b.Loop has real benefits separate from Keep, so it's worth doing even if we still have questions about Keep. So this seems like it can move to accept. |
If #61405 is accepted it will be possible to range over integers. From the description:
So there might soon be a new, better alternative for benchmark iteration. Does that argue for delaying seeing whether another alternative is needed? |
@timothy-king I don't think that matters here. Range-over-integer is fairly minor syntactic sugar. The issues that |
No change in consensus, so accepted. 🎉 |
I usually wrap a benchmarked op into a It looks like passing a closure to the Loop function would eliminate that requirement as the benchmarked body would be inside the function that can't be inlined anyway (unless the compiler would handle Loop in some special way). (As opposed to being "inlined" inside the |
I mentioned in the original post that For time spent in setup, this effect isn't terrible because it's only amplified by the number of times it needs to retry the benchmark. It's still nice that This effect is much worse if the timer is stopped during the benchmark loop itself, such as to do some per-iteration setup or cleanup. If the benchmark spends 90% of its real time with the timer stopped, it'll run for 10 times longer than expected. It may be that Benchtime has a few goals. One is to minimize the measurement effect by amortizing it to almost zero. That happens pretty quickly; probably well under a millisecond. But also, if you're starting and stopping the timer inside the benchmark loop, you've already given up on that amortization. Another is to roughly capture things that happen over a longer time scale, like garbage collection, scheduling, and cache effects. But these are likely to be captured nearly as well by 1 second of real time as they are by 1 second of benchmark time. Given all of this, I think we could make |
Change https://go.dev/cl/608798 mentions this issue: |
Are you imagining some sort of limit on this? It is fine if the timer is stopped only for short periods, but if it is stopped for long periods then you might only run a single iteration. Maybe that is fine? Obviously stopping the timer for a long time every iteration isn't practical, but they might do so every M iterations. For example, I recently wrote a map iter+delete benchmark that looked something like this.
(The STW cost of StopTimer made this impractical) |
The proposal also mentions
I.e. not calling the top-level BenchmarkXXX functions multiple times, but just handling it in b.Loop runs. @JunyangShao Would you like to look into that? (Feel free to reopen this one, or open a new issue for tracking) |
Change https://go.dev/cl/627755 mentions this issue: |
testing.B.Loop now does its own loop scheduling without interaction with b.N. b.N will be updated to the actual iterations b.Loop controls when b.Loop returns false. This CL also added tests for fixed iteration count (benchtime=100x case). This CL also ensured that b.Loop() is inlined. For #61515 Change-Id: Ia15f4462f4830ef4ec51327520ff59910eb4bb58 Reviewed-on: https://go-review.googlesource.com/c/go/+/627755 Reviewed-by: Michael Pratt <[email protected]> Commit-Queue: Junyang Shao <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
I'm a bit surprised that |
We definitely considered this, as you can see from the original proposal, making it an iterator doesn't have any obvious benefits because B.Loop doesn't yield anything, but it does have drawbacks in terms of complexity. As a simple function, it's much easier to implement such that the compiler can clearly inline B.Loop and reduce its overhead to almost nothing. As an iterator, it can probably still do that, but you'd be involving quite a bit more complexity. |
Change https://go.dev/cl/632655 mentions this issue: |
A "-N" suffix is left out when GOMAXPROCS is 1. Also match at least 1 space (\s+ instead of \s*), remove trailing '.*' (it's a no-op), and make the test error message style more consistent while here. For #61515. Fixes #70627. Change-Id: Id0a17478ac31e2934a663dd0d3b1b37f24974989 Cq-Include-Trybots: luci.golang.try:gotip-plan9-386 Reviewed-on: https://go-review.googlesource.com/c/go/+/632655 Reviewed-by: Junyang Shao <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Auto-Submit: Dmitri Shuralyov <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
Reopen to track documentation, example, and release notes. Thanks. |
The RC is planned for next week, and we need a full draft of the release notes before then. Please prioritize writing the release notes for this. Thanks! |
Change https://go.dev/cl/633536 mentions this issue: |
Change https://go.dev/cl/634115 mentions this issue: |
Change https://go.dev/cl/635896 mentions this issue: |
Change https://go.dev/cl/635897 mentions this issue: |
Change https://go.dev/cl/635898 mentions this issue: |
Change https://go.dev/cl/635895 mentions this issue: |
This updates the testing documentation to frame B.Loop as the canonical way to write benchmarks. We retain documentation on b.N benchmarks because people will definitely continue to see them (and write them), but it's demoted to clearly second class. This also attempts to clarify and refine the B.Loop documentation itself. Updates #61515 Fixes #70787 Change-Id: If5123435bfe3a5883a753119ecdf7bbc41afd499 Reviewed-on: https://go-review.googlesource.com/c/go/+/635895 Reviewed-by: Junyang Shao <[email protected]> Reviewed-by: Caleb Spare <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Austin Clements <[email protected]>
The current b.Loop example doesn't focus on the basic usage of b.Loop. Replace this with a new example that uses (slightly) more realistic things to demonstrate the most salient points of b.Loop. We also move the example into an example file so that we can write a real Benchmark function and a real function to be benchmarks, which makes this much closer to what a user would actually write. Updates #61515. Change-Id: I4d830b3bfe3eb3cd8cdecef469fea0541baebb43 Reviewed-on: https://go-review.googlesource.com/c/go/+/635896 Auto-Submit: Austin Clements <[email protected]> Reviewed-by: Junyang Shao <[email protected]> Reviewed-by: Cherry Mui <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
This moves the B.Loop test from package testing_test to package testing, where it can check on more of the internals of the benchmark state. Updates #61515. Change-Id: Ia32d7104526125c5e8a1e35dab7660008afcbf80 Reviewed-on: https://go-review.googlesource.com/c/go/+/635897 Auto-Submit: Austin Clements <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Junyang Shao <[email protected]>
B.Loop resets the timer on the first iteration so that setup code isn't measured, but it currently leaves the timer running after the last iteration, meaning that cleanup code will still be measured. Fix this by stopping the timer when B.Loop returns false to indicate the end of the benchmark. Updates #61515 Change-Id: I0e0502cb2ce3c24cf872682b88d74e8be2c4529b Reviewed-on: https://go-review.googlesource.com/c/go/+/635898 Reviewed-by: Junyang Shao <[email protected]> Auto-Submit: Austin Clements <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Cherry Mui <[email protected]>
Update July 26, 2023: See this comment for the latest Loop API. The motivation and arguments for this proposal still otherwise apply in full, but the API has switched to the
for b.Loop() { ... }
form proposed in the Alternatives section.Currently, Go benchmarks are required to repeat the body of the benchmark
(*testing.B).N
times. This approach minimizes measurement overhead, but it’s error-prone and has many limitations:As we discovered in cmd/vet: flag benchmarks that don’t use b #38677, it’s surprisingly common for benchmarks to simply forget to use
b.N
.While a vet check can pretty reliably detect forgotten uses of
b.N
, there’s some evidence that many benchmarks useb.N
incorrectly, such as using it to size the input to an algorithm, rather than as an iteration count.Because the benchmark framework doesn’t know when the b.N loop starts, if a benchmark has any non-trivial setup, it’s important for it to use
(*testing.B).ResetTimer
. It’s generally not clear what counts as non-trivial setup, and very hard to detect whenResetTimer
is necessary.Proposal
I propose that we add the following method to
testing.B
and encourage its use overb.N
:This API has several advantages over
b.N
loops:It cannot be misused for something other than an iteration count. It’s still possible for a benchmark to forget entirely to use
b.Loop
, but that can be detected reliably by vet.The benchmarking framework can record time and other metrics around only the benchmarked operation, so benchmarks no longer need to use
ResetTimer
or be careful about their setup.Iteration ramp-up can be done entirely within
b.Loop
, which means that benchmark setup beforeb.Loop
will happen once and only once, rather than at each ramp-up step. For benchmarks with non-trivial setup, this saves a lot of time. Notably, benchmarks with expensive setup can run for far longer than the specified-benchtime
because of the large number of ramp-up steps (setup time is not counted toward the-benchtime
threshold). It’s also less error-prone than using a globalsync.Once
to reduce setup cost, which can have side effects on GC timing and other benchmarks if the computed results are large.As suggested by @rsc,
b.Loop
could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.In the long term, we could collect distributions rather than just averages for benchmark metrics, which would enable deeper insights into benchmark results and far more powerful statistical methods, such as stationarity tests. The way this would work is that
b.Loop
would perform iteration ramp-up only to the point where it can amortize its measurement overhead (ramping up to, say, 1ms), and then repeat this short measurement loop many times until the total time reaches the specified-benchtime
. For short benchmarks, this could easily gather 1,000 samples, rather than just a mean.Alternatives
This proposal is complementary to
testing.Keep
(#61179). It’s an alternative totesting.B.Iterate
(originally in #48768, with discussion now merged into #61179), which essentially combines Keep and Loop. I believe Iterate could have all of the same benefits as Loop, but it’s much clearer how to make Loop low-overhead. If Loop implicitly inhibits compiler optimizations in the body of its callback, then it has similar deoptimization benefits as Iterate. I would argue that Loop has a shallower learning curve than Iterate, though probably once users get used to either they would have similar usability.If #61405 (range-over-func) is accepted, it may be that we want the signature of
Loop
to beLoop(op func() bool) bool
, which would allow benchmarks to be written as:It’s not clear to me what this form should do if the body attempts to
break
orreturn
.Another option is to mimic
testing.PB.Next
. Here, the signature of Loop would beLoop() bool
and it would be used as:This is slightly harder to implement, but perhaps more ergonomic to use. It's more possible to misuse than the version of Loop that takes a callback (e.g., code could do something wrong with the result, or break out of the loop early). But unlike
b.N
, which is easy to misuse, this seems much harder to use incorrectly than to use correctly.cc @bcmills @rsc
The text was updated successfully, but these errors were encountered: