testing: add testing.B.Loop for iteration #61515

aclements · 2023-07-21T20:11:10Z

Update July 26, 2023: See this comment for the latest Loop API. The motivation and arguments for this proposal still otherwise apply in full, but the API has switched to the for b.Loop() { ... } form proposed in the Alternatives section.

Currently, Go benchmarks are required to repeat the body of the benchmark (*testing.B).N times. This approach minimizes measurement overhead, but it’s error-prone and has many limitations:

As we discovered in cmd/vet: flag benchmarks that don’t use b #38677, it’s surprisingly common for benchmarks to simply forget to use b.N.
While a vet check can pretty reliably detect forgotten uses of b.N, there’s some evidence that many benchmarks use b.N incorrectly, such as using it to size the input to an algorithm, rather than as an iteration count.
Because the benchmark framework doesn’t know when the b.N loop starts, if a benchmark has any non-trivial setup, it’s important for it to use (*testing.B).ResetTimer. It’s generally not clear what counts as non-trivial setup, and very hard to detect when ResetTimer is necessary.
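To make these pitfalls concrete, here is a minimal sketch of a conventional b.N benchmark, assuming it lives in a _test.go file. The names makeInput and BenchmarkSortBN are hypothetical and do not come from the proposal; the comments mark where each of the problems listed above can creep in.

```go
package main

import (
	"sort"
	"testing"
)

// makeInput is a hypothetical setup helper: it builds a descending slice
// so that sorting has real work to do.
func makeInput(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = n - i
	}
	return s
}

// BenchmarkSortBN is a conventional b.N benchmark.
func BenchmarkSortBN(b *testing.B) {
	// Setup: this runs again every time the framework calls the
	// function with a larger b.N.
	input := makeInput(10000)
	scratch := make([]int, len(input))

	// Without this call, the setup above is included in the measurement.
	b.ResetTimer()

	// b.N must be used only as the iteration count. Writing
	// makeInput(b.N) instead would be the misuse described above.
	for i := 0; i < b.N; i++ {
		copy(scratch, input)
		sort.Ints(scratch)
	}
}
```

Forgetting the loop, forgetting ResetTimer, or sizing the input by b.N all still compile and produce a number, which is what makes these mistakes easy to miss.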

Proposal

I propose that we add the following method to testing.B and encourage its use over b.N:

// Loop invokes op repeatedly and reports the time and (optionally) allocations per invocation
// as the results of benchmark b.
// Loop must be called only once per benchmark or sub-benchmark.
//
// A benchmark should either use Loop or contain an explicit loop from 0 to b.N, but not both.
// After b.Loop returns, b.N will contain the total number of calls to op, so the benchmark
// may use b.N to compute other average metrics.
func (b *B) Loop(op func())

This API has several advantages over b.N loops:

It cannot be misused for something other than an iteration count. It’s still possible for a benchmark to forget entirely to use b.Loop, but that can be detected reliably by vet.
The benchmarking framework can record time and other metrics around only the benchmarked operation, so benchmarks no longer need to use ResetTimer or be careful about their setup.
Iteration ramp-up can be done entirely within b.Loop, which means that benchmark setup before b.Loop will happen once and only once, rather than at each ramp-up step. For benchmarks with non-trivial setup, this saves a lot of time. Notably, benchmarks with expensive setup can run for far longer than the specified -benchtime because of the large number of ramp-up steps (setup time is not counted toward the -benchtime threshold). It’s also less error-prone than using a global sync.Once to reduce setup cost, which can have side effects on GC timing and other benchmarks if the computed results are large.
As suggested by @rsc, b.Loop could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.
In the long term, we could collect distributions rather than just averages for benchmark metrics, which would enable deeper insights into benchmark results and far more powerful statistical methods, such as stationarity tests. The way this would work is that b.Loop would perform iteration ramp-up only to the point where it can amortize its measurement overhead (ramping up to, say, 1ms), and then repeat this short measurement loop many times until the total time reaches the specified -benchtime. For short benchmarks, this could easily gather 1,000 samples, rather than just a mean.
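As an illustration of the setup-related advantages, here is a minimal sketch written in the for b.Loop() form that the update at the top of this post says the API switched to. It assumes Go 1.24 or later, where testing.B.Loop is available; buildCorpus and countWords are hypothetical names chosen for the example.

```go
package main

import (
	"strings"
	"testing"
)

// buildCorpus is a hypothetical, deliberately non-trivial setup step.
func buildCorpus(words int) string {
	return strings.Repeat("gopher ", words)
}

// countWords is the hypothetical operation being measured.
func countWords(s string) int {
	return len(strings.Fields(s))
}

// BenchmarkCountWords uses the for b.Loop() form.
func BenchmarkCountWords(b *testing.B) {
	// Setup: per the proposal, this happens once and is excluded from
	// the measurement without any call to ResetTimer.
	corpus := buildCorpus(10000)

	for b.Loop() {
		countWords(corpus)
	}
}
```

There is no b.N in the benchmark body at all, so there is nothing to misuse as an input size.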

Alternatives

This proposal is complementary to testing.Keep (#61179). It’s an alternative to testing.B.Iterate (originally in #48768, with discussion now merged into #61179), which essentially combines Keep and Loop. I believe Iterate could have all of the same benefits as Loop, but it’s much clearer how to make Loop low-overhead. If Loop implicitly inhibits compiler optimizations in the body of its callback, then it has similar deoptimization benefits as Iterate. I would argue that Loop has a shallower learning curve than Iterate, though probably once users get used to either they would have similar usability.

If #61405 (range-over-func) is accepted, it may be that we want the signature of Loop to be Loop(op func() bool) bool, which would allow benchmarks to be written as:

func Benchmark(b *testing.B) {
    for range b.Loop {
        // … benchmark body …
    }
}

It’s not clear to me what this form should do if the body attempts to break or return.

Another option is to mimic testing.PB.Next. Here, the signature of Loop would be Loop() bool and it would be used as:

func Benchmark(b *testing.B) {
    for b.Loop() {
        // … benchmark body …
    }
}

This is slightly harder to implement, but perhaps more ergonomic to use. It's easier to misuse than the version of Loop that takes a callback (e.g., code could do something wrong with the result, or break out of the loop early). But unlike b.N, which is easy to misuse, this seems much harder to use incorrectly than to use correctly.

cc @bcmills @rsc


bcmills · 2023-07-21T20:17:57Z

b.Loop could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.

It's not clear to me which optimizations those would be. Certainly an unused return value should not be eliminated, but should function arguments be inlined?

I have seen benchmarks that explicitly want to check inlining behavior, and also benchmarks that are accidentally-inlinable. I expect that as inlining improves (#61502), the problem of accidentally-inlined arguments will only get worse — but if the Loop function inhibits inlining, it won't be as obvious how to allow particular arguments to be inlined.

(I assume the user would have to create a closure, and then call that closure within the Loop body?)

aclements · 2023-07-21T20:19:34Z

Since this is so closely related to testing.Keep and testing.B.Iterate, which are both being discussed on #61179, let's keep discussions of trade-offs over on #61179. (It's probably fine to discuss things specific to this Loop here.)

aclements · 2023-07-21T20:28:48Z

It's not clear to me which optimizations those would be. Certainly an unused return value should not be eliminated, but should function arguments be inlined?

I think we would implicitly apply Keep to every function argument and result in the closure passed directly to Loop. I'm pretty sure that's what @rsc was thinking, too.

I expect that as inlining improves (#61502), the problem of accidentally-inlined arguments will only get worse — but if the Loop function inhibits inlining, it won't be as obvious how to allow particular arguments to be inlined.

I definitely agree that this problem is only going to get worse. I'm not sure we need to inhibit inlining, but we need to inhibit optimizations that propagate information across this inlining boundary. That's why I think it works to think of this as applying an implicit Keep to every function argument and result in the closure, and to think of Keep as having a used and non-constant result. (Obviously that's not a very precise specification, though!)

(I assume the user would have to create a closure, and then call that closure within the Loop body?)

Right. I think if the user specifically wants to benchmark the effect of constant propagation into an inlined function, they would add another layer of function call. We'd only apply the implicit Keep to the direct argument to Loop. That's fairly subtle, but I think such benchmarks are also rare.
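A sketch of that distinction, translated to the for b.Loop() form (Go 1.24 or later). It rests on the assumption stated in this comment, namely that the implicit Keep covers only calls written directly in the benchmarked body; square and squareOfConstant are hypothetical names.

```go
package main

import "testing"

// square is a small function that a compiler could inline and
// constant-fold (hypothetical example).
func square(x int) int { return x * x }

// squareOfConstant adds the extra layer of function call: inside this
// wrapper the constant argument is visible to the compiler.
func squareOfConstant() int { return square(12) }

// BenchmarkSquareDirect calls square directly in the loop body, so its
// argument and result would receive the implicit Keep discussed above.
func BenchmarkSquareDirect(b *testing.B) {
	for b.Loop() {
		square(12)
	}
}

// BenchmarkSquareWrapped calls the wrapper instead. Only the wrapper call
// sits directly in the loop body, so constant propagation into square
// could still happen inside squareOfConstant.
func BenchmarkSquareWrapped(b *testing.B) {
	for b.Loop() {
		squareOfConstant()
	}
}
```

Whether the two benchmarks actually report different numbers depends on the compiler version and its inlining decisions.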

My other concern with the deoptimization aspect of this is what to do if Loop is called with something other than a function literal. We could say the implicit Keep only applies if it's called directly with a function literal, but that feels... even weirder. 😅 It may be that the for b.Loop() { ... } alternative I gave at the end is less weird in this context because the lexical relationship is much clearer.

rsc · 2023-07-25T19:34:17Z

When b.Loop inlines, for b.Loop() { ... } has almost no overhead while enabling multiple trials and any number of other interesting measurements. I really like it a lot. It's unfortunate that PB's method is Next, since Loop seems like a much better name. PB is not used much; it may be fine to be different.

rsc · 2023-07-26T18:26:16Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2023-09-06T17:26:30Z

The only real question about this was the auto-Keep. All the other benefits listed in the top comment are clear wins. Given that Keep is now likely accept, it seems like this one can be likely accept too.

rsc · 2023-09-06T17:31:05Z

Note that the current proposal is used as for b.Loop() { ... } - there is no callback anymore.

rsc · 2023-09-07T13:53:24Z

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

rsc · 2023-09-20T17:27:36Z

b.Loop has real benefits separate from Keep, so it's worth doing even if we still have questions about Keep. So this seems like it can move to accept.

timothy-king · 2023-09-21T19:53:36Z

If #61405 is accepted it will be possible to range over integers. From the description:

For example, the canonical benchmark iteration becomes:
for range b.N {
    do the thing being benchmarked
}

So there might soon be a new, better alternative for benchmark iteration. Does that argue for delaying seeing whether another alternative is needed?

cespare · 2023-09-21T20:24:13Z

@timothy-king I don't think that matters here.

Range-over-integer is fairly minor syntactic sugar. The issues that b.Loop addresses are not about saving the user a few characters of typing; they are to do with the semantics of b.Loop compared with the semantics of using b.N directly. (See the bullet points in the original proposal.)

rsc · 2023-10-03T00:51:07Z

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

quasilyte · 2023-10-04T08:11:34Z

I usually wrap a benchmarked op into a go:noinline wrapper to disable some optimizations that can invalidate the benchmark.
This does add a constant overhead, but it works OK for the operations that run for more than 10-20ns.

It looks like passing a closure to the Loop function would eliminate that requirement as the benchmarked body would be inside the function that can't be inlined anyway (unless the compiler would handle Loop in some special way). (As opposed to being "inlined" inside the for loop.)
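For readers unfamiliar with the wrapper technique described here, a minimal sketch follows. clamp and clampNoInline are hypothetical names; the //go:noinline directive itself is a real compiler directive.

```go
package main

import "testing"

// clamp is the hypothetical operation under test. It is small enough
// that the compiler would normally inline it.
func clamp(x, lo, hi int) int {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

// clampNoInline wraps clamp so that the call made from the benchmark
// loop cannot be inlined into it. This adds the constant call overhead
// mentioned above.
//
//go:noinline
func clampNoInline(x, lo, hi int) int {
	return clamp(x, lo, hi)
}

// BenchmarkClamp is a b.N benchmark that measures through the wrapper.
func BenchmarkClamp(b *testing.B) {
	for i := 0; i < b.N; i++ {
		clampNoInline(i, 0, 100)
	}
}
```

Because the wrapper is an opaque call, the compiler cannot use knowledge of the loop (such as the constant bounds) to simplify or discard the work inside it.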

aclements · 2024-08-19T01:16:56Z

I mentioned in the original post that testing.B.Loop helps with issues around time spent doing benchmark setup, since it can easily ignore that time. Another aspect of this that just came out of a discussion with @mknyszek and @prattmic is that this also helps avoid an "amplification" effect that comes out of any time spent with the benchmark timer stopped. Currently, the testing package attempts to find an iteration count that makes the measured benchmark time exceed the -benchtime target (by default, 1s). Since stopped time doesn't count toward this, if a benchmark spends a significant amount of time with the benchmark timer stopped, it can take arbitrarily more than 1 second of real time to run the benchmark.

For time spent in setup, this effect isn't terrible because it's only amplified by the number of times it needs to retry the benchmark. It's still nice that testing.B.Loop avoids this effect entirely, since it only runs benchmark setup once.

This effect is much worse if the timer is stopped during the benchmark loop itself, such as to do some per-iteration setup or cleanup. If the benchmark spends 90% of its real time with the timer stopped, it'll run for 10 times longer than expected. It may be that testing.B.Loop would let us improve this situation as well: it may make sense for it to run until the real time of the loop exceeds the -benchtime target. This would be pretty unreliable for b.N benchmarks because they can't reliably separate out setup time, but could be practical for testing.B.Loop benchmarks.

Benchtime has a few goals. One is to minimize the measurement effect by amortizing it to almost zero. That happens pretty quickly; probably well under a millisecond. But also, if you're starting and stopping the timer inside the benchmark loop, you've already given up on that amortization. Another is to roughly capture things that happen over a longer time scale, like garbage collection, scheduling, and cache effects. But these are likely to be captured nearly as well by 1 second of real time as they are by 1 second of benchmark time.

Given all of this, I think we could make testing.B.Loop target a real time rather than a benchmark time.
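The "10 times longer" figure can be checked with a little arithmetic. The sketch below is a simplified model, not the testing package's logic: it looks only at the final run, ignores the earlier ramp-up runs, and assumes every iteration spends a fixed amount of time with the timer running and a fixed amount with it stopped. wallTime is a hypothetical helper.

```go
package main

import "time"

// wallTime estimates the real time taken by the final run of a benchmark
// that must accumulate `target` of measured time, when each iteration
// spends `timed` with the timer running and `stopped` with it stopped.
func wallTime(target, timed, stopped time.Duration) time.Duration {
	// Iterations needed for the timed portion alone to reach the target
	// (ceiling division).
	iters := int64((target + timed - 1) / timed)

	// Every one of those iterations also pays the stopped time.
	return time.Duration(iters) * (timed + stopped)
}
```

With a 1s target, 1ms timed, and 9ms stopped per iteration (90% of real time stopped), the model needs 1000 iterations at 10ms each, for 10s of real time.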

gopherbot · 2024-08-27T22:04:07Z

Change https://go.dev/cl/608798 mentions this issue: testing: implement testing.B.Loop

prattmic · 2024-08-28T15:27:50Z

Given all of this, I think we could make testing.B.Loop target a real time rather than a benchmark time.

Are you imagining some sort of limit on this? It is fine if the timer is stopped only for short periods, but if it is stopped for long periods then you might only run a single iteration. Maybe that is fine?

Obviously stopping the timer for a long time every iteration isn't practical, but they might do so every M iterations. For example, I recently wrote a map iter+delete benchmark that looked something like this.

func BenchmarkPop(b *testing.B) {
	m := make(map[int]int)
	for i := 0; i < b.N; i++ {
		if len(m) == 0 {
			b.StopTimer()
			for i := 0; i < 100; i++ {
				m[i] = i
			}
			b.StartTimer()
		}
		for k := range m {
			delete(m, k)
			break
		}
	}
}

(The STW cost of StopTimer made this impractical)

cherrymui · 2024-11-13T19:33:17Z

The proposal also mentions

Iteration ramp-up can be done entirely within b.Loop, which means that benchmark setup before b.Loop will happen once and only once, rather than at each ramp-up step. For benchmarks with non-trivial setup, this saves a lot of time. Notably, benchmarks with expensive setup can run for far longer than the specified -benchtime because of the large number of ramp-up steps (setup time is not counted toward the -benchtime threshold). It’s also less error-prone than using a global sync.Once to reduce setup cost, which can have side effects on GC timing and other benchmarks if the computed results are large.

I.e. not calling the top-level BenchmarkXXX functions multiple times, but just handling it in b.Loop runs. @JunyangShao Would you like to look into that? (Feel free to reopen this one, or open a new issue for tracking)

gopherbot · 2024-11-13T23:38:36Z

Change https://go.dev/cl/627755 mentions this issue: testing: implement one-time rampup logic for testing.B.Loop

testing.B.Loop now does its own loop scheduling without interaction with b.N. b.N will be updated to the actual iterations b.Loop controls when b.Loop returns false.

This CL also added tests for fixed iteration count (benchtime=100x case).

This CL also ensured that b.Loop() is inlined.

For #61515

Change-Id: Ia15f4462f4830ef4ec51327520ff59910eb4bb58
Reviewed-on: https://go-review.googlesource.com/c/go/+/627755
Reviewed-by: Michael Pratt
Commit-Queue: Junyang Shao
LUCI-TryBot-Result: Go LUCI
Reviewed-by: Cherry Mui
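Because b.N holds the actual iteration count once b.Loop returns false, a benchmark can still use it after the loop to compute per-operation averages. A minimal sketch, assuming Go 1.24 or later; encode and the "outbytes/op" unit name are hypothetical, while (*testing.B).ReportMetric is an existing method.

```go
package main

import (
	"strconv"
	"testing"
)

// encode is a hypothetical operation whose output size we want to report.
func encode(n int) string {
	return strconv.Itoa(n)
}

// BenchmarkEncode reports a custom average alongside the usual ns/op.
func BenchmarkEncode(b *testing.B) {
	totalBytes := 0
	i := 0
	for b.Loop() {
		totalBytes += len(encode(i))
		i++
	}

	// b.Loop has returned false, so b.N now holds the number of
	// iterations that actually ran.
	b.ReportMetric(float64(totalBytes)/float64(b.N), "outbytes/op")
}
```

Reading b.N inside the loop body is a different matter: the benchmark should treat it as meaningful only after the loop has finished.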

zigo101 · 2024-11-23T21:18:26Z

I'm a bit surprised that B.Loop is not an iterator.

aclements · 2024-11-25T20:10:30Z

I'm a bit surprised that B.Loop is not an iterator.

We definitely considered this, as you can see from the original proposal. Making it an iterator doesn't have any obvious benefits because B.Loop doesn't yield anything, but it does have drawbacks in terms of complexity. As a simple function, it's much easier to implement such that the compiler can clearly inline B.Loop and reduce its overhead to almost nothing. As an iterator, it can probably still do that, but you'd be involving quite a bit more complexity.
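To illustrate why a plain method is easy to make cheap, here is a toy model of a Loop-style counter with a small fast path and a rarely taken slow path. It is not the testing package's implementation: loopState, loopSlow, and countIterations are hypothetical, there is no timing or ramp-up, and the iteration budget is simply fixed (similar in spirit to -benchtime=100x).

```go
package main

// loopState is a toy model of the state behind a Loop-style method.
type loopState struct {
	i, n   int // iterations handed out so far, and the current budget
	target int // fixed number of iterations to run
}

// Loop is small enough that a compiler can plausibly inline it: the
// common case is one comparison and one increment.
func (s *loopState) Loop() bool {
	if s.i < s.n {
		s.i++
		return true
	}
	return s.loopSlow()
}

// loopSlow handles the rare cases: starting the loop and finishing it.
func (s *loopState) loopSlow() bool {
	if s.n == 0 && s.target > 0 {
		// First call. A real implementation would reset the timer here.
		s.n = s.target
		s.i = 1
		return true
	}
	// Budget exhausted. A real implementation would stop the timer here.
	return false
}

// countIterations runs an empty body under the toy loop and reports how
// many times the body executed.
func countIterations(target int) int {
	s := &loopState{target: target}
	count := 0
	for s.Loop() {
		count++
	}
	return count
}
```

An iterator version would need to route every iteration through a yield callback, which is the extra complexity referred to above.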

gopherbot · 2024-12-01T22:25:59Z

Change https://go.dev/cl/632655 mentions this issue: testing: consider -N suffix after benchmark name optional

A "-N" suffix is left out when GOMAXPROCS is 1. Also match at least 1 space (\s+ instead of \s*), remove trailing '.*' (it's a no-op), and make the test error message style more consistent while here.

For #61515.
Fixes #70627.

Change-Id: Id0a17478ac31e2934a663dd0d3b1b37f24974989
Cq-Include-Trybots: luci.golang.try:gotip-plan9-386
Reviewed-on: https://go-review.googlesource.com/c/go/+/632655
Reviewed-by: Junyang Shao
LUCI-TryBot-Result: Go LUCI
Reviewed-by: Dmitri Shuralyov
Auto-Submit: Dmitri Shuralyov
Reviewed-by: Cherry Mui

cherrymui · 2024-12-03T22:38:46Z

Reopen to track documentation, example, and release notes. Thanks.

mknyszek · 2024-12-04T17:18:25Z

The RC is planned for next week, and we need a full draft of the release notes before then. Please prioritize writing the release notes for this. Thanks!

gopherbot · 2024-12-05T19:00:09Z

Change https://go.dev/cl/633536 mentions this issue: testing: improve documentation, examples, release notes for

gopherbot · 2024-12-06T00:36:59Z

Change https://go.dev/cl/634115 mentions this issue: testing: fix release notes typo for testing.B.Loop.

gopherbot · 2024-12-13T02:30:39Z

Change https://go.dev/cl/635896 mentions this issue: testing: improve b.Loop example

gopherbot · 2024-12-13T02:30:40Z

Change https://go.dev/cl/635897 mentions this issue: testing: improve B.Loop test

gopherbot · 2024-12-13T02:30:41Z

Change https://go.dev/cl/635898 mentions this issue: testing: don't measure cleanup time after B.Loop

gopherbot · 2024-12-13T02:30:41Z

Change https://go.dev/cl/635895 mentions this issue: testing: improve B.Loop docs, use B.Loop in examples

This updates the testing documentation to frame B.Loop as the canonical way to write benchmarks. We retain documentation on b.N benchmarks because people will definitely continue to see them (and write them), but it's demoted to clearly second class.

This also attempts to clarify and refine the B.Loop documentation itself.

Updates #61515
Fixes #70787

Change-Id: If5123435bfe3a5883a753119ecdf7bbc41afd499
Reviewed-on: https://go-review.googlesource.com/c/go/+/635895
Reviewed-by: Junyang Shao
Reviewed-by: Caleb Spare
LUCI-TryBot-Result: Go LUCI
Auto-Submit: Austin Clements

The current b.Loop example doesn't focus on the basic usage of b.Loop. Replace this with a new example that uses (slightly) more realistic things to demonstrate the most salient points of b.Loop.

We also move the example into an example file so that we can write a real Benchmark function and a real function to be benchmarked, which makes this much closer to what a user would actually write.

Updates #61515.

Change-Id: I4d830b3bfe3eb3cd8cdecef469fea0541baebb43
Reviewed-on: https://go-review.googlesource.com/c/go/+/635896
Auto-Submit: Austin Clements
Reviewed-by: Junyang Shao
Reviewed-by: Cherry Mui
LUCI-TryBot-Result: Go LUCI

This moves the B.Loop test from package testing_test to package testing, where it can check on more of the internals of the benchmark state.

Updates #61515.

Change-Id: Ia32d7104526125c5e8a1e35dab7660008afcbf80
Reviewed-on: https://go-review.googlesource.com/c/go/+/635897
Auto-Submit: Austin Clements
LUCI-TryBot-Result: Go LUCI
Reviewed-by: Junyang Shao

B.Loop resets the timer on the first iteration so that setup code isn't measured, but it currently leaves the timer running after the last iteration, meaning that cleanup code will still be measured. Fix this by stopping the timer when B.Loop returns false to indicate the end of the benchmark.

Updates #61515

Change-Id: I0e0502cb2ce3c24cf872682b88d74e8be2c4529b
Reviewed-on: https://go-review.googlesource.com/c/go/+/635898
Reviewed-by: Junyang Shao
Auto-Submit: Austin Clements
LUCI-TryBot-Result: Go LUCI
Reviewed-by: Cherry Mui

aclements added the Proposal label Jul 21, 2023

gopherbot added this to the Proposal milestone Jul 21, 2023

aclements mentioned this issue Jul 21, 2023

proposal: testing: add Keep, to force evaluation in benchmarks #61179

Open

ianlancetaylor added this to Proposals Jul 21, 2023

ianlancetaylor moved this to Incoming in Proposals Jul 21, 2023

rsc moved this from Incoming to Active in Proposals Jul 26, 2023

rsc moved this from Active to Likely Accept in Proposals Sep 7, 2023

rsc added the Proposal-FinalCommentPeriod label Sep 7, 2023

rsc moved this from Likely Accept to Accepted in Proposals Oct 3, 2023

rsc changed the title from "proposal: testing: add testing.B.Loop for iteration" to "testing: add testing.B.Loop for iteration" Oct 3, 2023

rsc modified the milestones: Proposal, Backlog Oct 3, 2023

rsc added Proposal-Accepted and removed Proposal-FinalCommentPeriod labels Oct 3, 2023

JunyangShao closed this as completed Nov 11, 2024

seankhliao mentioned this issue Dec 2, 2024

testing: document best practices for avoiding compiler optimizations in benchmarks #27400

Open

cherrymui reopened this Dec 3, 2024

cherrymui added the release-blocker label Dec 3, 2024

cherrymui assigned JunyangShao Dec 3, 2024

cherrymui added the Documentation label Dec 3, 2024

gopherbot closed this as completed in 5213e1e Dec 5, 2024

gopherbot mentioned this issue Dec 5, 2024

api: audit for Go 1.24 #70701

Closed

dmitshur removed the Documentation label Dec 6, 2024

gabyhelp mentioned this issue Dec 11, 2024

testing: improve the text of the b.Loop documentation #70787

Closed
