proposal: testing: add Keep, to force evaluation in benchmarks #61179
Comments
Bikeshedding the name aside ( It's simple and sufficient. It doesn't prevent us from working on a different API like the proposed |
If we planned to do Iterate, we might not want to also do Keep. That said, I think the drawbacks listed above for Iterate are quite serious and we should simply not do it. |
FWIW, I'm planning to file another proposal for an API that covers just the looping aspect of Iterate and would complement Keep. |
Doing |
This proposal has been added to the active column of the proposals project |
Note that that is also possible with the API proposed in #48768 by closing over the intentionally-constant values:
```go
func BenchmarkFibConstant10(b *testing.B) {
	b.Iterate(func() int {
		return Fib(10)
	})
}
```
It seems to me that this proposal and #48768 are equally expressive, and the key difference is just whether constant-propagation is opt-in (#48768) or opt-out (this proposal). |
As I have repeatedly stated on #48768, I believe that there are several viable ways to overcome that overhead. I am becoming somewhat frustrated that #48768 (comment) in particular seems to have been ignored. I may not be on the Go compiler team, but I am well acquainted with compiler optimization techniques, and so far nobody has explained why those techniques would not apply in this case. |
While that is true, any type mismatch errors would be diagnosed immediately if the benchmark is ever actually run, and a similar lack of type safety was not seen as a significant barrier for the closely-related fuzz testing API (#44551). |
@bcmills, my experience over >25 frustrating years of trying to benchmark things is that, in general, attempting to subtract out per-loop overhead sounds good in theory, but in practice that overhead can and often does include various random noise. And the more overhead there is, the louder the noise. This means if you are trying to benchmark a very short operation, then subtracting out a (different) reflect.Call measurement is very likely to destroy the measurement, perhaps even making it negative. The best approach we have for getting the most reliable numbers we can is to introduce as little overhead as possible to begin with. For the trivial loop for i := 0; i < b.N; i++, we just ignore the overhead of the i++, i < N entirely and include it as part of the thing being measured. This turns out to be far more accurate than trying to subtract it out. |
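To make the overhead concern concrete, here is a minimal sketch that times the same tiny function called directly and through `reflect.Value.Call`. The `add` function, the `sink` variable, and the benchmark names are illustrative assumptions, not code from either proposal; the absolute numbers will vary by machine, so none are quoted here.

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

// add is a deliberately tiny function, so per-call overhead dominates.
func add(a, b int) int { return a + b }

// sink keeps results observable so the calls are not dead code.
var sink int

// Pre-built reflect values, so only the call itself is inside the loop.
var (
	addFn   = reflect.ValueOf(add)
	addArgs = []reflect.Value{reflect.ValueOf(1), reflect.ValueOf(2)}
)

// reflectAdd invokes add through reflection, as a naive Iterate might.
func reflectAdd() int {
	return int(addFn.Call(addArgs)[0].Int())
}

func BenchmarkDirect(b *testing.B) {
	x, y := 1, 2
	for i := 0; i < b.N; i++ {
		sink = add(x, y)
	}
}

func BenchmarkReflect(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = reflectAdd()
	}
}

func main() {
	// testing.Benchmark lets the sketch run outside "go test".
	fmt.Println("direct: ", testing.Benchmark(BenchmarkDirect))
	fmt.Println("reflect:", testing.Benchmark(BenchmarkReflect))
}
```

Comparing the two results gives a feel for how much of a reflect-based measurement would be call machinery rather than the function under test.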
From what I can tell, this would require N+1 calls to |
The main place where testing.Keep is needed is around the overall result. I write code to work around that all the time. |
I see now that you also mentioned making b.Iterate a compiler intrinsic. I suppose that is possible, but it seems very special-case. At that point it's basically a back-door language change, since either you can't do |
I expect that that will become more common as the compiler gets better at inlining. That said, it is also more straightforward to work around (without new API) today, such as by alternating among multiple entries in a slice of inputs. |
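A minimal sketch of the slice-of-inputs workaround mentioned above, assuming a naive recursive `Fib` and a hypothetical `inputs` slice. Because the argument is loaded from a package-level slice at run time, the compiler cannot fold it into the call; the indexing does add a little work to each iteration.

```go
package main

import (
	"fmt"
	"testing"
)

// Fib is an assumed naive recursive implementation, used only for illustration.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// inputs is a hypothetical set of arguments to rotate through.
var inputs = []int{10, 11, 12, 13}

// result receives the accumulated value so the calls stay live.
var result int

func BenchmarkFibRotating(b *testing.B) {
	r := 0
	for i := 0; i < b.N; i++ {
		// The argument varies per iteration and is read from memory.
		r += Fib(inputs[i%len(inputs)])
	}
	result = r
}

func main() {
	fmt.Println(testing.Benchmark(BenchmarkFibRotating))
}
```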
I agree that subtracting out the overhead from a naive implementation based on Making I think probably the most promising approach is an implementation that sets up a stack frame with arguments and then repeatedly invokes the function starting from that frame. It isn't obvious to me whether the |
The stack frame implementation would not be able to set up the arguments just once. It would have to set them up on every iteration, since in general a function can overwrite its arguments, and many do. reflect.Caller would amortize the allocation but not the setup. |
All good points, @bcmills.
I'm not sure if you're referring to "line noise" here (which, I agree, this does introduce a fair amount of line noise) or measurement noise. For the latter, a naive implementation of
Another possible option is that we make sure I'm not that concerned about people capturing We already do some code generation for tests. Is there anything we could code-generate to help with this? We don't rewrite any test code right now, so this might require pushing that too far.
Not to mention, I would expect most or all of the arguments to be passed in registers. We would certainly have to re-initialize those. |
What I had in mind is something like two functions: a (somewhat expensive) setup hook that checks types and copies function arguments from a slice into a more compact argument block, and a (cheap) “invoke” hook that initializes the call stack frame, copies arguments into registers, and calls the function. The argument block might look something like:
The implementation of
```go
func (b *B) Iterate(f any, args ...any) {
	call := reflectlite.NewCall(f, args...)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		call.Invoke()
	}
}
```
where That seems like it might be easier than teaching the compiler to inline through |
As a developer I would much prefer the compiler be taught not to optimize away calls within a _test.go file instead of me having to remember to write a bunch of wrapper calls. I didn't see that listed in the alternatives, so my apologies if that has been proposed previously. |
So I naively want the compiler not to optimize away things in a benchmark... but also some amount of the optimization happening would in fact be part of what the compiler would do running the code in reality, and thus, part of what I want to benchmark. The trick is distinguishing between optimizing-away the benchmark and optimizing-away some of the work inside the benchmark, which would also be optimized-away outside of the benchmark. |
Another alternative name for This function is not only useful for testing, but also for non-testing code. |
I believe that another problem with |
The existing ABI is such that if a function that returns a pointer to a new object is not inlined into the caller, the object to which it points must be heap-allocated. |
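The allocation effect can be observed with `testing.AllocsPerRun`. This is a small sketch with hypothetical names (`point`, `newPoint`, `newPointNoInline`); the inlinable variant may or may not avoid the allocation depending on the compiler version, so only the non-inlined case is stated with confidence.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPointNoInline cannot be inlined, so the returned object must live on the heap.
//
//go:noinline
func newPointNoInline(x, y int) *point { return &point{x, y} }

// newPoint is small enough to be an inlining candidate; once inlined,
// escape analysis may keep the object on the caller's stack.
func newPoint(x, y int) *point { return &point{x, y} }

// total keeps the loaded field observable.
var total int

func allocsNoInline() float64 {
	return testing.AllocsPerRun(1000, func() {
		p := newPointNoInline(1, 2)
		total += p.x
	})
}

func allocsInlinable() float64 {
	return testing.AllocsPerRun(1000, func() {
		p := newPoint(1, 2)
		total += p.x
	})
}

func main() {
	fmt.Println("not inlined:", allocsNoInline())
	fmt.Println("inlinable:  ", allocsInlinable())
}
```

Anything that blocks inlining of the function under test can therefore change what is measured, not just how it is called.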
I am still concerned about the overhead of reflect in Iterate. We can't subtract it, and that means we can't reliably measure very short functions - which are the ones most likely to be affected by throwing away parts of the computation. The compiler is going to be involved no matter what. What if it's more involved? Specifically, suppose we have a function that just does looping and takes func(), like
or maybe
and the compiler would recognize testing.B.Loop and apply Keep to every function and every argument in every call in that closure. We could still provide Keep separately and explain in the docs for Loop what the compiler is doing, in terms of Keep. This would end up being like b.Iterate but (a) you get to write the actual calls, and (b) there is no reflect. On the other hand, the compiler does more work. But this is the Go compiler and the Go testing package; the compiler already knows about sync/atomic and math and other packages. For that matter we could also recognize for i := 0; i < b.N; i++ { ... } and do the same to that loop body (it might still help to have something like Iterate or Loop though). |
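One way to read "apply Keep to every function and every argument in every call" is as a mechanical rewrite of each call in the loop body. The sketch below writes that rewrite out by hand. `Keep` here is a local stand-in marked `go:noinline`, which only approximates the compiler support being discussed, and `Fib`, `written`, and `expanded` are illustrative names.

```go
package main

import "fmt"

// Keep is a user-land stand-in, not the proposed API: it returns its
// argument through a call the compiler is told not to inline.
//
//go:noinline
func Keep[T any](v T) T { return v }

// Fib is an assumed naive recursive implementation.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// written is what the benchmark author types in the loop body.
func written() {
	Fib(10)
}

// expanded is roughly what auto-Keep would treat that call as:
// the function value, the argument, and the result are all kept.
func expanded() int {
	return Keep(Keep(Fib)(Keep(10)))
}

func main() {
	written()
	fmt.Println(expanded())
}
```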
What is the effect of calling |
It should be the same whether it is in a test file or not. |
So it is a general purpose function in syntax/semantics, but a testing specific function subjectively. Not a big problem though. |
@cespare @aclements Thanks for the explanations.
I agree it's the same as #61515, although the Loop in this proposal seems to be different, not taking a function, and returning a boolean. Regarding that change, can this proposal be updated to include that? It's difficult to track the current state of the proposal by piecing together all the comments.
Yes, assuming you mean the Loop from #61515, and not the Loop here, as explained just above.
Why do we need to expose Keep to explain what Loop is doing? I don't see why we can't explain what Loop does in the same way, e.g. "All function values, all arguments, and all function results are forced to be evaluated etc etc etc..." Why do users need this general power? What if we limit the disabled optimizations to just func literals that are assigned to a new package testing type
```go
func Benchmark1(b *testing.B) {
	b.Loop(func() { Fib(10) }) // Not optimized
}

func Benchmark2(b *testing.B) {
	var notOptimized testing.BenchFunc = func() { Fib(10) }
	var optimized func() = func() { Fib(10) }
	b.Loop(notOptimized)
	b.Loop(testing.BenchFunc(optimized))
}
```
work as expected. @cespare's example would be
```go
func BenchmarkFoo(b *testing.B) {
	for _, bb := range []struct {
		name string
		/* lots of testing parameters */
	}{
		{ /* test case 1 */ },
		// ...
	} {
		// lots of setup code
		b.Run(bb.name, func(b *testing.B) {
			benchFoo(b, bb.x, bb.y, some, other, params)
		})
	}
}

func benchFoo(b *testing.B, x, y, z int) {
	// ...
	b.Loop(func() {
		Foo(x, y, z)
	})
}
```
|
The #61515 proposal does include a pointer to the latest version in the top post. We tend not to do significant rewrites of the top post in a proposal because then it makes it hard to follow the conversation that follows it, and instead add updates to it linking to the comment explaining the latest version. There's no really ideal way to do this. It may be that the way I wrote the update to #61515 wasn't clear enough, so I've tried to rewrite it.
You're right that we can explain how Loop deoptimizes without exposing Keep. However, not exposing Keep limits refactoring opportunities, and also makes it impossible to write examples that allow partial optimization like in @bcmills' comment. Granted, we expect both of these situations to be rare.
This seems strictly more complicated to me. Earlier you argued that "A function approach seems more consistent with how Go does things", but I'm not sure I agree with that. Go APIs tend not to reach for closures when simpler and more direct constructs will do. For example, |
Oops, I guess his example doesn't technically show partial optimization since there's only one argument to the function under test. Partial optimization would mix one (or more) argument passed in the |
@aclements Adding an hr divider at the end, followed by an "Edit: Changed to [...], see these comments [...]" line(s) would be sufficient. This is comparable to the practice of reserving the first comment for FAQs and proposal updates that I've seen the Go team use elsewhere recently, which worked well.
Can you demonstrate an example using Keep? |
The auto-Keep during b.Loop could be applied the same in any loop that counts from 0 to b.N where b has type *testing.B. Then b.Loop is less special - the compiler just applies it to both kinds of benchmark loops. |
If we make Keep auto-apply inside b.N loops, then b.Loop is no longer special, and converting a b.N loop to a b.Loop loop is not a performance change at all, so it seems like we should make Keep auto-apply inside b.N loops. That will also fix a very large number of Go microbenchmarks, although it will look like it slowed them down. That leaves the question of whether to auto-Keep at all. If we don't, we leave a big mistake that users, even experienced ones, will make over and over. The compiler can do the right thing for us here. Maybe it would help if someone could try a compiler implementation and see how difficult this is. |
Suppose you have a function like
```go
func Benchmark(b *testing.B) {
	for b.Loop() {
		... complicated body ...
	}
}
```
In this form, we're proposing that we automatically apply Keep to "complicated body". However, if you were to refactor out parts of the complicated body into a new function:
```go
func Benchmark(b *testing.B) {
	for b.Loop() {
		... less complicated body, calls helper ...
	}
}

func helper() {
	... parts of complicated body ...
}
```
The code in
I guess you don't actually need Keep to express partial optimization. Here's an example of partial optimization, where one argument to the benchmarked function can be constant-propagated, but the other is treated as a variable even though it's a constant.
```go
func BenchmarkPowPartialConstant(b *testing.B) {
	xPow20 := func(a float64) float64 {
		return math.Pow(a, 20)
	}
	for b.Loop() {
		// Compute 5 ** 20. This allows the '20' to be constant-propagated,
		// but treats the '5' as a variable.
		xPow20(5)
	}
}
```
Another way to express this does use Keep:
```go
func BenchmarkPowPartialConstant(b *testing.B) {
	xPow := func() float64 {
		return math.Pow(testing.Keep(5), 20)
	}
	for b.Loop() {
		// Compute 5 ** 20. This is defined separately to avoid auto-Keep
		// so the '20' can be constant-propagated.
		xPow()
	}
}
```
In both of these, |
Here is an ergonomic thing that doesn't seem to have been considered in the above discussion. When the benchmarked function has multiple input parameters and return values (of different types):
```go
for i := 0; i < b.N; i++ {
	bb, _ = bbhash.New(gamma, partitions, keys)
}
```
It would be nice to be able to write this...
```go
for i := 0; i < b.N; i++ {
	testing.Keep(bbhash.New(gamma, partitions, keys))
}
```
However, since the return types may be different, the proposed A variadic variant like the one below would work for the return types:
```go
func Keep(v ...any) []any {
	return v
}
```
But it doesn't work for the input arguments if one wants to avoid |
@meling, that's an interesting point. Unfortunately, while I agree that makes Keep a little more ergonomic for that one case, it seems to make it less ergonomic for any case that needs to use the result of For example, to fully write out your example with the signature you proposed, you'd need to write:
```go
for i := 0; i < b.N; i++ {
	r := testing.Keep(bbhash.New(gamma, partitions, keys))
	bb = r[0].(something)
}
```
That type assertion will also add a little benchmarking overhead. With the original proposed signature, you would write this as:
```go
for i := 0; i < b.N; i++ {
	bb, _ = bbhash.New(gamma, partitions, keys)
	testing.Keep(bb)
}
```
That doesn't seem very bad to me. Also, the idea with auto-Keep is that in these cases you wouldn't have to write Keep at all. |
Yeah, I agree. The main reason I brought it up was that it didn't seem like it was discussed. Moreover, I had some initial idea that perhaps it would be valuable to have some form of "type matching" (for lack of a better term): func Keep[T any{0,3}](v T) T Where the However, I removed it since I didn't want to distract the first message with such a "controversial" proposal 😅... Don't know if this is worth writing up as a separate proposal. |
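For the two-result case specifically, a fixed-arity helper is expressible in today's Go, because a multi-valued call can be passed directly as the arguments of another call. The sketch below is only an illustration of that language feature: `Keep2` and `newTable` (a stand-in for a constructor like `bbhash.New`) are hypothetical names, not part of this proposal, and `go:noinline` merely approximates the intended semantics.

```go
package main

import (
	"errors"
	"fmt"
)

// Keep2 is a hypothetical two-value variant; both results pass through
// with their static types, so no type assertion is needed afterwards.
//
//go:noinline
func Keep2[A, B any](a A, b B) (A, B) { return a, b }

// newTable stands in for a constructor returning a value and an error.
func newTable(gamma float64, keys []uint64) (int, error) {
	if len(keys) == 0 {
		return 0, errors.New("no keys")
	}
	return int(gamma * float64(len(keys))), nil
}

func main() {
	// The two results of newTable become the two arguments of Keep2.
	size, err := Keep2(newTable(2.0, []uint64{1, 2, 3}))
	fmt.Println(size, err)
}
```

This does not generalize across arities, which is the gap the "type matching" idea above was trying to close.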
Placed on hold. |
To be clear, this is on hold pending an experimental implementation, per #61179 (comment) |
Just to loop back here, the proposal is to both:
Number 1 seems pretty easy. The question is, would we want to do 1 without also doing 2? Particularly, I think we could do 1 for 1.22 if we want. Getting 2 in for 1.22 looks more speculative. But then there is a problem: having 1 and not 2 will encourage people to add lots of explicit |
Isn't this the case for IMHO we can do (1) first, in 1.22. Whatever calls to |
It is for function results, true. It can't be used to solve the function arg problem. |
I'm concerned about losing momentum on this. Is there someone who is planning on doing the prototype work for auto-Keep? If it is going to be years before we get auto-Keep, I'd much rather we decouple these and land Keep now. Then we at least have a path forward to eliminate the hacky workarounds (KeepAlive, sinks) while we wait for the better thing which may or may not happen. |
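For reference, the two workarounds named above usually look something like the following. This is a sketch with an assumed recursive `Fib` and a hypothetical `Sink` variable. Both keep the result alive, but neither does anything about the constant argument.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// Fib is an assumed naive recursive implementation.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// Sink is an exported package-level variable used only to receive results.
var Sink int

// BenchmarkFibSink stores each result to a global so the call is not dead.
func BenchmarkFibSink(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Sink = Fib(10)
	}
}

// BenchmarkFibKeepAlive borrows runtime.KeepAlive, which was designed for
// finalizer timing rather than benchmarking, to mark the result as used.
func BenchmarkFibKeepAlive(b *testing.B) {
	for i := 0; i < b.N; i++ {
		runtime.KeepAlive(Fib(10))
	}
}

func main() {
	fmt.Println("sink:     ", testing.Benchmark(BenchmarkFibSink))
	fmt.Println("keepalive:", testing.Benchmark(BenchmarkFibKeepAlive))
}
```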
Yes, my argument isn't that |
In rare cases, this affects test output also, where optimizations in the test code cause subtle changes in floating-point output. This is a reduced test case from a much larger program:
On ARM64, this outputs two different numbers (depending on whether
|
|
Benchmarks frequently need to prevent certain compiler optimizations that may optimize away parts of the code the programmer intends to benchmark. Usually, this comes up in two situations where the benchmark use of an API is slightly artificial compared to a “real” use of the API. The following example comes from @davecheney's 2013 blog post, How to write benchmarks in Go, and demonstrates both issues:
Most commonly, the result of the function under test is not used because we only care about its timing. In the example, since Fib is a pure function, the compiler could optimize away the call completely. Indeed, in “real” code, the compiler would often be expected to do exactly this. But in benchmark code, we’re interested only in the side-effect of the function’s timing, which this optimization would destroy.
An argument to the function under test may be unintentionally constant-folded into the function. In the example, even if we addressed the first issue, the compiler may compute Fib(10) entirely at compile time, again destroying the benchmark. This is more subtle because sometimes the intent is to benchmark a function with a particular constant-valued argument, and sometimes the constant argument is simply a placeholder.
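The benchmark being described is not reproduced in this copy of the issue. A minimal sketch of the pattern, assuming a naive recursive `Fib` (the function body and the `main` driver are illustrative assumptions), would be:

```go
package main

import (
	"fmt"
	"testing"
)

// Fib is an assumed naive recursive implementation, used only for illustration.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// BenchmarkFib10 exhibits both hazards: the result of Fib is discarded,
// and the argument 10 is a compile-time constant.
func BenchmarkFib10(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Fib(10)
	}
}

func main() {
	// testing.Benchmark lets the sketch run outside "go test".
	fmt.Println(testing.Benchmark(BenchmarkFib10))
}
```

Whether a given compiler version actually removes or folds the call depends on its inlining and purity analysis, so a plausible-looking timing from this code does not show the benchmark is sound.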
There are ways around both of these, but they are difficult to use and tend to introduce overhead into the benchmark loop. For example, a common workaround is to add the result of the call to an accumulator. However, there’s not always a convenient accumulator type, this introduces some overhead into the loop, and the benchmark must then somehow ensure the accumulator itself doesn’t get optimized away.
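The accumulator workaround typically takes the following shape. This is a sketch with an assumed `Fib` and a hypothetical package-level `sink`; it shows the two costs mentioned above, an extra addition per iteration and a final store whose only purpose is to keep the accumulator alive.

```go
package main

import (
	"fmt"
	"testing"
)

// Fib is an assumed naive recursive implementation.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

// sink receives the accumulator so that it is not itself dead code.
var sink int

func BenchmarkFibAccumulate(b *testing.B) {
	total := 0
	for n := 0; n < b.N; n++ {
		total += Fib(10) // extra add inside the timed loop
	}
	sink = total // publish the accumulator after the loop
}

func main() {
	fmt.Println(testing.Benchmark(BenchmarkFibAccumulate))
}
```

Note that the argument is still the constant 10, so this addresses only the discarded-result problem.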
In both cases, these optimizations can be partial, where part of the function under test is optimized away and part isn’t, as demonstrated in @eliben’s example. This is particularly subtle because it leads to timings that are incorrect but also not obviously wrong.
Proposal
I propose we add the following function to the testing package:
(This proposal is an expanded and tweaked version of @randall77’s comment.)
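The declaration itself is missing from this copy of the issue. Judging from the variants discussed in the comments, the shape is a generic identity function; the sketch below assumes that signature and approximates the behavior in user code with `go:noinline`. It is not the proposed implementation, which would rely on compiler support rather than a real call.

```go
package main

import "fmt"

// Keep is a user-land approximation with an assumed signature. It returns
// its argument unchanged; go:noinline hides the value from the optimizer
// at the cost of a function call the real proposal aims to avoid.
//
//go:noinline
func Keep[T any](v T) T { return v }

// Fib is an assumed naive recursive implementation.
func Fib(n int) int {
	if n < 2 {
		return n
	}
	return Fib(n-1) + Fib(n-2)
}

func main() {
	// Inner Keep: treat 10 as a run-time value.
	// Outer Keep: force the result to be computed.
	fmt.Println(Keep(Fib(Keep(10))))
}
```

Dropping the inner `Keep` would leave the argument eligible for constant folding while still forcing the result, which is the opt-out choice the proposal describes.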
The Keep function can be used on the result of a function under test, on arguments, or even on the function itself. Using Keep, the corrected version of the example would be:
(Or testing.Keep(Fib)(10), but this is subtle enough that I don’t think we should recommend this usage.)
Unlike various other solutions, Keep also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.
Alternatives
Keep may not be the best name. This is essentially equivalent to Rust’s black_box, and we could call it testing.BlackBox. Other options include Opaque, NoOpt, Used, and Sink.
testing: document best practices for avoiding compiler optimizations in benchmarks #27400 asks for documentation of best practices for avoiding unwanted optimization. While we could document workarounds, the basic problem is Go doesn’t currently have a good way to write benchmarks that run afoul of compiler optimizations.
proposal: testing: a less error-prone API for benchmark iteration #48768 proposes testing.Iterate, which forces evaluation of all arguments and results of a function, in addition to abstracting away the b.N loop, which is another common benchmarking mistake. However, its heavy use of reflection would be difficult to make zero or even low overhead, and it lacks static type-safety. It also seems likely that users would often just pass a func() with the body of the benchmark, negating its benefits for argument and result evaluation.
runtime.KeepAlive can be used to force evaluation of the result of a function under test. However, this isn’t the intended use and it’s not clear how this might interact with future optimizations to KeepAlive. It also can’t be used for arguments because it doesn’t return anything. @cespare has some arguments against KeepAlive in this comment.