RFC: hint::bench_black_box #2360
text/0000-bench-utils.md (outdated):

```rust
pub fn clobber() -> ();
```

> flushes all pending writes to memory. Memory managed by block scope objects must
This wording (flushing pending writes) makes me uncomfortable because it's reminiscent of memory consistency models, including hardware ones, when this is just a single-threaded, compiler-level restriction. Actually, come to think of it, I'd like to know the difference between this and `compiler_fence(SeqCst)`. I can't think of any off-hand.
> when this is just a single-threaded and compiler-level restriction

This is correct.

> `compiler_fence(SeqCst)`. I can't think of any off-hand.

I can't either, but for some reason they do generate different code: https://godbolt.org/g/G2UoZC
I'll give this some more thought.
The difference between `asm!` with a `memory` clobber and `compiler_fence` is that a `memory` clobber requires the compiler to actually reload memory if it wants to use it again (as memory is… clobbered, i.e. considered changed), whereas `compiler_fence` only enforces that memory accesses are not reordered; the compiler may still use the usual rules to conclude that it needn't reload anything.
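The distinction can be sketched in today's Rust. As an assumption, `clobber_sketch` below stands in for the `memory`-clobbering `asm!` being discussed (a stable `asm!` block is by default assumed to read and write all escaped memory); it is not anything defined by this RFC.

```rust
use std::arch::asm;
use std::hint::black_box;
use std::sync::atomic::{compiler_fence, Ordering};

// Assumed sketch of a "clobber": an empty asm block the compiler must
// treat as possibly reading and writing all escaped memory, so pending
// stores have to be materialized before it.
fn clobber_sketch() {
    unsafe { asm!("", options(nostack)) };
}

fn demo() -> u32 {
    let mut x = 0u32;
    // Escape the address of `x` so the asm block below could plausibly
    // observe it.
    black_box(&mut x as *mut u32);

    x = 1;
    // Only an ordering constraint: the dead store `x = 1` may still be
    // eliminated, since the fence does not count as a read of `x`.
    compiler_fence(Ordering::SeqCst);

    x = 2;
    // The asm block counts as a potential read of escaped memory, so
    // this store must actually be performed before it.
    clobber_sketch();
    x
}

fn main() {
    println!("{}", demo());
}
```

Both variants compute the same value; the difference is only visible in the generated code.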
@nagisa the only thing that `clobber` should do is flush pending writes to memory. It doesn't need to require that the compiler reloads memory on reuse. Maybe the volatile `asm!` with a `memory` clobber is not the best way to implement that.
@nagisa Thank you. I was misled by the fact that fences prohibit some load-store optimizations into thinking they'd also impact things like store-to-load forwarding on the same address with no intervening writes.

@gnzlbg `asm!` with `memory` is simply assumed to read from and write to all memory, and all its effects follow from that. However, to be precise, this does not mean any reloads are introduced after a clobbering inline asm; it just means that all the loads that are already there (of which there are a lot, given that every local and many temporaries are stack slots) can't be replaced with values loaded from the same address before the clobber.

If you want something less strong, you need to be precise. "Flushing pending writes" is not really something that makes intuitive sense at the compiler level (and it surely has no effect at runtime, for example on caches or store buffers?).
@rkruppe `mem::clobber()` should be assumed to read from all memory with effects, so that any pending memory stores must have completed before the `clobber`. All the loads that are already there in registers, temporary stack slots, etc., should not be invalidated by the `clobber`.
Okay, that is a coherent concept. I'm not sure off-hand how best to implement that in LLVM. (`compiler_fence` is probably not enough, since it permits dead store elimination if the memory location is known not to escape.)
However, come to think of it, what's the difference between `black_box(x)` (which is currently stated to just force a write of `x` to memory) and `let tmp = x; clobber();` (which writes `x` to the stack slot of `tmp` and then forces that store to be considered live)?
That's a good question, and this relates to what I meant with "block scope". In

```rust
{
    let tmp = x;
    clobber();
}
```

this `{}` block is the only part of the program that knows the address of `tmp`, so no other code can actually read it without invoking undefined behavior. So in this case, `clobber` does not make the store of `tmp` live, because nothing outside of this scope is able to read from it, and this scope doesn't read from it but executes `clobber` instead (clobbering "memory" does not clobber temporaries AFAICT).

However, if one shares the address of `tmp` with the rest of the program:

```rust
{
    let tmp = x;
    black_box(&tmp);
    clobber();
}
```

then `clobber` will force the store of `tmp` to be considered live.
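Both snippets can be turned into compilable sketches. As assumptions: `clobber` is modeled here with an empty `asm!` block (which by default is treated as reading and writing all escaped memory), and `black_box` is the since-stabilized `std::hint::black_box`; neither is the RFC's actual implementation.

```rust
use std::arch::asm;
use std::hint::black_box;

// Assumed model of `clobber()` from the RFC text.
fn clobber() {
    unsafe { asm!("", options(nostack)) };
}

fn store_may_be_dead(x: u32) -> u32 {
    // The address of `tmp` never escapes this scope, so `clobber()`
    // cannot observe it and the stack store may be optimized away.
    let tmp = x;
    clobber();
    tmp
}

fn store_is_live(x: u32) -> u32 {
    let tmp = x;
    // Escaping the address first means the clobber must assume the
    // store to `tmp` is observable, keeping it live.
    black_box(&tmp);
    clobber();
    tmp
}

fn main() {
    println!("{} {}", store_may_be_dead(1), store_is_live(2));
}
```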
I was already concerned about the interaction with the memory model, and this comment basically confirmed my worst suspicions: there are extremely subtle interactions between the memory model and what these functions can do. I'm starting to believe that we'll be unable to give any guarantees that are of any use to benchmark authors. But if this is the case, these functions become a matter of chasing the optimizer, and benchmarking reliably requires checking the asm to see that your work wasn't optimized out. That's clearly very unappealing, but it has always been true of micro benchmarks, so maybe that's just unavoidable. This also raises the question of why this needs to be in std, if it's not compiler magic (like …).
I'm temporarily closing this till we resolve the semantics of these functions in the memory model repo.
The issue in the memory model repo is: https://github.com/nikomatsakis/rust-memory-model/issues/45
Update. The summary of the discussion on the memory model repo is that …
Any updates on opening the new RFC?
I've updated the RFC with the discussion of the memory model repo. It now just proposes ….

Other changes: …

Open questions: …
Unsure if we should be opening a new RFC or reopening this one, cc @rust-lang/libs
Seems fine to me! I was worried you'd try to talk about which memory …
As a user, let me contribute my 2 cents. In C++, there is the same need for some kind of compiler barrier that prevents the compiler from hoisting benchmark code out of loops. There is a really great talk by Chandler Carruth about how to do this using inline assembly, where he goes into detail about how gtest implements this and what it means from the compiler's point of view: https://www.youtube.com/watch?v=nXaxk27zwlk
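For illustration, the pattern that talk describes translates to today's Rust roughly as follows. This uses `std::hint::black_box` (the function this RFC eventually produced) in place of gtest's `DoNotOptimize`; the benchmark body is a made-up example.

```rust
use std::hint::black_box;
use std::time::Instant;

// Toy workload the optimizer could otherwise const-fold entirely.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|i| i * i).sum()
}

fn main() {
    let start = Instant::now();
    for _ in 0..1_000 {
        // black_box on the input defeats constant folding; black_box
        // on the output keeps the call from being hoisted out of the
        // loop or removed as dead code.
        black_box(sum_of_squares(black_box(1_000)));
    }
    println!("elapsed: {:?}", start.elapsed());
}
```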
Please don't block this PR based on arguments about the semantics of this. The need for a barrier like this is obvious in any language that allows the optimizer to do things like hoisting expressions out of loops. It is an important practical need to be able to write benchmarks in Rust, and this is not a "novel" solution to the problem: it is the same one that has been actively deployed in Google's gtest C++ library for many years, which is used by millions of users with gcc and clang. Many experts have thought extensively about the best way to create this kind of barrier, with special attention to llvm. Real users of Rust have the same need for this benchmarking tool, without using a nightly compiler. Stabilizing ….

It's really sad that useful stuff like this is blocked for months by what appear to be pedantic concerns about having a completely formal definition of the semantics. It should be adequate to say "…".

Honestly, it's probably adequate to just track whatever gtest is doing -- if similar small microbenchmarks in C++ and Rust, one using …
@cbeck88 That's why I attempted to bypass the whole thing and have ….

Whereas we've seen people (want to) misuse ….

Pretty much the only way we can specify something like this is to make it a correct implementation to define those functions as no-ops that don't prevent optimization.
Here's a transcript from, imo, the most relevant part of that talk:

"First of all, I'm about to pause for a minute. I'm about to tell you all how to defeat the optimizer. As a compiler and optimizer writer, please use these powers for good. Like... don't defeat the optimizer because you have undefined behavior in your source code and you're like 'No no no, optimizer, you're not going to miscompile my code today, nah ah!'. That's not gonna actually work for you. It goes very badly. Okay, but how do we actually defeat it when we need to, for a good reason: we need to for some performance analysis? We need two functions. Alright, these are kind of magical functions. I would never expect anyone to know how to write these functions. You might reasonably argue that they shouldn't... like, I shouldn't be writing them and they should be in my library, but I wanted to show you guys how these things actually work. Okay, we're already in trouble, I'm using ….

I'm going to next type the two scariest tokens for my compiler-implementer world. Okay, literally, the two scariest tokens. Okay. [types …]"
We have a definition that is adequate for benchmarking purposes and completely formal already. (Well, as much as ….)

It seems to me that the only things this RFC is actually blocked on right now are a bikeshed and considering whether a change towards … is warranted.

From my perspective it would be fine to offer all three of ….

Sometimes the functions will also be wrapped in a benchmarking framework, which means that those users won't need to care about which ones to use. I think which ones to use will be largely driven by tutorials and convention, as they are all rather opaque (at least from my perspective). If it turns out we need to deprecate some of them in the future, that's also fine, because again, these functions are trivial to specify.
I'm not sure I follow this part.
Sure. That's unfortunate... but isn't good documentation the solution here?
This is not what this is blocked on. Please re-read the discussion. Nobody, not even from the formal camp, is opposed to having something that says "this encourages the compiler to not optimize some things, but code cannot rely on this for functional correctness". But some people don't like calling a thing that does this ….

@Centril thanks a lot for that transcript! I am relieved and happy that I can now even quote Chandler any time someone asks about using …. For those wanting to watch the video, the timecode-link is https://youtu.be/nXaxk27zwlk?t=2445.

Seems like the two functions Chandler defines are …
I've renamed the intrinsic to ….

I don't think that resolving this should block the RFC from making progress, and we probably should just implement …. This should resolve @cramertj's concern about naming; let me know if this isn't the case.

I've also tried to document @eddyb's proposal of adding …. The pros of that proposal are that they are easier to specify, with stronger guarantees than ….
@gnzlbg Would you be happier with …?
Also, keep in mind none of the LLVM limitations would prevent ….
I'm happy with ….

The main use case for these is benchmarks, and when users try to put what they learn into practice, there is no self-correcting mechanism that will let them know they are using these incorrectly unless they run their benchmarks under Miri. Maybe there is a way to encourage that somehow, but that is going to require a bit of experimentation, because currently Miri is quite slow. I suppose libtest could help here by reducing the number of iterations to 1 when run under Miri by default and omitting all the timing stuff, or something like that.
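That idea could look roughly like this; `iteration_count` and the `cfg!(miri)` check are hypothetical, sketching the suggested libtest behavior rather than anything libtest actually does:

```rust
fn iteration_count() -> u32 {
    // Hypothetical: run a single iteration under Miri (correctness
    // checking only, no timing), the full count otherwise.
    if cfg!(miri) { 1 } else { 10_000 }
}

fn main() {
    let iters = iteration_count();
    let mut acc = 0u64;
    for i in 0..iters as u64 {
        acc += i;
    }
    println!("ran {iters} iterations, acc = {acc}");
}
```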
I mean, function name choice (and whether you have one or two functions) kind of is part of "good documentation". Self-documenting functions are better than something people may misuse if they don't read the docs.
@rfcbot resolve actually-a-black-box

The new name …
Ping @SimonSapin @Zoxc @michaelwoerister @varkor for your checkboxes here.
Ping @SimonSapin @Zoxc @michaelwoerister for your checkboxes #2360 (comment)
🔔 This is now entering its final comment period, as per the review above. 🔔
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon.
🎉 Huzzah! This RFC has been merged! 🎉 Tracking issue: rust-lang/rust#64102
Adds `bench_black_box` to `core::hint`.

Rendered. Tracking issue.