Make mem::replace simpler in codegen #111010
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt
_4 = &raw mut (*_1); // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
StorageLive(_6); // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
(*_4) = _2; // scope 8 at $SRC_DIR/core/src/ptr/mod.rs:LL:COL
StorageDead(_6); // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
These un-useful StorageLive(_6) + StorageDead(_6) will go away with #110702
The Miri subtree was changed cc @rust-lang/miri
@bors try @rust-timer queue
⌛ Trying commit 328e1db036ffa25505d1c758566535b7daee3b29 with merge 585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d...
☀️ Try build successful - checks-actions
Thanks, @est31 ! I think the difference that makes sense here is related to this note from that PR:
If this PR needed to update the borrow checker particularly, but even just codegen or CTFE or Miri code, then it'd absolutely not be worth doing. But using a new intrinsic that lowers to existing MIR functionality (instead of using |
pub fn replace_byte(dst: &mut u8, src: u8) -> u8 {
    std::mem::replace(dst, src)
}
The "direct memcpy" name for this test no longer made sense, as it doesn't call memcpy any more.
What it was testing is subsumed by the new tests/codegen/mem-replace-simple-type.rs test below.
328e1db to a0a0d69
Finished benchmarking commit (585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage) results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
a0a0d69 to 3456f77
Changes look good to me.
Perf-wise, it looks like codegen_crate is now slower (I assume due to more lowering happening?) while LLVM is faster (I assume due to better LLVM IR being passed to it). This comes along with a lot of noise, probably because of changes in how the compiler itself is compiled. The ripgrep regression is weird, but overall I think this looks fine.
r=me, with or without the nit.
// to `dst` while `src` is owned by this function.
unsafe {
    copy_nonoverlapping::<T>(&value, ptr, 1);
    forget(value);
Not important, but could you use ManuallyDrop instead?
I probably could, but the preexisting implementation of write went out of its way to use intrinsics::forget
rust/library/core/src/ptr/mod.rs
Line 1369 in 4b87ed9
    intrinsics::forget(src);
even though mem::forget doesn't
rust/library/core/src/mem/mod.rs
Lines 148 to 150 in 4b87ed9
    pub const fn forget<T>(t: T) {
        let _ = ManuallyDrop::new(t);
    }
which is probably just because it has much better codegen, but in case it actually matters I'd rather just leave it like this since it sounds like you don't feel particularly strongly about it.
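To make the two options in this thread concrete, here is a minimal sketch of a write-like helper done both ways: once with forget after the copy, and once by wrapping the value in ManuallyDrop first. The names write_with_forget and write_with_manually_drop are hypothetical and exist only for illustration; this is not the library's implementation of ptr::write, and it uses mem::forget rather than intrinsics::forget.

```rust
use std::mem::{self, ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical helper: copy `value` into `*dst`, then `forget` the local
/// so its destructor does not also run in this function.
///
/// # Safety
/// `dst` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_with_forget<T>(dst: *mut T, value: T) {
    // SAFETY: `&value` is a valid source for one `T`; `dst` is valid per the
    // contract above and cannot alias this function's own local.
    unsafe { ptr::copy_nonoverlapping(&value, dst, 1) };
    mem::forget(value);
}

/// Hypothetical alternative: suppress the destructor up front with
/// `ManuallyDrop`, then copy the bytes out.
///
/// # Safety
/// Same contract as `write_with_forget`.
pub unsafe fn write_with_manually_drop<T>(dst: *mut T, value: T) {
    let value = ManuallyDrop::new(value);
    // SAFETY: as above; `&*value` derefs the wrapper to a plain `&T`.
    unsafe { ptr::copy_nonoverlapping(&*value as *const T, dst, 1) };
}

fn main() {
    let mut a = MaybeUninit::<String>::uninit();
    let mut b = MaybeUninit::<String>::uninit();
    // SAFETY: both slots are valid and aligned, and each is initialized by
    // its write before `assume_init` is called.
    let (a, b) = unsafe {
        write_with_forget(a.as_mut_ptr(), String::from("forget"));
        write_with_manually_drop(b.as_mut_ptr(), String::from("manually_drop"));
        (a.assume_init(), b.assume_init())
    };
    assert_eq!(a, "forget");
    assert_eq!(b, "manually_drop");
}
```

Both versions leave exactly one live owner of the value (the destination), so the String is dropped once. Any difference between them would be in the generated code, which this sketch does not measure.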
Okay 👍🏻
@bors r+ rollup=never
☀️ Test successful - checks-actions
Finished benchmarking commit (6db1e5e): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage) results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size results: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 656.574s -> 656.548s (-0.00%)
--> $DIR/null_pointer_write_zst.rs:LL:CC
|
LL | unsafe { std::ptr::null_mut::<[u8; 0]>().write(zst_val) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: null pointer is a dangling pointer (it has no provenance)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: null pointer is a dangling pointer (it has no provenance)
Almost seems like for nice error messages in Miri it'd be better not to lower this to an assignment. Currently it looks like *ptr = ..., so Miri sees the deref in *ptr and emits the error accordingly.
Hmm, I guess I didn't think of it being a copy instead of a deref as meaningful here.
One other thing I tried was implementing write(p, x) as *p.cast() = ManuallyDrop::new(x) which I think is also a perfectly reasonable no-intrinsic implementation -- more obviously a typed write, which I think this is, given passing the parameter to the function is typed -- but would have the same "dereferencing pointer" error in Miri.
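For readers who want to see that no-intrinsic alternative spelled out, here is a minimal sketch. The function name write_typed is hypothetical, and this only illustrates the expression quoted in the comment above; it is not what the PR landed.

```rust
use std::mem::{ManuallyDrop, MaybeUninit};

/// Hypothetical helper: a typed write expressed as a plain assignment
/// through a pointer cast to `ManuallyDrop<T>`.
///
/// # Safety
/// `p` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_typed<T>(p: *mut T, x: T) {
    // SAFETY: `ManuallyDrop<T>` is `repr(transparent)` over `T`, so the cast
    // preserves layout. `ManuallyDrop` has no drop glue, so the assignment
    // does not run a destructor on whatever bytes were in `*p` before.
    unsafe { *p.cast::<ManuallyDrop<T>>() = ManuallyDrop::new(x) };
}

fn main() {
    let mut slot = MaybeUninit::<Vec<i32>>::uninit();
    // SAFETY: `slot` is valid and aligned, and is initialized by the write
    // before `assume_init` is called.
    let v = unsafe {
        write_typed(slot.as_mut_ptr(), vec![1, 2, 3]);
        slot.assume_init()
    };
    assert_eq!(v, [1, 2, 3]);
}
```

Because the write is an explicit `*ptr = ...`, a tool that reports on the dereference would describe a bad pointer here as a failed dereference, which is the point being discussed.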
Fair, I guess really this is about write implementation details where the user can't tell whether a deref happens before the write or not. I guess 'dereferencing' is not so bad here, I was just primed because of rust-lang/miri#2859.
The few perf improvements match or outweigh the few perf regressions. @rustbot label: +perf-regression-triaged
Since they'd mentioned more intrinsics for simplifying stuff recently,
r? @WaffleLapkin
This is a continuation of me looking at foundational stuff that ends up with more instructions than it really needs. Specifically I noticed this one because Range::next isn't MIR-inlining, and one of the largest parts of it is a replace::<usize> that's a good dozen instructions instead of the two it could be.
So this means that ptr::write with a Copy type no longer generates worse IR than manually dereferencing (well, at least in LLVM -- MIR still has bonus pointer casts), and in doing so means that we're finally down to just the two essential memcpys when emitting mem::replace for a large type, rather than the bonus-alloca and three memcpys we emitted before this (or the 6 we currently emit in 1.69 stable). That said, LLVM does usually manage to optimize the extra code away. But it's still nice for it not to have to do as much, thanks to (for example) not going through an alloca when replacing a primitive like a usize.
(This is a new intrinsic, but one that's immediately lowered to existing MIR constructs, so not anything that MIRI or the codegen backends or MIR semantics needs to do work to handle.)
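As a rough illustration of the "two essential memcpys" shape described above, here is a minimal sketch that writes a replace as one read out of the destination followed by one write into it. The name replace_sketch is hypothetical; this is an illustration under that assumption, not the library source and not the new intrinsic.

```rust
use std::ptr;

/// Hypothetical stand-in for `mem::replace`: one read of the old value,
/// one write of the new value.
pub fn replace_sketch<T>(dest: &mut T, src: T) -> T {
    // SAFETY: `dest` is a valid, aligned, initialized `&mut T`. The old value
    // is read out and the slot is overwritten immediately, so the slot is
    // never observed in a moved-from state and nothing is dropped twice.
    unsafe {
        let old = ptr::read(dest);
        ptr::write(dest, src);
        old
    }
}

fn main() {
    // A primitive, like the `replace::<usize>` mentioned above.
    let mut n: usize = 1;
    let old = replace_sketch(&mut n, 2);
    assert_eq!((old, n), (1, 2));

    // A type with a destructor, to show ownership moves cleanly.
    let mut s = String::from("a");
    let old = replace_sketch(&mut s, String::from("b"));
    assert_eq!((old.as_str(), s.as_str()), ("a", "b"));
}
```

For a usize, the read and the write each correspond to a single load or store, which is the two-instruction target the description mentions; what the compiler actually emits depends on the toolchain version and optimization level.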