Make mem::replace simpler in codegen #111010

Merged
merged 3 commits into from
May 1, 2023

Conversation

scottmcm
Member

Since they'd mentioned more intrinsics for simplifying stuff recently,
r? @WaffleLapkin

This is a continuation of me looking at foundational stuff that ends up with more instructions than it really needs. Specifically I noticed this one because Range::next isn't MIR-inlining, and one of the largest parts of it is a replace::<usize> that's a good dozen instructions instead of the two it could be.
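
To make the Range::next connection concrete, here is a minimal sketch of a range-like iterator whose next is built on mem::replace. The `HalfOpen` type is hypothetical and only approximates the shape of the standard library code; it is meant to show why a cheap replace::<usize> matters on such a hot path.

```rust
use std::mem;

/// Hypothetical stand-in for a half-open range of `usize` (not the real `Range`).
struct HalfOpen {
    start: usize,
    end: usize,
}

impl Iterator for HalfOpen {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.start < self.end {
            // Cannot overflow: `start < end <= usize::MAX`.
            let next = self.start + 1;
            // Store the successor and hand back the previous value in one step.
            Some(mem::replace(&mut self.start, next))
        } else {
            None
        }
    }
}
```

Ideally the replace call here boils down to one load and one store of a usize.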

So this means that ptr::write with a Copy type no longer generates worse IR than manually dereferencing (well, at least in LLVM -- MIR still has bonus pointer casts), and in doing so means that we're finally down to just the two essential memcpys when emitting mem::replace for a large type, rather than the bonus alloca and three memcpys we emitted before this (or the 6 we currently emit in 1.69 stable). That said, LLVM does usually manage to optimize the extra code away. But it's still nice for it not to have to do as much, thanks to (for example) not going through an alloca when replacing a primitive like a usize.

(This is a new intrinsic, but one that's immediately lowered to existing MIR constructs, so not anything that Miri or the codegen backends or MIR semantics need to do work to handle.)
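
As an illustration of the "two essential memcpys" mentioned above, replace can be thought of as one read out of the destination followed by one write into it. The sketch below uses a hypothetical name, `replace_sketch`, and shows only the semantics; it is not claimed to be the exact library source or the new intrinsic.

```rust
use std::ptr;

/// Illustrative only: `mem::replace` expressed as a read followed by a write.
pub fn replace_sketch<T>(dest: &mut T, src: T) -> T {
    // SAFETY: `dest` is a valid, aligned, initialized `&mut T`. The value read
    // out is overwritten right away without being dropped, so it is moved to
    // the caller rather than duplicated.
    unsafe {
        let old = ptr::read(dest); // copy 1: out of `dest`
        ptr::write(dest, src); // copy 2: `src` into `dest`
        old
    }
}
```

For a usize those two copies are a single load and a single store, which is the two-instruction form the description is aiming for.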

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 30, 2023
@rustbot
Collaborator

rustbot commented Apr 30, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

_4 = &raw mut (*_1); // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
StorageLive(_6); // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
(*_4) = _2; // scope 8 at $SRC_DIR/core/src/ptr/mod.rs:LL:COL
StorageDead(_6); // scope 3 at $SRC_DIR/core/src/mem/mod.rs:LL:COL
Member Author

These un-useful StorageLive(_6)+StorageDead(_6) will go away with #110702

@rustbot
Collaborator

rustbot commented Apr 30, 2023

The Miri subtree was changed

cc @rust-lang/miri

@scottmcm
Member Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 30, 2023
@bors
Contributor

bors commented Apr 30, 2023

⌛ Trying commit 328e1db036ffa25505d1c758566535b7daee3b29 with merge 585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d...

@bors
Contributor

bors commented Apr 30, 2023

☀️ Try build successful - checks-actions
Build commit: 585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d (585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d)

@est31
Member

est31 commented Apr 30, 2023

This could maybe also be used inside the vec macro, instead of #[rustc_box]. See the discussion in #110715.

CC also #80290, which I think this PR is a reversal of. To be clear, I'm in favour of this PR.

@scottmcm
Member Author

Thanks, @est31 !

I think the difference that makes sense here is related to this note from that PR:

> This means we can also remove move_val_init implementations in codegen and Miri, and its special handling in the borrow checker.

If this PR needed to update the borrow checker particularly, but even just codegen or CTFE or Miri code, then it'd absolutely not be worth doing.

But using a new intrinsic that lowers to existing MIR functionality (instead of using intrinsics::forget!) seems reasonable to me.


pub fn replace_byte(dst: &mut u8, src: u8) -> u8 {
    std::mem::replace(dst, src)
}
Member Author

The "direct memcpy" name for this test no longer made sense, as it doesn't call memcpy any more.

What it was testing is subsumed by the new tests/codegen/mem-replace-simple-type.rs test below.

@scottmcm scottmcm force-pushed the mem-replace-simpler branch from 328e1db to a0a0d69 Compare April 30, 2023 19:06
@rust-timer
Collaborator

Finished benchmarking commit (585ceb46c7c35b1ad9ade2ba4c3ebea1512ad64d): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.6% | [0.6%, 0.6%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.3% | [-1.5%, -1.0%] | 3 |
| Improvements ✅ (secondary) | -0.6% | [-0.6%, -0.6%] | 1 |
| All ❌✅ (primary) | -0.8% | [-1.5%, 0.6%] | 4 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 6.0% | [3.1%, 10.3%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.4% | [-3.8%, -0.1%] | 3 |
| Improvements ✅ (secondary) | -2.2% | [-2.2%, -2.2%] | 1 |
| All ❌✅ (primary) | 2.4% | [-3.8%, 10.3%] | 7 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.6% | [-1.8%, -1.3%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.6% | [-1.8%, -1.3%] | 3 |

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 30, 2023
@scottmcm scottmcm force-pushed the mem-replace-simpler branch from a0a0d69 to 3456f77 Compare May 1, 2023 05:33
Member

@WaffleLapkin WaffleLapkin left a comment

Changes look good to me.

Perf-wise, it looks like codegen_crate is now slower (I assume due to more lowering happening?) while LLVM is faster (I assume due to better LLVM IR being passed to it). This comes along with a lot of noise, probably because of changes in how the compiler itself is compiled. The ripgrep regression is weird, but overall I think this looks fine.

r=me, with or without the nit.

// to `dst` while `src` is owned by this function.
unsafe {
    copy_nonoverlapping::<T>(&value, ptr, 1);
    forget(value);
Member

Not important, but could you use ManuallyDrop instead?

Member Author

I probably could, but the preexisting implementation of write went out of its way to use intrinsics::forget

intrinsics::forget(src);

even though mem::forget doesn't

pub const fn forget<T>(t: T) {
    let _ = ManuallyDrop::new(t);
}

which is probably just because it has much better codegen, but in case it actually matters I'd rather just leave it like this since it sounds like you don't feel particularly strongly about it.
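
For comparison, a ManuallyDrop-based variant along the lines of the reviewer's suggestion might look like the sketch below. The function name `write_with_manually_drop` is hypothetical, and this is not the code that was merged; it only shows how wrapping the source up front removes the need for a trailing forget call.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// Hypothetical `ptr::write` variant that uses `ManuallyDrop` instead of `forget`.
///
/// # Safety
/// `dst` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_with_manually_drop<T>(dst: *mut T, src: T) {
    // Wrapping first means `src` is never dropped by this function,
    // so ownership is handed over to `*dst` by the copy below.
    let src = ManuallyDrop::new(src);
    // SAFETY: the caller guarantees `dst` is valid and aligned; `src` is a
    // local, so the two regions cannot overlap.
    unsafe { ptr::copy_nonoverlapping(&*src as *const T, dst, 1) };
}
```

Whether this produces IR as good as the intrinsics::forget form is exactly the open question in this thread, so treat it as a semantic equivalent rather than a codegen recommendation.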

Member

Okay 👍🏻

@WaffleLapkin
Member

@bors r+ rollup=never

@bors
Contributor

bors commented May 1, 2023

📌 Commit 3456f77 has been approved by WaffleLapkin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 1, 2023
@bors
Contributor

bors commented May 1, 2023

⌛ Testing commit 3456f77 with merge 6db1e5e...

@bors
Contributor

bors commented May 1, 2023

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing 6db1e5e to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 1, 2023
@bors bors merged commit 6db1e5e into rust-lang:master May 1, 2023
@rustbot rustbot added this to the 1.71.0 milestone May 1, 2023
@scottmcm scottmcm deleted the mem-replace-simpler branch May 1, 2023 17:35
@rust-timer
Collaborator

Finished benchmarking commit (6db1e5e): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.4% | [0.2%, 0.6%] | 3 |
| Regressions ❌ (secondary) | 0.3% | [0.2%, 0.5%] | 2 |
| Improvements ✅ (primary) | -1.3% | [-1.8%, -1.0%] | 3 |
| Improvements ✅ (secondary) | -0.5% | [-0.7%, -0.4%] | 3 |
| All ❌✅ (primary) | -0.5% | [-1.8%, 0.6%] | 6 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 4.6% | [3.0%, 8.5%] | 4 |
| Regressions ❌ (secondary) | 2.6% | [2.2%, 3.0%] | 2 |
| Improvements ✅ (primary) | -2.9% | [-5.0%, -0.1%] | 3 |
| Improvements ✅ (secondary) | -1.0% | [-1.0%, -0.9%] | 2 |
| All ❌✅ (primary) | 1.4% | [-5.0%, 8.5%] | 7 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.5% | [-1.6%, -1.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-1.6%, -1.3%] | 2 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.4%] | 12 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-1.0%, -0.0%] | 37 |
| Improvements ✅ (secondary) | -0.0% | [-0.1%, -0.0%] | 12 |
| All ❌✅ (primary) | -0.1% | [-1.0%, 0.4%] | 49 |

Bootstrap: 656.574s -> 656.548s (-0.00%)

   --> $DIR/null_pointer_write_zst.rs:LL:CC
    |
 LL | unsafe { std::ptr::null_mut::<[u8; 0]>().write(zst_val) };
-   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ memory access failed: null pointer is a dangling pointer (it has no provenance)
+   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: null pointer is a dangling pointer (it has no provenance)
Member

Almost seems like for nice error messages in Miri it'd be better not to lower this to an assignment. Currently it looks like *ptr = ..., so Miri sees the deref in *ptr and emits the error accordingly.

Member Author

Hmm, I guess I didn't think of it being a copy instead of a deref as meaningful here.

One other thing I tried was implementing write(p, x) as *p.cast() = ManuallyDrop::new(x), which I think is also a perfectly reasonable no-intrinsic implementation -- more obviously a typed write, which I think this is, given that passing the parameter to the function is typed -- but it would have the same "dereferencing pointer" error in Miri.
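
Spelled out as a standalone function, that alternative could look roughly like the following. The name `write_by_assignment` is hypothetical, and the explicit turbofish on cast is added here only for clarity.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical no-intrinsic `ptr::write`: a typed assignment through a cast pointer.
///
/// # Safety
/// `p` must be valid for writes and properly aligned for `T`.
pub unsafe fn write_by_assignment<T>(p: *mut T, x: T) {
    // SAFETY: `ManuallyDrop<T>` is `repr(transparent)` over `T`, so the cast
    // keeps size and alignment. Assignment drops the old `ManuallyDrop<T>`
    // in place, which has no drop glue, so the previous contents of `*p`
    // are neither read nor dropped.
    unsafe { *p.cast::<ManuallyDrop<T>>() = ManuallyDrop::new(x) };
}
```

Because the write is spelled as `*ptr = ...`, a null or dangling `p` is reported at the dereference, which matches the wording change in the Miri output above.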

Member

Fair, I guess really this is about write implementation details where the user can't tell whether a deref happens before the write or not. I guess 'dereferencing' is not so bad here, I was just primed because of rust-lang/miri#2859.

@nnethercote
Contributor

The few perf improvements match or outweigh the few perf regressions.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label May 2, 2023
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.