-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arc::new
duplicates stack memory
#111603
Comments
@rustbot label +A-codegen +I-slow |
Godbolt: https://rust.godbolt.org/z/GKWY8nT78 |
It seems this is not unique to It may be useful to note that without the |
Fix duplicate `arcinner_layout_for_value_layout` calls when using the uninit `Arc` constructors What this fixes is the duplicate calls to `arcinner_layout_for_value_layout` seen here: https://godbolt.org/z/jr5Gxozhj The issue was discovered alongside rust-lang#111603 but is otherwise unrelated to the duplicate `alloca`s, which remain unsolved. Everything I tried to solve said main issue has failed. As for the duplicate layout calculations, I also tried slapping `#[inline]` and `#[inline(always)]` on everything in sight but the only thing that worked in the end is to dedup the calls by hand.
This applies to a lot of things that takes an array as input. For example, Box and Vec https://rust.godbolt.org/z/rzoPneT3q I have a potential fix in https://reviews.llvm.org/D151638. TLDR, marking a tail call makes Alias Analysis smarter, a memcpy can be optimized, and a dead alloca can be removed. I'll elaborate below.
Interesting is that if you take the optimized code that @nikic posted and run it through |
The duplicate alloca could also have been removed if rust_alloc was set to only access inaccessible memory. It is set for libc malloc here. If its correct, maybe rustc can set it for rust_alloc (Big emphasis on if its correct. I don't know about the internals). |
I also want to note that this is the rustc code that emits code for the arrays. It intentionally iterates from the start to end pointer. It would probably not be hard to change the code to iterate over array indices instead, which would help LLVM optimizations, but I'm not sure if that is a desirable change. |
Compares of the same object do not leak any bits. This patch introduces getUnderlyingObjectLookThrough. It looks at the output of getUnderlyingObject. If it is a PHI, it looks at all the incoming underlying objects. If all those objects are the same, or the original PHI, we determine that there is a new underlying object. This is similar to getUnderlyingObjects, but provides a more efficient way to find a single underlying object. This is an attempt at solving huge compile time regressions in https://reviews.llvm.org/D152082. First, we only look through a single PHI, not nested PHIs. Second, we only use one callsite. There are likely other callsites that could take advantage of this over the vanilla getUnderlyingObjects. We need to be careful about compile times. Adding this to BasicAA::aliasCheck increases compile times by 3% on local builds. This can hopefully lead to improved rustc generated code in rust-lang/rust#111603. rustc generates pointers comparisons that this patch can identify as non capturing. Differential Revision: https://reviews.llvm.org/D152241
Huh, I was under the impression that iterating over GEPs ( |
@the8472 I am not 100% which is better, maybe there are drawbacks to iterating over indices that I am not aware of. Could you share some examples of where else there might be GEP iterating? I have a change in this LLVM PR which should make LLVM smarter about comparisons against GEP inductions, but I'm questioning whether we should merge it. Examples could help next steps. |
This lead me to experiment with various loop shapes for slice::Iter::fold in #106343 and an index-based do-while loop + a zero-len precheck did yield the largest wins out of the approaches I tried. |
cg_llvm: use index-based loop in write_operand_repeatedly This should be easier for LLVM to analyze. Fixes #111603 This needs a perf run. [cc](rust-lang/rust#111603 (comment)) `@caojoshua`
Compares of the same object do not leak any bits. This patch introduces getUnderlyingObjectLookThrough. It looks at the output of getUnderlyingObject. If it is a PHI, it looks at all the incoming underlying objects. If all those objects are the same, or the original PHI, we determine that there is a new underlying object. This is similar to getUnderlyingObjects, but provides a more efficient way to find a single underlying object. This is an attempt at solving huge compile time regressions in https://reviews.llvm.org/D152082. First, we only look through a single PHI, not nested PHIs. Second, we only use one callsite. There are likely other callsites that could take advantage of this over the vanilla getUnderlyingObjects. We need to be careful about compile times. Adding this to BasicAA::aliasCheck increases compile times by 3% on local builds. This was inspired by the issues in rust-lang/rust#111603. rustc used to generate pointers comparisons that this patch can identify as non capturing. The code emission was changed in rustc to avoid this behavior, but this patch can help with other cases of pointer inductions.
cg_llvm: use index-based loop in write_operand_repeatedly This should be easier for LLVM to analyze. Fixes #111603 This needs a perf run. [cc](rust-lang/rust#111603 (comment)) `@caojoshua`
cg_llvm: use index-based loop in write_operand_repeatedly This should be easier for LLVM to analyze. Fixes #111603 This needs a perf run. [cc](rust-lang/rust#111603 (comment)) `@caojoshua`
Running the following through Godbolt, using rustc 1.71.0-nightly (cba1407 2023-05-10) with
-C opt-level=3
:I expected rustc to create an 8 KB stack allocation and a memcpy from it to the heap allocation, instead it creates 2 identical stack allocations and 2 memcpys to go along with them:
Trying to narrow the problem down further, when eliminating the stack allocations entirely:
The code duplication still persists, and is limited to the
arcinner_layout_for_value_layout
call:The text was updated successfully, but these errors were encountered: