Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary copy when constructing arrays from returned arrays? #62446

Open
bwesterb opened this issue Jul 6, 2019 · 13 comments
Open

Unnecessary copy when constructing arrays from returned arrays? #62446

bwesterb opened this issue Jul 6, 2019 · 13 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-mir-opt-nrvo Fixed by the Named Return Value Opt. (NRVO) C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@bwesterb
Copy link

bwesterb commented Jul 6, 2019

I want to construct a big array by concatenating smaller arrays returned by other functions. As a simple example:

pub fn g(f: &Fn() -> [u64;40]) -> [u64;80] {
    let a = f();
    let b = f();
    let mut ret = [0u64;80];
    ret[0..40].copy_from_slice(&a[..]);
    ret[40..80].copy_from_slice(&b[..]);
    ret
}

Rust nightly generates a call to memcpy.

Is there a way to prevent this memcpy? Am I missing obvious other way to write this function?

Of course one could rewrite the called function f to take a &mut [u64] instead of returning the array, but that removes compile-time checks on the length and introduces bounds checks. Using &mut [u64;40] as an "out" argument solves that problem, but then I don't see a safe way to get two &mut [u64;40] into [u64;80] without using transmute.

(Background: I'm implementing the XMSSMT hash-based signature in Rust, which involves concatenating lots of hashes. The usual Rust hash library returns an array (actually a GenericArray) instead of using a &mut [u64;...] parameter which led me to believe that the copy could be optimised away.)

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Jul 7, 2019

What you're looking for is commonly known as "named return value optimization". It's currently not implemented in rust but progress is being tracked in #32966.

In the meantime, if you are willing to take an out-param (&mut [u64; 80]), it's not too hard to convert to an array reference with an arbitrary size and offset. I've used arrayref for this purpose in the (distant) past. You would also want to change your function parameter to use an out-param (f: impl Fn(&mut [u64; 40])).

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 7, 2019
@nikic
Copy link
Contributor

nikic commented Jul 7, 2019

LLVM isn't very good at optimizing away memcpy's. I think it would be possible to eliminate one of the memcpy's by extending call slot optimization to work with a destination that is a GEP of alloca.

@jonas-schievink jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 17, 2020
@jonas-schievink jonas-schievink added A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Apr 13, 2020
@jonas-schievink jonas-schievink added the A-mir-opt-nrvo Fixed by the Named Return Value Opt. (NRVO) label May 14, 2020
@nikic
Copy link
Contributor

nikic commented Oct 17, 2020

Together with a few other changes, https://reviews.llvm.org/D89623 will remove the last memcpy in g1 and g4. It also removes one of the memcpys in g3.

g2 would require mutable noalias to be reenabled.

@nikic nikic self-assigned this Oct 17, 2020
@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

On beta we generate 1, 2, 2, 1 memcpys. On nightly we generate 1, 2, 1, 0 memcpys. With -Z mutable-noalias we generate 1, 0, 1, 0 memcpys.

Not sure why the one memcpy in the first function is still there. It gets dropped when I run -memcpyopt on the final IR, so presumably a phase ordering issue.

@nikic
Copy link
Contributor

nikic commented Mar 12, 2021

For g the IR after -memcpyopt is:

; Function Attrs: nounwind nonlazybind
define void @_ZN6test111g17h45fb2453d8cc5713E([80 x i64]* noalias nocapture sret([80 x i64]) dereferenceable(640) %0, {}* nonnull align 1 %1, [3 x i64]* noalias nocapture readonly align 8 dereferenceable(24) %2) unnamed_addr #0 {
  %4 = alloca [40 x i64], align 8
  %5 = alloca [40 x i64], align 8
  %6 = bitcast [40 x i64]* %5 to i8*
  call void @llvm.lifetime.start.p0i8(i64 320, i8* nonnull %6)
  %7 = getelementptr inbounds [3 x i64], [3 x i64]* %2, i64 0, i64 3
  %8 = bitcast i64* %7 to void ([40 x i64]*, {}*)**
  %9 = load void ([40 x i64]*, {}*)*, void ([40 x i64]*, {}*)** %8, align 8, !invariant.load !2, !nonnull !2
  %10 = bitcast [80 x i64]* %0 to [40 x i64]*
  call void %9([40 x i64]* noalias nocapture nonnull sret([40 x i64]) dereferenceable(320) %10, {}* nonnull align 1 %1) #5
  %11 = bitcast [40 x i64]* %4 to i8*
  call void @llvm.lifetime.start.p0i8(i64 320, i8* nonnull %11)
  call void %9([40 x i64]* noalias nocapture nonnull sret([40 x i64]) dereferenceable(320) %4, {}* nonnull align 1 %1) #5
  %12 = bitcast [80 x i64]* %0 to i8*
  %13 = getelementptr i8, i8* %12, i64 320
  call void @llvm.memset.p0i8.i64(i8* align 8 %13, i8 0, i64 320, i1 false)
  %14 = getelementptr inbounds [80 x i64], [80 x i64]* %0, i64 0, i64 40
  %15 = bitcast i64* %14 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(320) %15, i8* nonnull align 8 dereferenceable(320) %11, i64 320, i1 false) #5
  call void @llvm.lifetime.end.p0i8(i64 320, i8* nonnull %11)
  call void @llvm.lifetime.end.p0i8(i64 320, i8* nonnull %6)
  ret void
}

The problem is that we have a clobbering memset which is dead, but will only be eliminated by the following DSE pass.

@nikic nikic assigned nikic and unassigned nikic Mar 12, 2021
@nikic
Copy link
Contributor

nikic commented Mar 13, 2021

The memcpy in the first example should be eliminated by llvm/llvm-project@9080444 and llvm/llvm-project@2902bde (possibly the first one isn't needed) by side-stepping the phase ordering issue. MemCpyOpt happened to implement partial DSE for this particular case already.

@nikic
Copy link
Contributor

nikic commented Aug 22, 2021

After #87570 we're now down to 0, 0, 1, 0 memcpys.

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Feb 19, 2022
@nikic nikic removed their assignment Feb 19, 2022
@nikic
Copy link
Contributor

nikic commented Feb 19, 2022

I think this is about as fixed as it's going to get, but it may be worthwhile to add some codegen tests for it.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@faptc
Copy link

faptc commented May 20, 2024

This now regresses in first 3 cases: https://godbolt.org/z/W6ExP7hdc
@rustbot label -E-needs-test

@rustbot rustbot removed the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label May 20, 2024
@nikic
Copy link
Contributor

nikic commented May 21, 2024

I believe the regression in g2 is actually a correctness fix in that we don't have consensus that the optimization would be valid (cc @RalfJung for another example where we need "spurious store" on &mut for optimization). Previously this worked due to an LLVM bug, which failed to consider whether introducing spurious stores is valid for call slot optimization.

@RalfJung
Copy link
Member

RalfJung commented May 21, 2024

What's the g2 you are referring to? Some comment above also mentions a g1 and a g4. But the issue description only has a g. Does the issue description need updating?

Anyway I added this to rust-lang/unsafe-code-guidelines#133.

@nikic
Copy link
Contributor

nikic commented May 21, 2024

@RalfJung It refers to the godbolt links, e.g. https://godbolt.org/z/W6ExP7hdc.

@RalfJung
Copy link
Member

Okay so the issue description is extremely outdated then. That's quite confusing when coming anew to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations A-mir-opt-nrvo Fixed by the Named Return Value Opt. (NRVO) C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants