Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlining is broken in ARMv7+NEON #427

Closed
gnzlbg opened this issue Apr 12, 2018 · 17 comments
Closed

Inlining is broken in ARMv7+NEON #427

gnzlbg opened this issue Apr 12, 2018 · 17 comments

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Apr 12, 2018

MWE:

#![feature(stdsimd,target_feature)]
extern crate stdsimd;
use ::stdsimd::arch::arm::int8x8_t;
#[inline]
#[target_feature(enable = "neon,v7")]
pub unsafe fn bar(x: int8x8_t) -> int8x8_t {
    ::stdsimd::arch::arm::vadd_s8(x, x)
}
pub unsafe fn foo(x: int8x8_t) -> int8x8_t {
    bar(x)
}

produces:

foo::foo:
 push    {r11, lr}
 vldr    d0, [r1]
 bl      foo::bar
 pop     {r11, pc}

built using RUSTFLAGS="-C target-feature=+neon,+v7 -C lto=fat -C codegen-units=1 --emit asm -C debuginfo=0" CARGO_INCREMENTAL=0 cargo build --release --target=armv7-unknown-linux-gnueabihf.

@alexcrichton
Copy link
Member

My guess is that this has to do with rustc's feature list containg -neon and then we're adding on +neon later. I can't reproduce locally with the opt tool so I'm not really sure what's going on. Solving this will likely be fiddling with LLVM.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Apr 12, 2018

But you can reproduce with rustc right? @hsivonen has a PR with targets for armv7neon-unknown-linux-gnu and friends, so those might solve this problem.

@alexcrichton
Copy link
Member

Yes I can reproduce this with rustc. The LLVM IR emitted for this is:

; ModuleID = 'wut0-50fb5e8e8d375368727bf982ba5dd3c0.rs'
source_filename = "wut0-50fb5e8e8d375368727bf982ba5dd3c0.rs"
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7-unknown-linux-gnueabihf"

; wut::bar
; Function Attrs: inlinehint norecurse nounwind
define internal fastcc void @_ZN3wut3bar17h1815328d92c8a501E(<8 x i8>* noalias nocapture dereferenceable(8), <8 x i8> %x.val) unnamed_addr #0 {
start:
  %1 = shl <8 x i8> %x.val, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  store <8 x i8> %1, <8 x i8>* %0, align 8
  ret void
}

; Function Attrs: nounwind
define <8 x i8> @foo(<8 x i8> %x) unnamed_addr #1 {
start:
  %tmp_ret = alloca <8 x i8>, align 8
  %0 = getelementptr inbounds <8 x i8>, <8 x i8>* %tmp_ret, i32 0, i32 0
; call wut::bar
  call fastcc void @_ZN3wut3bar17h1815328d92c8a501E(<8 x i8>* noalias nocapture nonnull dereferenceable(8) %tmp_ret, <8 x i8> %x)
  %1 = load <8 x i8>, <8 x i8>* %tmp_ret, align 8
  ret <8 x i8> %1
}

attributes #0 = { inlinehint norecurse nounwind "target-features"="+neon,+v7" }
attributes #1 = { nounwind }
attributes #2 = { argmemonly nounwind }
attributes #3 = { norecurse nounwind readnone "no-frame-pointer-elim"="true" }

Notably foo does not have "target-features"="+neon,+v7", meaning LLVM can't inline it. I think LLVM is supposed to be able to inline it based on the ambient target features enabled in the target machine, but this may not be communicated to the optimization passes or something like that. Or maybe it's getting confused? I'm not sure. This would require digging around in LLVM to figure out why the inlining isn't happening.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Apr 12, 2018

ambient target features enabled in the target machine, but this may not be communicated to the optimization passes or something like that.

If when compiling with llc works, maybe we are communicating this incorrectly?

@tstellar
Copy link

This works for me with the example above: opt -mattr=+neon,+v7 test.ll -inline -S -o -

Does rustc use setFunctionAttributes(CPUStr, FeaturesStr, *M) or does it just pass the target-features in when creating the TargetMachine?

@alexcrichton
Copy link
Member

@gnzlbg I'm not sure. I can't get opt to not inline it and then inline it with +neon on the command line. I think LLVM's internal defaults for armv7-unknown-linux-gnueabihf are different than rustc's internal defaults for that target.

@tstellar currently we pass -C target-feature-specified features when creating the TargetMachine, and then #[target_feature] per function uses setFunctionAttributes (I think)

@tstellar
Copy link

@alexcrichton Ok, rustc will need to manually add the -C target-feature-specified features to each function as well. The TargetMachine does not do this automatically.

@cuviper
Copy link
Member

cuviper commented Apr 12, 2018

Alternatively, maybe we should filter the function attributes we set to just those that are different than the target machine?

@alexcrichton
Copy link
Member

Ok thanks for the info @tstellar! And @cuviper yeah I'd imagine we could probably do small optimizations like that.

@tstellar
Copy link

tstellar commented Apr 12, 2018

I also should mention that when you specify the target-features attribute for a function, the LLVM backends are supposed to ignore the FeatureString that you passed to the TargetMachine when doing codegen for that function, so @cuviper 's suggestion wouldn't work.

@cuviper
Copy link
Member

cuviper commented Apr 12, 2018

Then we really do need to set complete features for all functions. Does clang do that too?

@tstellar
Copy link

Yes it does:

[tstellar@tstellar clang-build]$ echo 'void main(){}' | clang -w -S -emit-llvm -x c - -o -
; ModuleID = '-'
source_filename = "-"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @main() #0 {
entry:
  ret void
}

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 7.0.0 (trunk 329854) (llvm/trunk 329852)"}

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Apr 13, 2018

@tstellar thanks, I think we are not currently doing that anywhere.

@parched
Copy link

parched commented Apr 14, 2018

AFAIR, LLVM will not inline if the caller doesn't have a superset of features, so isn't this expected behaviour?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Apr 14, 2018

@parched note that I am compiling with RUSTFLAGS="-C target-feature=+v7,+neon" which should enable these features for all functions in the program and therefore, both functions should have the exact same of features. The bug is that they don't.

@parched
Copy link

parched commented Apr 14, 2018

@gnzlbg, ah yes I misread, ignore my comment then.

@alexcrichton
Copy link
Member

I've opened a PR for this at rust-lang/rust#50188

bors added a commit to rust-lang/rust that referenced this issue Apr 28, 2018
Add `-C target-feature` to all functions

Previously the features specified to LLVM via `-C target-feature` were only
reflected in the `TargetMachine` but this change *also* reflects these and the
base features inside each function itself. This change matches clang and...

Closes rust-lang/stdarch#427
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants