Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

next_back for a Vec is optimized incorrectly on i686 #52026

Closed
neivv opened this issue Jul 3, 2018 · 9 comments
Closed

next_back for a Vec is optimized incorrectly on i686 #52026

neivv opened this issue Jul 3, 2018 · 9 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-x86_32 Target: x86 processors, 32 bit (like i686-*)

Comments

@neivv
Copy link

neivv commented Jul 3, 2018

> rustc --version -v
rustc 1.28.0-nightly (e3bf634e0 2018-06-28)
binary: rustc
commit-hash: e3bf634e060bc2f8665878288bcea02008ca346e
commit-date: 2018-06-28
host: i686-pc-windows-msvc
release: 1.28.0-nightly
LLVM version: 6.0

Optimized code of the following program panics if -C codegen-units is not 1.

struct S([u32; 2]);
fn main() {
    let x = vec![S([1, 2])];
    let mut it = x.iter();
    if let Some(_) = it.next_back() {
        assert!(it.next_back().is_none());
    }
}
@kennytm kennytm added O-x86 C-bug Category: This is a bug. labels Jul 3, 2018
@neivv
Copy link
Author

neivv commented Jul 4, 2018

I bisected the change to be between nightlies rustc 1.24.0-nightly (0077d12 2017-12-14) (good) and rustc 1.24.0-nightly (77efd68 2017-12-15) (bad)

I saw two differences in LLVM IR between good and bad versions:
(Disclaimer: I'm certainly not sure if I missed something else and this is just a red herring)
https://gist.github.com/neivv/c7e536269885d3b3fac8ae1a93086920
Bad one has struct definitions as

%S = type { [0 x i32], [2 x i32], [0 x i32] }
%"alloc::vec::Vec<S>" = type { [0 x i32], { i32*, i32 }, [0 x i32], i32, [0 x i32] }

and good one has

%S = type { [0 x i8], [2 x i32], [0 x i8] }
%"alloc::vec::Vec<S>" = type { [0 x i8], { i32*, i32 }, [0 x i8], i32, [0 x i8] }

bad IR also has an additional load instruction in the assertion panic branch
https://gist.github.com/neivv/c7e536269885d3b3fac8ae1a93086920#file-bad-ll-L496

@Mark-Simulacrum
Copy link
Member

cc @eddyb

@neivv
Copy link
Author

neivv commented Jul 8, 2018

So, I think this is an LLVM issue:

I tried to examine how the LLVM IR is optimized, and one transformation is quite.. suspicious
https://gist.github.com/neivv/daf93ad90c0d475c3abb119b1d01c584#file-neat-ll-L136
(The gist has main before-after the suspicious pass)

It's comparing vec's length to 0xe000_0001 - comparing to 0x1 would be correct, but somehow the three highest bits of the constant are set??
Interestingly multiplying 0xe000_0001 by sizeof(S) -- 8, is same as 0x1 * sizeof(S).

I don't know if there's anything that would allow LLVM do that transformation intentionally, but I don't really have any experience with LLVM either.

So I tried to reduce this to something that would reproduce, but modifying that IR to be something that opt accepts and running it through opt -instcombine, just does the expected transform, comparing slice length to 0x1. And since this bug doesn't occur with -C codegen-units=1, rustc --emit llvm-ir doesn't seem to produce anything that would reproduce the bug either.

@hanna-kruppe
Copy link
Contributor

I've not examine the IR in detail but that does sound quite suspiciously like something is deeply broken in LLVM. It might be a miscompilation in our build, or a misuse of the APIs when running passes in-process, or a wild write somewhere messing up data structures.

But since that sort of bug is very difficult to diagnose, it would be good to make sure whether we really can't reproduce it outside of rustc. For starters, you could add -print-module-pass to the LLVM arguments when running under -print-after-all. That will make it print the entire module even after function passes such as SimplifyCFG and InstCombine, and should give you something that you can feed into vanilla LLVM 6.0's opt unmodified.

@neivv
Copy link
Author

neivv commented Jul 9, 2018

Okay, I've got LLVM-only reproduction =)

https://gist.github.com/neivv/ae449690d80e0aad853251caa5a3863b#file-thing-ll
Running this through opt -S -instcombine will cause the odd transformation to length == 0xe000_0001 (-536870911). At least with the opt that is part of Rust's current LLVM.

Haven't tried to reduce from there / find if anything in the input module is actually suspicious yet.

@hanna-kruppe
Copy link
Contributor

@neivv
Copy link
Author

neivv commented Jul 9, 2018

Figured it out.

I haven't had a chance to test this hypothesis but noticed this while looking around how InstCombine works, and I'm confident that this is it:
https://github.com/llvm-mirror/llvm/blob/ed10960040a110b16e0415cc5ec320247384f721/lib/Transforms/InstCombine/InstCombineCompares.cpp#L528
In this issue, the code is comparing roughly (S *)slice_start == (S *)slice_start + slice_len - 1.
This function is extracting the constant byte offset, -8, and dividing it by 8, sizeof(S). However, before that, compiling to a 32-bit target, the values are masked by PtrSizeMask 0xFFFFFFFF, which discards the sign bit of int64 and gives +4294967288, which divided by 8 is the same 536870911 which ends up being in the result.

Here's also the best I was able to reduce IR while still reproducing the issue.

; ModuleID = 'tmp1-317d481089b8c8fe83113de504472633.rs'
source_filename = "tmp1-317d481089b8c8fe83113de504472633.rs"
target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
target triple = "i686-pc-windows-msvc"

%S = type { [0 x i32], [2 x i32], [0 x i32] }

; Function Attrs: uwtable
define internal void @_ZN3tmp4main17hcd108164c8e55929E() unnamed_addr {
  %1 = call { [0 x %S]*, i32 } @"_ZN68_$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..ops..deref..Deref$GT$5deref17h968a5bfabac70869E"()
  %2 = extractvalue { [0 x %S]*, i32 } %1, 0
  %3 = extractvalue { [0 x %S]*, i32 } %1, 1
  %4 = getelementptr inbounds [0 x %S], [0 x %S]* %2, i32 0, i32 0, i32 0, i32 0
  %5 = getelementptr inbounds [0 x %S], [0 x %S]* %2, i32 0, i32 %3, i32 0, i32 0
  %6 = bitcast i32* %4 to %S*
  %7 = bitcast i32* %5 to %S*
  %8 = getelementptr inbounds %S, %S* %7, i32 -1
  %9 = bitcast %S* %8 to i32*
  %10 = bitcast i32* %9 to %S*
  %11 = icmp eq %S* %10, %6
  br i1 %11, label %12, label %13

; <label>:12:                                     ; preds = %24
  ret void

; <label>:13:                                     ; preds = %12, %50
  ret void
}

declare hidden { [0 x %S]*, i32 } @"_ZN68_$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..ops..deref..Deref$GT$5deref17h968a5bfabac70869E"() unnamed_addr

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jul 9, 2018

Incredibly work! 👏

I've managed to reduce your test case a bit further. As should be expected from the root cause you identified, changing the pointer size to 64 bit makes the issue disappear. Trying to simplify the types and GEPs further runs into the problem that other (simpler and correct) GEP optimizations fire first.

target datalayout = "p:32:32"

%S = type { [2 x i32] }

define i1 @foo([0 x %S]* %p, i32 %n) {
  %start.cast = bitcast [0 x %S]* %p to %S*
  %end = getelementptr inbounds [0 x %S], [0 x %S]* %p, i32 0, i32 %n, i32 0, i32 0
  %end.cast = bitcast i32* %end to %S*
  %last = getelementptr inbounds %S, %S* %end.cast, i32 -1
  %cmp = icmp eq %S* %last, %start.cast
  ret i1 %cmp
}

Someone with a http://bugs.llvm.org/ account should report this issue upstream (I should really request one soon, but, well, I'm not doing that now).

@mati865
Copy link
Contributor

mati865 commented Dec 7, 2018

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Dec 7, 2018
This was referenced Dec 13, 2018
Centril added a commit to Centril/rust that referenced this issue Dec 16, 2018
@Noratrieb Noratrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-x86_32 Target: x86 processors, 32 bit (like i686-*)
Projects
None yet
Development

No branches or pull requests

7 participants