Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In latest nightly observing a to-be returned variable changes its value. #84667

Closed
JAicewizard opened this issue Apr 28, 2021 · 15 comments
Closed
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Milestone

Comments

@JAicewizard
Copy link

Code

I have not been able to reproduce this with pure c code. I found this while working with cgo+rust, and managed to make a small-ish reproducible setup.
I tried this code:

lib.rs

use softposit::{P32, P16};

#[no_mangle] pub extern "C" fn positdiv32(a: u32, b:u32) -> u32{
    let a = P32::from_bits(a);
    let b = P32::from_bits(b);
    let c = a/b;
    let x = c.to_bits();
    // if x >> 31 == 0{
    //     print!("hey!\n");  
    // }
    // print!("{:}\n",x);
    x
}

toml file

[package]
name = "positrs"
version = "0.1.0"
authors = ["jaap aarts <[email protected]>"]
edition = "2018"

[lib]
crate-type = ["cdylib"]

[dependencies]
softposit = "0.3.9"

positrs.h

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

uint32_t positdiv32(uint32_t a, uint32_t b);

c file

#include <stdio.h>
#include <stdint.h>
#include "positrs.h"

void binprintf(uint32_t v)
{
    unsigned int mask=1<<((sizeof(int)<<3)-1);
    while(mask) {
        printf("%d", (v&mask ? 1 : 0));
        mask >>= 1;
    }
}

void main(){
	uint32_t a = 0b11000100001110111011110011001000;
	uint32_t b = 0b01011111011001010011000111110001;
	uint32_t c = positdiv32(a,b);
	binprintf(c);
}

run

cargo build --release
cp target/release/libpositrs.so .
gcc -g main.c -o call_rust -L. -lpositrs
LD_LIBRARY_PATH=. ./call_rust

And you will see that this prints 00011110000111010100101001000110⏎
This is (to me) clearly wrong since dividing a negative number with a positive number should be a negative number, and this is a positive number.
So I tried to print the variable in rust, see where it went wrong...

Uncomment the printf!(...) line and run the following commands

cargo build --release
cp target/release/libpositrs.so .
LD_LIBRARY_PATH=. ./call_rust

and this luckely prints the exact same binary value....

3789731258
11100001111000101011010110111010⏎

oh.

somehow adding the print statement changes the return value of the function when exported to C.
The second one is correct, the first is not.

I also tried just observing x, without actually printing it, using the if statement above the print! and it had the same result as the print!

I expected to see this happen:

I expected the value to be the same no-matter what if the value is being observed, and this to be the correct value of 0b11100001111000101011010110111010

Instead, this happened: explanation

Instead I got the wrong number after updating to the latest nightly.

Version it worked on

Currently I know for sure it works on rustc 1.52.0-beta.6 (f97769a2b 2021-04-27)
It also previously worked on nightly, but I dont think I can see the version history.

Version with regression

current broken nightly version: rustc 1.53.0-nightly (42816d61e 2021-04-24)

rustc --version --verbose:

rustc 1.52.0-beta.6 (f97769a2b 2021-04-27)
binary: rustc
commit-hash: f97769a2b7bc5a6b3d8f7140613cf26352f2b518
commit-date: 2021-04-27
host: x86_64-unknown-linux-gnu
release: 1.52.0-beta.6
LLVM version: 12.0.0

The compiler did not crash.

@jonas-schievink jonas-schievink added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Apr 28, 2021
@tmiasko
Copy link
Contributor

tmiasko commented Apr 28, 2021

The binprintf contains undefined behaviour because result of a bitwise shift is not representable (which may or may not be relevant to the issue if any).

@JAicewizard
Copy link
Author

note that the wrong behaviour is without the printf, and that simply observing x is enough to "fix" it. even comparing to 0 makes it work.

@JAicewizard
Copy link
Author

JAicewizard commented Apr 28, 2021

Some more possibly usefull information, I have a bunch of very similar functions, with the only diference being the operation being done on a and b (*,+,sqrt, round, truncate) and this is the only one showing this behaviour.

The bitpattern itself is also not interesting, as long as the result of the operation is negative it gives this result. Wen pluging in 2 positive posit bit values or 2 negative ones, it works as expected.

@tmiasko
Copy link
Contributor

tmiasko commented Apr 28, 2021

The mask calculation in binprintf on the C side, which as far as I can see is consistently used to demonstrate the issue, shifts 1 by 31 which is undefined behaviour. Anything it prints should be considered untrustworthy until it is fixed.

@JAicewizard
Copy link
Author

JAicewizard commented Apr 28, 2021

oh that. Well the bug originally came from go, where there is well-defined binary printing, and the bug shows there too.
I can try to just print the value and just check if its negative as an easy test, will do in a minute.

I just copy-pasted this code from google because c doesnt have a binary print formatter.

@JAicewizard
Copy link
Author

I dont even think shifting is undefined for non-negative integers, but here you go:

#include <stdio.h>
#include <stdint.h>
#include "positrs.h"

void main(){
	uint32_t a = 0b11000100001110111011110011001000;
	uint32_t b = 0b01011111011001010011000111110001;
	uint32_t c = positdiv32(a,b);
	printf("%d", c);
}

same problem same "fix".

@tmiasko
Copy link
Contributor

tmiasko commented Apr 28, 2021

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Anyway, bisection points to 635ccfe.

@tmiasko
Copy link
Contributor

tmiasko commented Apr 29, 2021

The relevant part of diff looks as follows:

--- a	2021-04-29 09:08:20.057255690 +0200
+++ b	2021-04-29 09:08:21.221282758 +0200
@@ -1,47 +1,47 @@
 <positdiv32>:
 	                   b8 00 00 00 80       	mov    $0x80000000,%eax
 	                   81 ff 00 00 00 80    	cmp    $0x80000000,%edi
 	            /----- 74 10                	je     <positdiv32+0x1d>
 	            |      89 f1                	mov    %esi,%ecx
 	            |      81 c9 00 00 00 80    	or     $0x80000000,%ecx
 	            |      81 f9 00 00 00 80    	cmp    $0x80000000,%ecx
 	            |  /-- 75 01                	jne    <positdiv32+0x1e>
 	            \--|-> c3                   	ret    
 	               \-> 85 ff                	test   %edi,%edi
 	      /----------- 74 4e                	je     <positdiv32+0x70>
 	      |            55                   	push   %rbp
 	      |            41 57                	push   %r15
 	      |            41 56                	push   %r14
 	      |            53                   	push   %rbx
-	      |            50                   	push   %rax
+	      |            48 83 ec 08          	sub    $0x8,%rsp
 	      |            41 0f 98 c7          	sets   %r15b
 	      |            85 f6                	test   %esi,%esi
 	      |            41 0f 98 c6          	sets   %r14b
 	      |            89 fa                	mov    %edi,%edx
 	      |            f7 da                	neg    %edx
 	      |            0f 4c d7             	cmovl  %edi,%edx
 	      |            89 f0                	mov    %esi,%eax
 	      |            f7 d8                	neg    %eax
 	      |            0f 4c c6             	cmovl  %esi,%eax
 	      |            8d 0c 95 00 00 00 00 	lea    0x0(,%rdx,4),%ecx
 	      |            f7 c2 00 00 00 40    	test   $0x40000000,%edx
-	      |  /-------- 0f 85 1f 00 00 00    	jne    <positdiv32+0x73>
+	      |  /-------- 75 20                	jne    <positdiv32+0x73>
 	      |  |         40 b5 ff             	mov    $0xff,%bpl
 	      |  |         85 c9                	test   %ecx,%ecx
-	      |  |  /----- 0f 88 09 00 00 00    	js     <positdiv32+0x68>
-	      |  |  |      90                   	nop
+	      |  |  /----- 0f 88 0a 00 00 00    	js     <positdiv32+0x68>
+	      |  |  |      66 90                	xchg   %ax,%ax
 	      |  |  |  /-> 40 80 c5 ff          	add    $0xff,%bpl
 	      |  |  |  |   01 c9                	add    %ecx,%ecx
 	      |  |  |  \-- 79 f8                	jns    <positdiv32+0x60>
 	      |  |  \----> 81 e1 fe ff ff 7f    	and    $0x7ffffffe,%ecx
 	      |  |  /----- eb 18                	jmp    <positdiv32+0x88>
 	      \--|--|----> 31 c0                	xor    %eax,%eax
 	         |  |      c3                   	ret    
 	         \--|----> 31 ed                	xor    %ebp,%ebp
 	            |      85 c9                	test   %ecx,%ecx
 	            +----- 0f 89 0b 00 00 00    	jns    <positdiv32+0x88>
 	            |      0f 1f 00             	nopl   (%rax)
 	            |  /-> 40 80 c5 01          	add    $0x1,%bpl
 	            |  |   01 c9                	add    %ecx,%ecx
 	            |  \-- 78 f8                	js     <positdiv32+0x80>
 	            \----> 89 cb                	mov    %ecx,%ebx

The change of push %rax to sub $0x8,%rsp, clobbers SF which is used by sets %r15b.

@rustbot modify labels: +A-LLVM

@rustbot rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Apr 29, 2021
@nikic
Copy link
Contributor

nikic commented Apr 29, 2021

cc @cuviper @nagisa More stack probing fun.

@tmiasko
Copy link
Contributor

tmiasko commented Apr 29, 2021

LLVM bug report https://bugs.llvm.org/show_bug.cgi?id=50165.

@rustbot modify labels: +regression-from-stable-to-beta

@rustbot
Copy link
Collaborator

rustbot commented Apr 29, 2021

Error: Label +regression-from-stable-to-beta can only be set by Rust team members

Please let @rust-lang/release know if you're having trouble with this bot.

@rustbot rustbot added regression-from-stable-to-beta Performance or correctness regression from stable to beta. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Apr 29, 2021
@pnkfelix pnkfelix added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 29, 2021
@Mark-Simulacrum Mark-Simulacrum added this to the 1.52.0 milestone Apr 29, 2021
@pnkfelix
Copy link
Member

Discussed in T-compiler triage meeting.

We are going to revert PR #77885 on nightly and beta. probe-stack=inline-asm seems like its a little bit too buggy for us to rely on as of right now.

@pnkfelix pnkfelix self-assigned this Apr 29, 2021
This was referenced Apr 29, 2021
bors pushed a commit to rust-lang-ci/rust that referenced this issue Apr 29, 2021
…in, for now.

We had already reverted the change on stable back in PR rust-lang#83412.

Since then, we've had some movement on issue rust-lang#83139, but not a 100% fix.

But also since then, we had bug reported, issue rust-lang#84667, that looks like outright
codegen breakage, rather than problems confined to debuginfo issues.

So we are reverting PR rust-lang#77885 on stable and beta. We'll reland PR rust-lang#77885 (or some
variant) switching back to an LLVM-dependent selection of out-of-line call vs
inline-asm, after these other issues have been resolved.
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 29, 2021
…ark-Simulacrum

Revert PR 77885 everywhere

Change to probe-stack=call (instead of inline-or-call) everywhere again, for now.

We had already reverted the change on stable back in PR rust-lang#83412.

Since then, we've had some movement on issue rust-lang#83139, but not a 100% fix.

But also since then, we had bug reported, issue rust-lang#84667, that looks like outright codegen breakage, rather than problems confined to debuginfo issues.    So we are reverting PR rust-lang#77885 on stable and beta. We'll reland PR rust-lang#77885 (or some    variant) switching back to an LLVM-dependent selection of out-of-line call vs    inline-asm, after these other issues have been resolved.
@pnkfelix
Copy link
Member

(also, I will look into making regression test for this issue, in order to ensure that future attempts to land PR #77885 do not forget to deal with this problem.)

@pietroalbini
Copy link
Member

A fix for this is available from 1.53 onwards (nightly and soon enough beta), and it's been backported to 1.52 stable. Closing the issue. If this is still happening please leave a comment and we'll reopen it!

nbdd0121 added a commit to nbdd0121/linux that referenced this issue Jun 6, 2021
We do not support function-call stack probes, and LLVM inline asm
stack probe is currently broken [1][2]. Switching target's `stack-probes`
to "none" allows us to avoid broken codegen and the intrinsic call.
We could switch the `stack-probes` property to `inline` if we want once
the LLVM codegen issue is resolved.

[1]: https://bugs.llvm.org/show_bug.cgi?id=50165
[2]: rust-lang/rust#84667

Signed-off-by: Gary Guo <[email protected]>
nbdd0121 added a commit to nbdd0121/linux that referenced this issue Jun 6, 2021
We do not support function-call stack probes, and LLVM inline asm
stack probe is currently broken [1][2]. Switching target's `stack-probes`
to "none" allows us to avoid broken codegen and the intrinsic call.
We could switch the `stack-probes` property to `inline` if we want once
the LLVM codegen issue is resolved.

[1]: https://bugs.llvm.org/show_bug.cgi?id=50165
[2]: rust-lang/rust#84667

Signed-off-by: Gary Guo <[email protected]>
nbdd0121 added a commit to nbdd0121/linux that referenced this issue Jun 6, 2021
We do not support function-call stack probes, and LLVM inline asm stack
probe is currently broken [1][2]. Switching target's `stack-probes`
to "none" allows us to avoid broken codegen and the intrinsic call.
We could switch the `stack-probes` property to `inline` if we want once
the LLVM codegen issue is resolved.

[1]: https://bugs.llvm.org/show_bug.cgi?id=50165
[2]: rust-lang/rust#84667

Signed-off-by: Gary Guo <[email protected]>
@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 10, 2022
@cuviper
Copy link
Member

cuviper commented Sep 23, 2022

This problem with inline stack probes should be fixed in LLVM 16 by llvm/llvm-project@26c37b4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants