Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to elide bound checks for multiple slice writes by index after length check that can't overflow #55147

Closed
hsivonen opened this issue Oct 17, 2018 · 8 comments · Fixed by #134892
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@hsivonen
Copy link
Member

hsivonen commented Oct 17, 2018

Nightly Rust Godbolt link.

These two functions should logically compile to the same code but don't:

pub fn unchecked(dst: &mut [u8], offset: usize) {
    let mut i = offset;
    if i.checked_add(4).unwrap() <= dst.len() {
        unsafe {
            *(dst.get_unchecked_mut(i)) = 1;
            i += 1;
            *(dst.get_unchecked_mut(i)) = 2;
            i += 1;
            *(dst.get_unchecked_mut(i)) = 3;
            i += 1;
            *(dst.get_unchecked_mut(i)) = 4;
        }
    }
}

pub fn checked(dst: &mut [u8], offset: usize) {
    let mut i = offset;
    if i.checked_add(4).unwrap() <= dst.len() {
        dst[i] = 1;
        i += 1;
        dst[i] = 2;
        i += 1;
        dst[i] = 3;
        i += 1;
        dst[i] = 4;
    } 
}

The output is:

example::unchecked:
        push    rax
        mov     rax, rdx
        add     rax, 4
        jb      .LBB0_4
        cmp     rax, rsi
        ja      .LBB0_3
        mov     dword ptr [rdi + rdx], 67305985
.LBB0_3:
        pop     rax
        ret
.LBB0_4:
        lea     rdi, [rip + .L__unnamed_1]
        call    qword ptr [rip + _ZN4core9panicking5panic17hd62333a8bd86ba63E@GOTPCREL]
        ud2

example::checked:
        push    rax
        mov     rcx, rdx
        add     rcx, 4
        jb      .LBB1_14
        mov     rax, rsi
        cmp     rcx, rsi
        ja      .LBB1_7
        cmp     rdx, rax
        jae     .LBB1_8
        mov     byte ptr [rdi + rdx], 1
        lea     rsi, [rdx + 1]
        cmp     rsi, rax
        jae     .LBB1_11
        mov     byte ptr [rdi + rdx + 1], 2
        lea     rsi, [rdx + 2]
        cmp     rsi, rax
        jae     .LBB1_12
        mov     byte ptr [rdi + rdx + 2], 3
        add     rdx, 3
        cmp     rdx, rax
        jae     .LBB1_13
        mov     byte ptr [rdi + rdx], 4
.LBB1_7:
        pop     rax
        ret
.LBB1_14:
        lea     rdi, [rip + .L__unnamed_1]
        call    qword ptr [rip + _ZN4core9panicking5panic17hd62333a8bd86ba63E@GOTPCREL]
        ud2
.LBB1_8:
        lea     rdi, [rip + .L__unnamed_2]
        mov     rsi, rdx
        mov     rdx, rax
        call    qword ptr [rip + _ZN4core9panicking18panic_bounds_check17h8f5d51613726af8dE@GOTPCREL]
        ud2
.LBB1_11:
        lea     rdi, [rip + .L__unnamed_3]
        mov     rdx, rax
        call    qword ptr [rip + _ZN4core9panicking18panic_bounds_check17h8f5d51613726af8dE@GOTPCREL]
        ud2
.LBB1_12:
        lea     rdi, [rip + .L__unnamed_4]
        mov     rdx, rax
        call    qword ptr [rip + _ZN4core9panicking18panic_bounds_check17h8f5d51613726af8dE@GOTPCREL]
        ud2
.LBB1_13:
        lea     rdi, [rip + .L__unnamed_5]
        mov     rsi, rdx
        mov     rdx, rax
        call    qword ptr [rip + _ZN4core9panicking18panic_bounds_check17h8f5d51613726af8dE@GOTPCREL]
        ud2

.L__unnamed_6:
        .ascii  "called `Option::unwrap()` on a `None` value"

.L__unnamed_7:
        .ascii  "libcore/option.rs"

.L__unnamed_1:
        .quad   .L__unnamed_6
        .asciz  "+\000\000\000\000\000\000"
        .quad   .L__unnamed_7
        .asciz  "\021\000\000\000\000\000\000\000c\001\000\000\025\000\000"

str.0:
        .ascii  "/tmp/compiler-explorer-compiler118917-56-2jgk8f.vziuf/example.rs"

.L__unnamed_2:
        .quad   str.0
        .quad   64
        .long   19
        .long   9

.L__unnamed_3:
        .quad   str.0
        .quad   64
        .long   21
        .long   9

.L__unnamed_4:
        .quad   str.0
        .quad   64
        .long   23
        .long   9

.L__unnamed_5:
        .quad   str.0
        .quad   64
        .long   25
        .long   9

That is, the safe function emits bound checks for each write via slice subscript despite there being enough information to decide that the bound checks cannot fail.

In encoding_rs the use of unsafe as seen in the first function is necessary for the UTF-16 to UTF-8 encoder to be competitive with C++.

@RReverser
Copy link
Contributor

Your godbolt link seems to have incorrect code:

pub fn unchecked(dst: &mut [u8], offset: usize) {
    let mut i = offset;
    if i.checked_add(4).unwrap() > dst.len() {
        unsafe {
            *(dst.get_unchecked_mut(i)) = 1;
            i += 1;
            *(dst.get_unchecked_mut(i)) = 2;
            i += 1;
            *(dst.get_unchecked_mut(i)) = 3;
            i += 1;
            *(dst.get_unchecked_mut(i)) = 4;
        }
    }
}

pub fn checked(dst: &mut [u8], offset: usize) {
    let mut i = offset;
    if i.checked_add(4).unwrap() > dst.len() {
        dst[i] = 1;
        i += 1;
        dst[i] = 2;
        i += 1;
        dst[i] = 3;
        i += 1;
        dst[i] = 4;
    } 
}

I'm pretty sure these > should be <= as in the issue description.

@RReverser
Copy link
Contributor

As another side note: have you considered preslicing the input like below?

pub fn checked(dst: &mut [u8], offset: usize) {
    let dst = &mut dst[offset..];
    if dst.len() >= 4 {
        dst[0] = 1;
        dst[1] = 2;
        dst[2] = 3;
        dst[3] = 4;
    }
}

I agree that optimiser should be smarter with range analysis in your example, but something like this seems more idiomatic than checked in the issue description, performs same required checks and compiles down to an optimised code equivalent to the unchecked example.

@hsivonen
Copy link
Member Author

Your godbolt link seems to have incorrect code:

Thanks. Fixed.

I have you considered preslicing the input like below?

It had occurred to me that there probably is a pattern that optimizes properly, but I hadn't tried that one. Thanks!

It seems an additional check is needed to avoid panic when taking the subslice:

pub fn checked(dst: &mut [u8], offset: usize) {
    if offset <= dst.len() {
        let dst = &mut dst[offset..];
        if dst.len() >= 4 {
            dst[0] = 1;
            dst[1] = 2;
            dst[2] = 3;
            dst[3] = 4;
        }
    }
}

hsivonen added a commit to hsivonen/encoding_rs that referenced this issue Oct 19, 2018
Per suggestion by @RReverser in rust-lang/rust#55147

This change also makes the output buffer size requirement for
UTF-16 to UTF-8 encode normal (number of input code units times
three instead of the previous input code units times three plus one
where the last code unit was never written into but had to be
there for space checks).
@hsivonen
Copy link
Member Author

hsivonen commented Nov 2, 2018

Another simple case of adding a constant breaking LLVM's range analysis. It should be possible to compile this without a bound check.

@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 27, 2019
@nikic
Copy link
Contributor

nikic commented Apr 13, 2019

@hsivonen Your second example fails to fold due to the one-use check in https://github.com/llvm-mirror/llvm/blob/fb1e04ff3c51a617c4944b2e05e8aa1b8c543e22/lib/Transforms/InstCombine/InstCombineCompares.cpp#L2389. Recent discussion on why it can't just be dropped: https://reviews.llvm.org/D58633

There's a couple other ways in which it could be made to fold, e.g. by handling icmp (add X, C), C2 for recursive simplification in InstSimplify, or by getting CVP/LVI to do a range-based evaluation of the condition.

@alex
Copy link
Member

alex commented Dec 29, 2024

Looks like these have had the same codegen since rust 1.81: https://rust.godbolt.org/z/88cdKf11W

Should this be closed, or is it preferable to add a codegen test?

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Dec 29, 2024
@nikic
Copy link
Contributor

nikic commented Dec 29, 2024

We should add a codegen test.

@alex
Copy link
Member

alex commented Dec 29, 2024

Ok! #134892

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Dec 29, 2024
Added codegen test for elidings bounds check when indexes are manually checked

Closes rust-lang#55147
jieyouxu added a commit to jieyouxu/rust that referenced this issue Dec 29, 2024
Added codegen test for elidings bounds check when indexes are manually checked

Closes rust-lang#55147
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Dec 29, 2024
Added codegen test for elidings bounds check when indexes are manually checked

Closes rust-lang#55147
@bors bors closed this as completed in dab1c57 Dec 30, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 30, 2024
Rollup merge of rust-lang#134892 - alex:codegen-55147, r=durin42

Added codegen test for elidings bounds check when indexes are manually checked

Closes rust-lang#55147
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants