Poor codegen on x86 when comparing bitwise AND to 0 #256

Pjottos · 2022-03-07T13:51:05Z

I am getting strange assembly output when compiling the following code on Godbolt:

#![feature(portable_simd)]

type SimdVec = std::simd::u16x16;

pub fn portable(a: SimdVec, b: SimdVec) -> bool {
    (a & b) == SimdVec::splat(0)
}

Compiling with -C opt-level=3 -C target-cpu=native, the output is:

example::portable:
        sub     rsp, 72
        vmovaps ymm0, ymmword ptr [rsi]
        vandps  ymm0, ymm0, ymmword ptr [rdi]
        vmovups ymmword ptr [rsp], ymm0
        vxorps  xmm0, xmm0, xmm0
        vmovups ymmword ptr [rsp + 32], ymm0
        vmovdqu ymm0, ymmword ptr [rsp]
        vpxor   ymm0, ymm0, ymmword ptr [rsp + 32]
        vptest  ymm0, ymm0
        sete    al
        add     rsp, 72
        vzeroupper
        ret

The following equivalent code using an x86 intrinsic gives a more reasonable result:

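use std::arch::x86_64::_mm256_testz_si256;

pub fn intrinsic(a: SimdVec, b: SimdVec) -> bool {
    // _mm256_testz_si256 returns 1 when (a & b) is all zeros, 0 otherwise.
    unsafe { _mm256_testz_si256(a.into(), b.into()) != 0 }
}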

example::intrinsic:
        vmovdqa ymm0, ymmword ptr [rdi]
        vptest  ymm0, ymmword ptr [rsi]
        sete    al
        vzeroupper
        ret

Strangely, changing the SimdVec to u8x32 improves the codegen a little bit:

example::portable:
        sub     rsp, 72
        vmovaps ymm0, ymmword ptr [rsi]
        vandps  ymm0, ymm0, ymmword ptr [rdi]
        vmovups ymmword ptr [rsp], ymm0
        vmovdqu ymm0, ymmword ptr [rsp]
        vptest  ymm0, ymm0
        sete    al
        add     rsp, 72
        vzeroupper
        ret
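
As a cross-check on the equivalence claimed above: `_mm256_testz_si256(a, b)` returns 1 exactly when `a & b` has no bits set, which is the same condition as `(a & b) == SimdVec::splat(0)`. The following is a minimal sketch on stable Rust, assuming plain `[u16; 16]` arrays in place of `SimdVec` and runtime feature detection; the helper names `and_is_zero`, `and_is_zero_avx` and `and_is_zero_scalar` are hypothetical and not part of any library.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_testz_si256};

/// Scalar reference: true when no lane has a bit set in both `a` and `b`.
fn and_is_zero_scalar(a: &[u16; 16], b: &[u16; 16]) -> bool {
    a.iter().zip(b.iter()).all(|(x, y)| x & y == 0)
}

/// AVX version (hypothetical helper). The caller must ensure AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn and_is_zero_avx(a: &[u16; 16], b: &[u16; 16]) -> bool {
    // Unaligned loads, so the arrays need no special alignment.
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    // testz yields 1 when (va & vb) is all zeros, 0 otherwise.
    _mm256_testz_si256(va, vb) != 0
}

/// Uses the AVX path when the CPU supports it, otherwise the scalar path.
fn and_is_zero(a: &[u16; 16], b: &[u16; 16]) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safe to call: AVX support was checked at runtime just above.
            return unsafe { and_is_zero_avx(a, b) };
        }
    }
    and_is_zero_scalar(a, b)
}
```

Calling `and_is_zero(&[0x00F0; 16], &[0x0F0F; 16])` exercises the all-zero case, since the two bit patterns do not overlap. This sketch only illustrates the semantics; it says nothing about what the portable SIMD version compiles to.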

calebzulawski · 2022-03-07T15:00:27Z

This is probably related to #209: PartialEq is not yet implemented for vectors using SIMD intrinsics.

elichai · 2022-09-25T12:05:00Z

Now that #274 is in, what's the right way to generate a vptest instruction? (I'm mainly interested in a zero check via: _mm256_testz_si256(res, res) == 1)

calebzulawski · 2023-07-17T12:43:10Z

Sorry for the delay, but I just came back to this. Enabling the avx target feature allows emitting vptest today: https://rust.godbolt.org/z/rMnWGjjKa

calebzulawski closed this as completed Jul 17, 2023
