Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm neon left shift intrinsics - incorrect immediate value #106348

Open
Jamesbarford opened this issue Aug 28, 2024 · 1 comment
Open

Arm neon left shift intrinsics - incorrect immediate value #106348

Jamesbarford opened this issue Aug 28, 2024 · 1 comment

Comments

@Jamesbarford
Copy link

Summary
For, seemingly, all Arm left shift neon intrinsics that accept as a first argument a vector and a constant integer as a second argument, the second argument gets turned into a const vector.

This is problematic for languages that wish to use the intrinsics but do not natively support const vectors. For example they needed to be implemented in Rust.

How to reproduce:

#include <stdio.h>
#include <stdint.h>
#ifdef __ARM_NEON
#include <arm_neon.h>
#else
#error "arm only"
#endif

int
main(void)
{
    #define N 1
    int8x8_t a = {15, 14, 13, 12, 11, 10, 9, 8};
    /* Example */
    uint8x8_t retval = vqshlu_n_s8(a, N);
    for (int i = 0; i < 8; ++i) {
        printf("[%d] => %d\n", i, retval[i]);
    }
    return 0;
}

According to the documentation it should produce something like the following ir/llvm call:

<8 x i8>  @llvm.aarch64.neon.sqshlu.v8i8(<8 x i8>, i32)

However something is wrong meaning it actually expects a constant vector as it's second argument like so:

<8 x i8>  @llvm.aarch64.neon.sqshlu.v8i8(<8 x i8>, <8 x i8>)

However the resulting Arm assembly does produce the correct call:

sqshlu  v0.8b, v0.8b, #1

If you supply something like the following as a second argument in the IR the llc will crash as all values need to be the same

call @llvm.aarch64.neon.sqshlu.v8i8(
    <8 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8>
    <8 x i8> <i8 1, i8 2, i8, 3, i8 4, i8 5, i8 6, i8 7, i8 8>
)

As it is unable to create the correct Arm assembly code.

This observation is the same for a large portion, if not all, neon intrinsics requiring a left shift by a constant immediate value.

The intrinsics I have observed this with but are potentially not limited to:

  • vqshlu_n_<size>
  • vqshl_n_<size>
  • vqshlb_n_<size>
  • vqshlq_n_<size>
    (where <size> denotes s8, u8 etc...)

Exploration
This seems to be caused by the incorrect type being used in the tablegen file: llvm/include/llvm/IR/IntrinsicsAArch64.td. For example the sqshlu intrinsic is using the type AdvSIMD_2IntArg_Intrinsic which, through me fumbling around, seems to be remedied with the use of a type like the bellow along with removing the intrinsic out of the NEONMAP in clang/lib/CodeGen/CGBuiltin.cpp however I was mostly looking at the IR as opposed to clang and this is a fairly surface level dive into the problem

/* Example type that seemed to work */
class AdvSIMD_2VectorArg_Scalar
  : DefaultAttrsIntrinsic<[llvm_anyint_ty],
              [LLVMMatchType<0>, llvm_i32_ty],
              [IntrNoMem]>;
@llvmbot
Copy link
Member

llvmbot commented Aug 29, 2024

@llvm/issue-subscribers-backend-aarch64

Author: James (Jamesbarford)

**Summary** For, seemingly, all Arm left shift neon intrinsics that accept as a first argument a vector and a constant integer as a second argument, the second argument gets turned into a const vector.

This is problematic for languages that wish to use the intrinsics but do not natively support const vectors. For example they needed to be implemented in Rust.

How to reproduce:

#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;
#ifdef __ARM_NEON
#include &lt;arm_neon.h&gt;
#else
#error "arm only"
#endif

int
main(void)
{
    #define N 1
    int8x8_t a = {15, 14, 13, 12, 11, 10, 9, 8};
    /* Example */
    uint8x8_t retval = vqshlu_n_s8(a, N);
    for (int i = 0; i &lt; 8; ++i) {
        printf("[%d] =&gt; %d\n", i, retval[i]);
    }
    return 0;
}

According to the documentation it should produce something like the following ir/llvm call:

&lt;8 x i8&gt;  @<!-- -->llvm.aarch64.neon.sqshlu.v8i8(&lt;8 x i8&gt;, i32)

However something is wrong meaning it actually expects a constant vector as it's second argument like so:

&lt;8 x i8&gt;  @<!-- -->llvm.aarch64.neon.sqshlu.v8i8(&lt;8 x i8&gt;, &lt;8 x i8&gt;)

However the resulting Arm assembly does produce the correct call:

sqshlu  v0.8b, v0.8b, #<!-- -->1

If you supply something like the following as a second argument in the IR the llc will crash as all values need to be the same

call @<!-- -->llvm.aarch64.neon.sqshlu.v8i8(
    &lt;8 x i8&gt; &lt;i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8&gt;
    &lt;8 x i8&gt; &lt;i8 1, i8 2, i8, 3, i8 4, i8 5, i8 6, i8 7, i8 8&gt;
)

As it is unable to create the correct Arm assembly code.

This observation is the same for a large portion, if not all, neon intrinsics requiring a left shift by a constant immediate value.

The intrinsics I have observed this with but are potentially not limited to:

  • vqshlu_n_&lt;size&gt;
  • vqshl_n_&lt;size&gt;
  • vqshlb_n_&lt;size&gt;
  • vqshlq_n_&lt;size&gt;
    (where &lt;size&gt; denotes s8, u8 etc...)

Exploration
This seems to be caused by the incorrect type being used in the tablegen file: llvm/include/llvm/IR/IntrinsicsAArch64.td. For example the sqshlu intrinsic is using the type AdvSIMD_2IntArg_Intrinsic which, through me fumbling around, seems to be remedied with the use of a type like the bellow along with removing the intrinsic out of the NEONMAP in clang/lib/CodeGen/CGBuiltin.cpp however I was mostly looking at the IR as opposed to clang and this is a fairly surface level dive into the problem

/* Example type that seemed to work */
class AdvSIMD_2VectorArg_Scalar
  : DefaultAttrsIntrinsic&lt;[llvm_anyint_ty],
              [LLVMMatchType&lt;0&gt;, llvm_i32_ty],
              [IntrNoMem]&gt;;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants