-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add arm/aarch64 table lookup and vector combine intrinsics #546
Conversation
@Amanieu please do a review of this when you have time just in case, I can fix things in a subsequent PR. |
One thing that concerns me is whether these intrinsics work correctly on big-endian targets (see this link). For example, Clang's arm-neon.h has the following code: #ifdef __LITTLE_ENDIAN__
__ai int8x8_t vtbl1_s8(int8x8_t __p0, int8x8_t __p1) {
int8x8_t __ret;
__ret = (int8x8_t) __builtin_neon_vtbl1_v((int8x8_t)__p0, (int8x8_t)__p1, 0);
return __ret;
}
#else
__ai int8x8_t vtbl1_s8(int8x8_t __p0, int8x8_t __p1) {
int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, 7, 6, 5, 4, 3, 2, 1, 0);
int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, 7, 6, 5, 4, 3, 2, 1, 0);
int8x8_t __ret;
__ret = (int8x8_t) __builtin_neon_vtbl1_v((int8x8_t)__rev0, (int8x8_t)__rev1, 0);
__ret = __builtin_shufflevector(__ret, __ret, 7, 6, 5, 4, 3, 2, 1, 0);
return __ret;
}
#endif |
@Amanieu they definitely won't work correctly in big endian systems :/ Do we have any big endian arm and aarch64 targets in rustc ? I can't find any. If one can target these via Without a system to test on, this would be something that I cannot really fix. |
int8x16x2_t(vcombine_s8(b.0, b.1), vcombine_s8(b.2, ::mem::zeroed())), | ||
::mem::transmute(c), | ||
); | ||
let m: int8x8_t = simd_lt(c, ::mem::transmute(i8x8::splat(24))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an unsigned comparison instead of a signed one? Negative index values are still out of range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This bug affects the arm and aarch64 vtbx{N}_s8
intrinsics - the tests should be updated to test these with a negative index value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think that this doesn't really matter.
Note how the signed integer vector is transmuted above into an unsigned integer vector when calling the vqbtx
aarch64 wider intrinsic. The negative indices will be out-of-bounds, and will fetch the appropriate value into the result. Only those integers larger than the original vector size and less than twice the original vector size will have an incorrect value, which is what has to be corrected with the mask.
You are right that there are no big-endian targets at the moment, and it is somewhat a hassle to set one up since big-endian ARM systems are very rare. So we don't forget about this in the future, maybe use |
I think we should be able to set up one on qemu as long as we had a proper target right ?
I am not sure if preventing Panicking isn't a good idea either, so.. ideally, I think we should emit a warning if somebody tries to use these while targetting big-endian. Maybe I could just use the deprecated attribute behind a cfg on arm/aarch64+big endian to warn that these are broken? Or should I use compile error for this and prevent cc @alexcrichton thoughts? |
The tricky part is getting a big-endian toolchain (GCC and libc) to build libstd and libtest. These aren't available in distro repositories (AFAIK) but you can probably use one from Linaro. |
Elsewhere in the standard library for modules like this we'd just omit the intrinsics, so could the ones only available on little-endian only be available for little-endian targets? |
That sounds good, I'll send a PR to put them behind a cfg macro for little endian. |
No description provided.