-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ARM64] Vector128<T>.ExtractMostSignificantBits()
+ TrailingZeroCount
has better alternative than MoveMask emulation
#87735
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsDescriptionAs of now, However, there is a better way recommended by ARM to achieve the same outcome. Regression?No DataCurrent assembly produced by public static uint IndexOf(Vector128<byte> vec, byte value)
{
var mask = Vector128.Equals(vec, Vector128.Create(value));
return (uint)BitOperations.TrailingZeroCount(
mask.ExtractMostSignificantBits());
} G_M000_IG01: ;; offset=0000H
stp fp, lr, [sp, #-0x10]!
mov fp, sp
G_M000_IG02: ;; offset=0008H
uxtb w0, w0
dup v16.16b, w0
cmeq v16.16b, v0.16b, v16.16b
ldr q17, [@RWD00]
and v16.16b, v16.16b, v17.16b
ldr q17, [@RWD16]
ushl v16.16b, v16.16b, v17.16b
mov v17.16b, v16.16b
addv b17, v17.8b
umov w0, v17.b[0]
ext v16.16b, v16.16b, v16.16b, #8
addv b16, v16.8b
umov w1, v16.b[0]
orr w0, w0, w1, LSL #8
rbit w0, w0
clz w0, w0
G_M000_IG03: ;; offset=0048H
ldp fp, lr, [sp], #0x10
ret lr
RWD00 dq 8080808080808080h, 8080808080808080h
RWD16 dq 00FFFEFDFCFBFAF9h, 00FFFEFDFCFBFAF9h
; Total bytes of code 80 This code can be manually rewritten to public static uint IndexOfOptimized(Vector128<byte> vec, byte value)
{
var mask = AdvSimd.CompareEqual(vec, Vector128.Create(value)).AsUInt16();
var res = AdvSimd.ShiftRightLogicalNarrowingLower(mask, 4);
var matches = res.AsUInt64().ToScalar();
return (uint)BitOperations.TrailingZeroCount(matches) >> 2;
} which is more compact and does not perform two vector loads G_M000_IG01: ;; offset=0000H
stp fp, lr, [sp, #-0x10]!
mov fp, sp
G_M000_IG02: ;; offset=0008H
uxtb w0, w0
dup v16.16b, w0
cmeq v16.16b, v0.16b, v16.16b
shrn v16.8b, v16.8h, #4
umov x0, v16.d[0]
rbit x0, x0
clz x0, x0
lsr w0, w0, #2
G_M000_IG03: ;; offset=0028H
ldp fp, lr, [sp], #0x10
ret lr
; Total bytes of code 48 AnalysisGiven the prevalence of
|
Duplicates #76047 |
Description
As of now,
BitOperations.TrailingZeroCount(vector.ExtractMostSignificantBits());
used across CoreLib for finding element index produces sse2neon style movemask emulation.However, there is a better way recommended by ARM to achieve the same outcome.
Regression?
No
Data
Current assembly produced by
This code can be manually rewritten to
which is more compact and does not perform two vector loads
Analysis
Given the prevalence of
.ExtractMostSignificantBits()
in vectorized code and the trend towards using cross-platform vector methods whenever possible, it would be ideal if the idiom described above was recognized and optimized for ARM64 by the compiler.The text was updated successfully, but these errors were encountered: