x64: Specialize shuffle with an all-zeros immediate
Instead of loading the all-zeros immediate from a rip-relative address
at the end of the function, generate a zero with a `pxor` instruction
and then use `pshufb` to do the broadcast.
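
As a rough illustration of why an all-zeros mask amounts to a broadcast, the sketch below models the byte-level behavior of `pxor` and `pshufb` in plain Rust. It is not Cranelift code: the names `pxor_model`, `pshufb_model`, and `broadcast_byte0` are hypothetical helpers introduced only for this example, and the model covers just the 128-bit register-to-register form.

```rust
/// Hypothetical software model of `pxor`: lane-wise XOR of two 128-bit values.
fn pxor_model(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Hypothetical software model of `pshufb`: each mask byte selects a source
/// byte by its low 4 bits, or produces zero when its high bit is set.
fn pshufb_model(src: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (lane, &m) in mask.iter().enumerate() {
        out[lane] = if m & 0x80 != 0 {
            0
        } else {
            src[(m & 0x0f) as usize]
        };
    }
    out
}

/// Broadcast byte 0 of `src` to every lane, mirroring the `pxor` + `pshufb`
/// sequence: XOR-ing any register with itself yields the all-zeros mask.
fn broadcast_byte0(src: [u8; 16], scratch: [u8; 16]) -> [u8; 16] {
    let zero_mask = pxor_model(scratch, scratch);
    pshufb_model(src, zero_mask)
}
```

Calling `broadcast_byte0` with a vector whose first byte is 5 and any scratch value returns sixteen 5s, since every mask byte is 0 and therefore selects source byte 0.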
alexcrichton committed Mar 8, 2023
1 parent 36a65c7 commit 29f7e7a
Showing 3 changed files with 42 additions and 0 deletions.
8 changes: 8 additions & 0 deletions cranelift/codegen/src/isa/x64/lower.isle
@@ -3592,6 +3592,14 @@
(rule 6 (lower (shuffle a b (u128_from_immediate 0x1716151413121110_0706050403020100)))
(x64_punpcklqdq a b))

;; If the vector shift mask is all 0s then that means the first byte of the
;; first operand is broadcast to all bytes. Falling through would load an
;; all-zeros constant from a rip-relative location but it should be slightly
;; more efficient to execute the `pshufb` here-and-now with an xor'd-to-be-zero
;; register.
(rule 6 (lower (shuffle a _ (u128_from_immediate 0)))
(x64_pshufb a (xmm_zero $I8X16)))

;; Special case for the `shufps` instruction which will select two 32-bit values
;; from the first operand and two 32-bit values from the second operand. Note
;; that there is a second case here as well for when the operands can be
27 changes: 27 additions & 0 deletions cranelift/filetests/filetests/isa/x64/shuffle.clif
@@ -616,3 +616,30 @@ block0(v0: i16x8, v1: i16x8):
; popq %rbp
; retq

function %shuffle_all_zeros(i8x16, i8x16) -> i8x16 {
block0(v0: i8x16, v1: i8x16):
v2 = shuffle v0, v1, [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
return v2
}

; VCode:
; pushq %rbp
; movq %rsp, %rbp
; block0:
; pxor %xmm3, %xmm3, %xmm3
; pshufb %xmm0, %xmm3, %xmm0
; movq %rbp, %rsp
; popq %rbp
; ret
;
; Disassembled:
; block0: ; offset 0x0
; pushq %rbp
; movq %rsp, %rbp
; block1: ; offset 0x4
; pxor %xmm3, %xmm3
; pshufb %xmm3, %xmm0
; movq %rbp, %rsp
; popq %rbp
; retq

7 changes: 7 additions & 0 deletions cranelift/filetests/filetests/runtests/simd-shuffle.clif
@@ -251,3 +251,10 @@ block0(v0: i16x8, v1: i16x8):
return v5
}
; run: %pshufhw_rhs_3131([1 2 3 4 5 6 7 8], [9 10 11 12 13 14 15 16]) == [9 10 11 12 16 14 16 14]

function %shuffle_all_zeros(i8x16, i8x16) -> i8x16 {
block0(v0: i8x16, v1: i8x16):
v2 = shuffle v0, v1, [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
return v2
}
; run: %shuffle_all_zeros([5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]) == [5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5]
