Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Combine vslidedown_vl with known VL and offset to a smaller LMUL #66267

Closed
wants to merge 1 commit into from

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Sep 13, 2023

If we know the VL and offset of a vslidedown_vl, we can work out the minimum
number of registers it's going to operate across. We can reuse the logic from
extract_vector_elt to perform it in a smaller type and reduce the LMUL.

The aim is to generalize #65598 and hopefully extend this to vslideup_vl too so
that we can get the same optimisation for insert_subvector and
insert_vector_elt.

One observation from adding this is that the vslide*_vl nodes all take a mask
operand, but currently anything other than vmset_vl will fail to select, as all
the patterns expect true_mask. So we need to create a new vmset_vl instead of
using extract_subvector on the existing vmset_vl.

@lukel97 lukel97 requested a review from a team as a code owner September 13, 2023 18:13
@llvmbot
Copy link
Member

llvmbot commented Sep 13, 2023

@llvm/pr-subscribers-backend-risc-v

Changes If we know the VL of a vslidedown_vl, we can work out the minimum number of registers it's going to operate across. We can reuse the logic from extract_vector_elt to perform it in a smaller type and reduce the LMUL.

One observation from adding this is that the vslide*_vl nodes all take a mask
operand, but currently anything other than vmset_vl will fail to select, as all
the patterns expect true_mask. So we need to create a new vmset_vl instead of
using extract_subvector on the existing vmset_vl.

This could maybe also be done in InsertVSETVLI, but I haven't explored it yet.

--

Patch is 308.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66267.diff

14 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+34)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-fp.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv32.ll (+30-24)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll (+20-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i-sat.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (+274-227)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+220-220)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-scatter.ll (+711-700)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/splat-vector-split-i64-vl-sdnode.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-splice.ll (+24-24)

<pre>
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index a89961dccc70f3d..d16264c8f1f5f7f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -14099,6 +14099,40 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
if (SDValue V = performCONCAT_VECTORSCombine(N, DAG, Subtarget, *this))
return V;
break;

  • case RISCVISD::VSLIDEDOWN_VL: {
  • MVT OrigVT = N-&gt;getSimpleValueType(0);
  • auto *CVL = dyn_cast&lt;ConstantSDNode&gt;(N-&gt;getOperand(4));
  • if (!CVL)
  •  break;
    
  • // We can try and reduce the LMUL that a vslidedown uses if we know what VL
  • // is. For example, if the target has Zvl128b, a vslidedown of e32 with VL=5
  • // is only going to change at most the first 2 registers. So if we were
  • // operating at LMUL=4 (nxv8i32), we can reduce it to LMUL=2(nxv4i32).
  • if (auto ShrunkVT = getSmallestVTForIndex(OrigVT, CVL-&gt;getZExtValue(), DL,
  •                                          DAG, Subtarget)) {
    
  •  SDValue ShrunkPassthru =
    
  •      DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, *ShrunkVT, N-&amp;gt;getOperand(0),
    
  •                  DAG.getVectorIdxConstant(0, DL));
    
  •  SDValue ShrunkInVec =
    
  •      DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, *ShrunkVT, N-&amp;gt;getOperand(1),
    
  •                  DAG.getVectorIdxConstant(0, DL));
    
  •  // The only mask ever used in vslide*_vl nodes is vmset_vl, and the only
    
  •  // patterns on vslide*_vl only accept vmset_vl. So create a new vmset
    
  •  // since using an extract_subvector breaks patterns.
    
  •  assert(N-&amp;gt;getOperand(3).getOpcode() == RISCVISD::VMSET_VL);
    
  •  SDValue ShrunkMask =
    
  •      DAG.getNode(RISCVISD::VMSET_VL, SDLoc(N), getMaskTypeFor(*ShrunkVT),
    
  •                  N-&amp;gt;getOperand(4));
    
  •  SDValue ShrunkSlidedown =
    
  •      DAG.getNode(RISCVISD::VSLIDEDOWN_VL, DL, *ShrunkVT,
    
  •                  {ShrunkPassthru, ShrunkInVec, N-&amp;gt;getOperand(2),
    
  •                   ShrunkMask, N-&amp;gt;getOperand(4), N-&amp;gt;getOperand(5)});
    
  •  return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, OrigVT, N-&amp;gt;getOperand(0),
    
  •                     ShrunkSlidedown, DAG.getVectorIdxConstant(0, DL));
    
  • }
  • break;
  • }
    case RISCVISD::VFMV_V_F_VL: {
    const MVT VT = N-&gt;getSimpleValueType(0);
    SDValue Passthru = N-&gt;getOperand(0);
    diff --git a/llvm/test/CodeGen/RISCV/rvv/extractelt-fp.ll b/llvm/test/CodeGen/RISCV/rvv/extractelt-fp.ll
    index 36dfd631b7664fe..e0a8430c9670d05 100644
    --- a/llvm/test/CodeGen/RISCV/rvv/extractelt-fp.ll
    +++ b/llvm/test/CodeGen/RISCV/rvv/extractelt-fp.ll
    @@ -124,7 +124,7 @@ define half @extractelt_nxv8f16_imm(&lt;vscale x 8 x half&gt; %v) {
    define half @extractelt_nxv8f16_idx(&lt;vscale x 8 x half&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv8f16_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e16, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -156,7 +156,7 @@ define half @extractelt_nxv16f16_imm(&lt;vscale x 16 x half&gt; %v) {
    define half @extractelt_nxv16f16_idx(&lt;vscale x 16 x half&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv16f16_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e16, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -188,7 +188,7 @@ define half @extractelt_nxv32f16_imm(&lt;vscale x 32 x half&gt; %v) {
    define half @extractelt_nxv32f16_idx(&lt;vscale x 32 x half&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv32f16_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e16, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -284,7 +284,7 @@ define float @extractelt_nxv4f32_imm(&lt;vscale x 4 x float&gt; %v) {
    define float @extractelt_nxv4f32_idx(&lt;vscale x 4 x float&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv4f32_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e32, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -316,7 +316,7 @@ define float @extractelt_nxv8f32_imm(&lt;vscale x 8 x float&gt; %v) {
    define float @extractelt_nxv8f32_idx(&lt;vscale x 8 x float&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv8f32_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e32, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -348,7 +348,7 @@ define float @extractelt_nxv16f32_imm(&lt;vscale x 16 x float&gt; %v) {
    define float @extractelt_nxv16f32_idx(&lt;vscale x 16 x float&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv16f32_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e32, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -401,7 +401,7 @@ define double @extractelt_nxv2f64_0(&lt;vscale x 2 x double&gt; %v) {
    define double @extractelt_nxv2f64_imm(&lt;vscale x 2 x double&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv2f64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -412,7 +412,7 @@ define double @extractelt_nxv2f64_imm(&lt;vscale x 2 x double&gt; %v) {
    define double @extractelt_nxv2f64_idx(&lt;vscale x 2 x double&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv2f64_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -433,7 +433,7 @@ define double @extractelt_nxv4f64_0(&lt;vscale x 4 x double&gt; %v) {
    define double @extractelt_nxv4f64_imm(&lt;vscale x 4 x double&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv4f64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -444,7 +444,7 @@ define double @extractelt_nxv4f64_imm(&lt;vscale x 4 x double&gt; %v) {
    define double @extractelt_nxv4f64_idx(&lt;vscale x 4 x double&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv4f64_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -465,7 +465,7 @@ define double @extractelt_nxv8f64_0(&lt;vscale x 8 x double&gt; %v) {
    define double @extractelt_nxv8f64_imm(&lt;vscale x 8 x double&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv8f64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -476,7 +476,7 @@ define double @extractelt_nxv8f64_imm(&lt;vscale x 8 x double&gt; %v) {
    define double @extractelt_nxv8f64_idx(&lt;vscale x 8 x double&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv8f64_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    @@ -663,7 +663,7 @@ define double @extractelt_nxv16f64_neg1(&lt;vscale x 16 x double&gt; %v) {
    define double @extractelt_nxv16f64_imm(&lt;vscale x 16 x double&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv16f64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    ; CHECK-NEXT: vfmv.f.s fa0, v8
    ; CHECK-NEXT: ret
    diff --git a/llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll b/llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll
    index ba8486780197e48..c6bfac88ccd8ecd 100644
    --- a/llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll
    +++ b/llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll
    @@ -78,7 +78,7 @@ define i1 @extractelt_nxv16i1(&lt;vscale x 16 x i8&gt;* %x, i64 %idx) nounwind {
    ; CHECK-NEXT: vmseq.vi v0, v8, 0
    ; CHECK-NEXT: vmv.v.i v8, 0
    ; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
    -; CHECK-NEXT: vsetivli zero, 1, e8, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a1
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -96,7 +96,7 @@ define i1 @extractelt_nxv32i1(&lt;vscale x 32 x i8&gt;* %x, i64 %idx) nounwind {
    ; CHECK-NEXT: vmseq.vi v0, v8, 0
    ; CHECK-NEXT: vmv.v.i v8, 0
    ; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
    -; CHECK-NEXT: vsetivli zero, 1, e8, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a1
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -114,7 +114,7 @@ define i1 @extractelt_nxv64i1(&lt;vscale x 64 x i8&gt;* %x, i64 %idx) nounwind {
    ; CHECK-NEXT: vmseq.vi v0, v8, 0
    ; CHECK-NEXT: vmv.v.i v8, 0
    ; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
    -; CHECK-NEXT: vsetivli zero, 1, e8, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a1
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    diff --git a/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv32.ll b/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv32.ll
    index fd2f89e26e59809..8b061e20641f357 100644
    --- a/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv32.ll
    +++ b/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv32.ll
    @@ -157,7 +157,7 @@ define signext i8 @extractelt_nxv16i8_imm(&lt;vscale x 16 x i8&gt; %v) {
    define signext i8 @extractelt_nxv16i8_idx(&lt;vscale x 16 x i8&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv16i8_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e8, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -189,7 +189,7 @@ define signext i8 @extractelt_nxv32i8_imm(&lt;vscale x 32 x i8&gt; %v) {
    define signext i8 @extractelt_nxv32i8_idx(&lt;vscale x 32 x i8&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv32i8_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e8, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -221,7 +221,7 @@ define signext i8 @extractelt_nxv64i8_imm(&lt;vscale x 64 x i8&gt; %v) {
    define signext i8 @extractelt_nxv64i8_idx(&lt;vscale x 64 x i8&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv64i8_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e8, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -349,7 +349,7 @@ define signext i16 @extractelt_nxv8i16_imm(&lt;vscale x 8 x i16&gt; %v) {
    define signext i16 @extractelt_nxv8i16_idx(&lt;vscale x 8 x i16&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv8i16_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e16, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -381,7 +381,7 @@ define signext i16 @extractelt_nxv16i16_imm(&lt;vscale x 16 x i16&gt; %v) {
    define signext i16 @extractelt_nxv16i16_idx(&lt;vscale x 16 x i16&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv16i16_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e16, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -413,7 +413,7 @@ define signext i16 @extractelt_nxv32i16_imm(&lt;vscale x 32 x i16&gt; %v) {
    define signext i16 @extractelt_nxv32i16_idx(&lt;vscale x 32 x i16&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv32i16_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e16, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e16, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -509,7 +509,7 @@ define i32 @extractelt_nxv4i32_imm(&lt;vscale x 4 x i32&gt; %v) {
    define i32 @extractelt_nxv4i32_idx(&lt;vscale x 4 x i32&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv4i32_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e32, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -541,7 +541,7 @@ define i32 @extractelt_nxv8i32_imm(&lt;vscale x 8 x i32&gt; %v) {
    define i32 @extractelt_nxv8i32_idx(&lt;vscale x 8 x i32&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv8i32_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e32, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -573,7 +573,7 @@ define i32 @extractelt_nxv16i32_imm(&lt;vscale x 16 x i32&gt; %v) {
    define i32 @extractelt_nxv16i32_idx(&lt;vscale x 16 x i32&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv16i32_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e32, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e32, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -638,12 +638,13 @@ define i64 @extractelt_nxv2i64_0(&lt;vscale x 2 x i64&gt; %v) {
    define i64 @extractelt_nxv2i64_imm(&lt;vscale x 2 x i64&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv2i64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    -; CHECK-NEXT: li a0, 32
    -; CHECK-NEXT: vsrl.vx v10, v8, a0
    -; CHECK-NEXT: vmv.x.s a1, v10
    ; CHECK-NEXT: vmv.x.s a0, v8
    +; CHECK-NEXT: li a1, 32
    +; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsrl.vx v8, v8, a1
    +; CHECK-NEXT: vmv.x.s a1, v8
    ; CHECK-NEXT: ret
    %r = extractelement &lt;vscale x 2 x i64&gt; %v, i32 2
    ret i64 %r
    @@ -652,10 +653,11 @@ define i64 @extractelt_nxv2i64_imm(&lt;vscale x 2 x i64&gt; %v) {
    define i64 @extractelt_nxv2i64_idx(&lt;vscale x 2 x i64&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv2i64_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: li a1, 32
    +; CHECK-NEXT: vsetivli zero, 1, e64, m2, ta, ma
    ; CHECK-NEXT: vsrl.vx v8, v8, a1
    ; CHECK-NEXT: vmv.x.s a1, v8
    ; CHECK-NEXT: ret
    @@ -679,12 +681,13 @@ define i64 @extractelt_nxv4i64_0(&lt;vscale x 4 x i64&gt; %v) {
    define i64 @extractelt_nxv4i64_imm(&lt;vscale x 4 x i64&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv4i64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    -; CHECK-NEXT: li a0, 32
    -; CHECK-NEXT: vsrl.vx v12, v8, a0
    -; CHECK-NEXT: vmv.x.s a1, v12
    ; CHECK-NEXT: vmv.x.s a0, v8
    +; CHECK-NEXT: li a1, 32
    +; CHECK-NEXT: vsetivli zero, 1, e64, m4, ta, ma
    +; CHECK-NEXT: vsrl.vx v8, v8, a1
    +; CHECK-NEXT: vmv.x.s a1, v8
    ; CHECK-NEXT: ret
    %r = extractelement &lt;vscale x 4 x i64&gt; %v, i32 2
    ret i64 %r
    @@ -693,10 +696,11 @@ define i64 @extractelt_nxv4i64_imm(&lt;vscale x 4 x i64&gt; %v) {
    define i64 @extractelt_nxv4i64_idx(&lt;vscale x 4 x i64&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv4i64_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: li a1, 32
    +; CHECK-NEXT: vsetivli zero, 1, e64, m4, ta, ma
    ; CHECK-NEXT: vsrl.vx v8, v8, a1
    ; CHECK-NEXT: vmv.x.s a1, v8
    ; CHECK-NEXT: ret
    @@ -720,12 +724,13 @@ define i64 @extractelt_nxv8i64_0(&lt;vscale x 8 x i64&gt; %v) {
    define i64 @extractelt_nxv8i64_imm(&lt;vscale x 8 x i64&gt; %v) {
    ; CHECK-LABEL: extractelt_nxv8i64_imm:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vi v8, v8, 2
    -; CHECK-NEXT: li a0, 32
    -; CHECK-NEXT: vsrl.vx v16, v8, a0
    -; CHECK-NEXT: vmv.x.s a1, v16
    ; CHECK-NEXT: vmv.x.s a0, v8
    +; CHECK-NEXT: li a1, 32
    +; CHECK-NEXT: vsetivli zero, 1, e64, m8, ta, ma
    +; CHECK-NEXT: vsrl.vx v8, v8, a1
    +; CHECK-NEXT: vmv.x.s a1, v8
    ; CHECK-NEXT: ret
    %r = extractelement &lt;vscale x 8 x i64&gt; %v, i32 2
    ret i64 %r
    @@ -734,10 +739,11 @@ define i64 @extractelt_nxv8i64_imm(&lt;vscale x 8 x i64&gt; %v) {
    define i64 @extractelt_nxv8i64_idx(&lt;vscale x 8 x i64&gt; %v, i32 %idx) {
    ; CHECK-LABEL: extractelt_nxv8i64_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e64, m8, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e64, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: li a1, 32
    +; CHECK-NEXT: vsetivli zero, 1, e64, m8, ta, ma
    ; CHECK-NEXT: vsrl.vx v8, v8, a1
    ; CHECK-NEXT: vmv.x.s a1, v8
    ; CHECK-NEXT: ret
    diff --git a/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll b/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll
    index f5e5b9e9083b8dc..1c19cf9a864299b 100644
    --- a/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll
    +++ b/llvm/test/CodeGen/RISCV/rvv/extractelt-int-rv64.ll
    @@ -156,7 +156,7 @@ define signext i8 @extractelt_nxv16i8_imm(&lt;vscale x 16 x i8&gt; %v) {
    define signext i8 @extractelt_nxv16i8_idx(&lt;vscale x 16 x i8&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv16i8_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e8, m2, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -188,7 +188,7 @@ define signext i8 @extractelt_nxv32i8_imm(&lt;vscale x 32 x i8&gt; %v) {
    define signext i8 @extractelt_nxv32i8_idx(&lt;vscale x 32 x i8&gt; %v, i32 zeroext %idx) {
    ; CHECK-LABEL: extractelt_nxv32i8_idx:
    ; CHECK: # %bb.0:
    -; CHECK-NEXT: vsetivli zero, 1, e8, m4, ta, ma
    +; CHECK-NEXT: vsetivli zero, 1, e8, m1, ta, ma
    ; CHECK-NEXT: vslidedown.vx v8, v8, a0
    ; CHECK-NEXT: vmv.x.s a0, v8
    ; CHECK-NEXT: ret
    @@ -220,7 +220,7 @@ define signext i8 @ext...

@michaelmaitland
Copy link
Contributor

This optimization increases the number of dynamic instructions in the case that other instructions prefer to use the larger LMUL:

vsetivli zero, 1, e16, m2, ta, ma
vslidedown.vx v8, v8, a0
// Instructions that want m2
// ...

gets optimized to

vsetivli zero, 1, e16, m1, ta, ma
vslidedown.vx v8, v8, a0
vsetivli zero, 1, e16, m2, ta, ma
// Instructions that want m2
// ...

If this sequence is in a loop, then the effect of this is magnified. Going from M8, M4, or M2 to a smaller LMUL will likely see an improvement that outweighs the cost of the extra vsetivli. However, it is less clear whether going from M1, MF2, or MF4 will outweigh the cost. Should we consider only doing this for LMUL that we are confident bring benefit?

@michaelmaitland michaelmaitland self-requested a review September 13, 2023 18:28
@lukel97
Copy link
Contributor Author

lukel97 commented Sep 13, 2023

If this sequence is in a loop, then the effect of this is magnified. Going from M8, M4, or M2 to a smaller LMUL will likely see an improvement that outweighs the cost of the extra vsetivli. However, it is less clear whether going from M1, MF2, or MF4 will outweigh the cost. Should we consider only doing this for LMUL that we are confident bring benefit?

getSmallestVTForIndex won't reduce LMUL to anything lower than M1, so this patch shouldn't go from M1/MF2 or MF4.
That said, I don't have any hard data on how much a vsetvli toggle costs vs. the savings from reducing LMUL. We were already doing this for extract_vector_elt so I'd imagine it must be somewhat profitable?

@michaelmaitland
Copy link
Contributor

getSmallestVTForIndex won't reduce LMUL to anything lower than M1, so this patch shouldn't go from M1/MF2 or MF4.

This is good enough for me.

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch isn't sound. Just because the VL is known doesn't mean the slidedown offset is <= VL. The elements that will be written to the lower elements of the destination may come from the high LMUL part of the input.

@lukel97
Copy link
Contributor Author

lukel97 commented Sep 18, 2023

This patch isn't sound. Just because the VL is known doesn't mean the slidedown offset is <= VL. The elements that will be written to the lower elements of the destination may come from the high LMUL part of the input.

Sorry you're right, I think this should be getSmallestVTForIndex(OrigVT, VL + Offset, DL, DAG, Subtarget)

If we know the VL and offset of a vslidedown_vl, we can work out the minimum
number of registers it's going to operate across. We can reuse the logic from
extract_vector_elt to perform it in a smaller type and reduce the LMUL.

The aim is to generalize llvm#65598 and hopefully extend this to vslideup_vl too so
that we can get the same optimisation for insert_subvector and
insert_vector_elt.

One observation from adding this is that the vslide*_vl nodes all take a mask
operand, but currently anything other than vmset_vl will fail to select, as all
the patterns expect true_mask. So we need to create a new vmset_vl instead of
using extract_subvector on the existing vmset_vl.
@lukel97 lukel97 force-pushed the vslide-shrink/vslidedown branch from b48e459 to 569fb00 Compare September 18, 2023 10:05
@lukel97 lukel97 changed the title [RISCV] Combine vslidedown_vl with known VL to a smaller LMUL [RISCV] Combine vslidedown_vl with known VL and offset to a smaller LMUL Sep 18, 2023
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 18, 2023
Similiar to llvm#66267, we can perform a vslideup_vl on a smaller type if we know
the highest lane that will be written to, which can be determined from VL.

This is an alternative to llvm#65997 and llvm#66087
DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, *ShrunkVT, N->getOperand(1),
DAG.getVectorIdxConstant(0, DL));

// The only mask ever used in vslide*_vl nodes is vmset_vl, and the only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awful hack. How hard would it be to add a pattern for vslidedown_vl with an arbitrary mask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll try adding a masked pattern, hopefully RISCVISelDAGToDAG will be able to convert it to the unmasked pseudo

; CHECK-NEXT: vmv.x.s a0, v8
; CHECK-NEXT: li a1, 32
; CHECK-NEXT: vsetivli zero, 1, e64, m4, ta, ma
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem with your change, but why on earth are we using m4 for VL=1?

; RV32-NEXT: vsrl.vx v16, v12, a0
; RV32-NEXT: vmv.x.s a3, v16
; RV32-NEXT: vmv.x.s a3, v12
; RV32-NEXT: vsetivli zero, 1, e64, m4, ta, ma
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous batch of one off changes - sorry, got confused by the pr web interface.

To summarize:

  • We need a more general solution to the vslidedown w/mask case. Either via general select, or by removing the mask parameter on the node. The current form is highly confusing.
  • The vsrl using high lmul implies maybe we should go with the extract specific case as this lmul should be reducible as well.

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Sep 21, 2023
Sometimes with mask vectors that have been widened, there is a
CopyToRegClass node in between the VMSET and the CopyToReg.

This is a resurrection https://reviews.llvm.org/D148524, and is needed to
remove the mask operand when it's extracted from a subregister as planned in
llvm#66267 (comment)
lukel97 added a commit that referenced this pull request Sep 22, 2023
Sometimes with mask vectors that have been widened, there is a
CopyToRegClass node in between the VMSET and the CopyToReg.

This is a resurrection of https://reviews.llvm.org/D148524, and is
needed to
remove the mask operand when it's extracted from a subvector as planned
in
#66267 (comment)
lukel97 added a commit that referenced this pull request Oct 4, 2023
We were previously only matching on the true_mask pattern. This patch
allows arbitrary masks to be matched, which means we can avoid the
workaround used in #66267. We can just add patterns for the _MASK pseudo
variants because RISCVDAGToDAGISel::doPeepholeMaskedRVV will transform
them to the unmasked variant if the mask is all ones.
@lukel97 lukel97 marked this pull request as draft October 4, 2023 20:38
@lukel97
Copy link
Contributor Author

lukel97 commented Oct 4, 2023

Marking as a draft as it needs rebased, and may not be worthwhile doing if #66671 is not able to fully subsume the lowering logic for insert_vector_elt and insert_subvector

@lukel97 lukel97 closed this Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants