[RISCV][CostModel] VPIntrinsics have same cost as their non-vp counterparts #67178
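@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-risc-v
Changes
On RISCV, only a few VPIntrinsics have their cost modeled by the VectorIntrinsicCostTable. Even so, none of those entries consider LMUL. All other VPIntrinsics do not have meaningful modeling.
This patch models the cost of a VPIntrinsic as the cost of its non-VP counterpart. It is possible that the VP Intrinsic is cheaper than the non-VP version depending on VL. On RISCV, this may be due to two reasons (if the instruction is part of a loop):
1. A smaller VL can be used on the last iteration of the loop.
2. The VP instruction may avoid a scalar remainder loop.
I have left this as a TODO since I think this change puts us on the right path of modeling the cost of a VPInstruction, and it isn't entirely clear to me how much of a discount we should give to a known VL<VLMAX or what to do when VL is unknown at compile time.
Patch is 26.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67178.diff
3 Files Affected: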
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index c11d558a73e9d09..7901622f049e5c1 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1687,6 +1687,32 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
}
}
+ // VP Intrinsics should have the same cost as their non-vp counterpart.
+ // TODO: Adjust the cost to make the vp intrinsic cheaper than its non-vp
+ // counterpart when the vector length argument is smaller than the maximum
+ // vector length.
+ if (VPIntrinsic::isVPIntrinsic(ICA.getID())) {
+ std::optional<Intrinsic::ID> FOp =
+ VPIntrinsic::getFunctionalOpcodeForVP(ICA.getID());
+ if (FOp)
+ return thisT()->getArithmeticInstrCost(*FOp, ICA.getReturnType(), CostKind);
+
+ std::optional<Intrinsic::ID> FID =
+ VPIntrinsic::getFunctionalIntrinsicIDForVP(ICA.getID());
+ if (FID) {
+ // Non-vp version will have same Args/Tys except mask and vector length.
+ ArrayRef<const Value *> NewArgs(ICA.getArgs().begin(),
+ ICA.getArgs().end() - 2);
+ ArrayRef<Type *> NewTys(ICA.getArgTypes().begin(),
+ ICA.getArgTypes().end() - 2);
+
+ IntrinsicCostAttributes NewICA(*FID, ICA.getReturnType(), NewArgs,
+ NewTys, ICA.getFlags(), ICA.getInst(),
+ ICA.getScalarizationCost());
+ return thisT()->getIntrinsicInstrCost(NewICA, CostKind);
+ }
+ }
+
// Assume that we need to scalarize this intrinsic.
// Compute the scalarization overhead based on Args for a vector
// intrinsic.
diff --git a/llvm/test/Analysis/CostModel/RISCV/gep.ll b/llvm/test/Analysis/CostModel/RISCV/gep.ll
index c7a3e5d30aba7f4..2c309412b9b2e35 100644
--- a/llvm/test/Analysis/CostModel/RISCV/gep.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/gep.ll
@@ -270,7 +270,7 @@ define void @non_foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = getelementptr i8, ptr %base, i32 42
; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x4 = call <2 x i8> @llvm.masked.expandload.v2i8(ptr %4, <2 x i1> undef, <2 x i8> undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = getelementptr i8, ptr %base, i32 42
-; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = getelementptr i8, ptr %base, i32 42
; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x6 = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr %6, i64 undef, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = getelementptr i8, ptr %base, i32 42
@@ -282,7 +282,7 @@ define void @non_foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = getelementptr i8, ptr %base, i32 42
; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.compressstore.v2i8(<2 x i8> undef, ptr %10, <2 x i1> undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = getelementptr i8, ptr %base, i32 42
-; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %12 = getelementptr i8, ptr %base, i32 42
; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.experimental.vp.strided.store.v2i8.p0.i64(<2 x i8> undef, ptr %12, i64 undef, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
@@ -340,7 +340,7 @@ define void @foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %4 = getelementptr i8, ptr %base, i32 0
; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x4 = call <2 x i8> @llvm.masked.expandload.v2i8(ptr %4, <2 x i1> undef, <2 x i8> undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %5 = getelementptr i8, ptr %base, i32 0
-; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %6 = getelementptr i8, ptr %base, i32 0
; RVI-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %x6 = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr %6, i64 undef, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %7 = getelementptr i8, ptr %base, i32 0
@@ -352,7 +352,7 @@ define void @foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %10 = getelementptr i8, ptr %base, i32 0
; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.compressstore.v2i8(<2 x i8> undef, ptr %10, <2 x i1> undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %11 = getelementptr i8, ptr %base, i32 0
-; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
+; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %12 = getelementptr i8, ptr %base, i32 0
; RVI-NEXT: Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.experimental.vp.strided.store.v2i8.p0.i64(<2 x i8> undef, ptr %12, i64 undef, <2 x i1> undef, i32 undef)
; RVI-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
index 6dcc218981f7a7a..6ad1e31ff61f63a 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
@@ -206,6 +206,199 @@ define void @vp_fshl() {
ret void
}
+define void @add() {
+; CHECK-LABEL: 'add'
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t0 = call <2 x i8> @llvm.vp.add.v2i8(<2 x i8> undef, <2 x i8> undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t1 = add <2 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t2 = call <4 x i8> @llvm.vp.add.v4i8(<4 x i8> undef, <4 x i8> undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t3 = add <4 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t4 = call <8 x i8> @llvm.vp.add.v8i8(<8 x i8> undef, <8 x i8> undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t5 = add <8 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t6 = call <16 x i8> @llvm.vp.add.v16i8(<16 x i8> undef, <16 x i8> undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t7 = add <16 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t8 = call <2 x i64> @llvm.vp.add.v2i64(<2 x i64> undef, <2 x i64> undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t9 = add <2 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t10 = call <4 x i64> @llvm.vp.add.v4i64(<4 x i64> undef, <4 x i64> undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t12 = add <4 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %t13 = call <8 x i64> @llvm.vp.add.v8i64(<8 x i64> undef, <8 x i64> undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %t14 = add <8 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %t15 = call <16 x i64> @llvm.vp.add.v16i64(<16 x i64> undef, <16 x i64> undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %t16 = add <16 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t17 = call <vscale x 2 x i8> @llvm.vp.add.nxv2i8(<vscale x 2 x i8> undef, <vscale x 2 x i8> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t18 = add <vscale x 2 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t19 = call <vscale x 4 x i8> @llvm.vp.add.nxv4i8(<vscale x 4 x i8> undef, <vscale x 4 x i8> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t20 = add <vscale x 4 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t21 = call <vscale x 8 x i8> @llvm.vp.add.nxv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %t22 = add <vscale x 8 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t23 = call <vscale x 16 x i8> @llvm.vp.add.nxv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t24 = add <vscale x 16 x i8> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t25 = call <vscale x 2 x i64> @llvm.vp.add.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %t26 = add <vscale x 2 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %t27 = call <vscale x 4 x i64> @llvm.vp.add.nxv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %t28 = add <vscale x 4 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %t29 = call <vscale x 8 x i64> @llvm.vp.add.nxv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %t30 = add <vscale x 8 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %t31 = call <vscale x 16 x i64> @llvm.vp.add.nxv16i64(<vscale x 16 x i64> undef, <vscale x 16 x i64> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %t32 = add <vscale x 16 x i64> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ %t0 = call <2 x i8> @llvm.vp.add.v2i8(<2 x i8> undef, <2 x i8> undef, <2 x i1> undef, i32 undef)
+ %t1 = add <2 x i8> undef, undef
+ %t2 = call <4 x i8> @llvm.vp.add.v4i8(<4 x i8> undef, <4 x i8> undef, <4 x i1> undef, i32 undef)
+ %t3 = add <4 x i8> undef, undef
+ %t4 = call <8 x i8> @llvm.vp.add.v8i8(<8 x i8> undef, <8 x i8> undef, <8 x i1> undef, i32 undef)
+ %t5 = add <8 x i8> undef, undef
+ %t6 = call <16 x i8> @llvm.vp.add.v16i8(<16 x i8> undef, <16 x i8> undef, <16 x i1> undef, i32 undef)
+ %t7 = add <16 x i8> undef, undef
+ %t8 = call <2 x i64> @llvm.vp.add.v2i64(<2 x i64> undef, <2 x i64> undef, <2 x i1> undef, i32 undef)
+ %t9 = add <2 x i64> undef, undef
+ %t10 = call <4 x i64> @llvm.vp.add.v4i64(<4 x i64> undef, <4 x i64> undef, <4 x i1> undef, i32 undef)
+ %t12 = add <4 x i64> undef, undef
+ %t13 = call <8 x i64> @llvm.vp.add.v8i64(<8 x i64> undef, <8 x i64> undef, <8 x i1> undef, i32 undef)
+ %t14 = add <8 x i64> undef, undef
+ %t15 = call <16 x i64> @llvm.vp.add.v16i64(<16 x i64> undef, <16 x i64> undef, <16 x i1> undef, i32 undef)
+ %t16 = add <16 x i64> undef, undef
+ %t17 = call <vscale x 2 x i8> @llvm.vp.add.nv2i8(<vscale x 2 x i8> undef, <vscale x 2 x i8> undef, <vscale x 2 x i1> undef, i32 undef)
+ %t18 = add <vscale x 2 x i8> undef, undef
+ %t19 = call <vscale x 4 x i8> @llvm.vp.add.nv4i8(<vscale x 4 x i8> undef, <vscale x 4 x i8> undef, <vscale x 4 x i1> undef, i32 undef)
+ %t20 = add <vscale x 4 x i8> undef, undef
+ %t21 = call <vscale x 8 x i8> @llvm.vp.add.nv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+ %t22 = add <vscale x 8 x i8> undef, undef
+ %t23 = call <vscale x 16 x i8> @llvm.vp.add.nv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+ %t24 = add <vscale x 16 x i8> undef, undef
+ %t25 = call <vscale x 2 x i64> @llvm.vp.add.nv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+ %t26 = add <vscale x 2 x i64> undef, undef
+ %t27 = call <vscale x 4 x i64> @llvm.vp.add.nv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+ %t28 = add <vscale x 4 x i64> undef, undef
+ %t29 = call <vscale x 8 x i64> @llvm.vp.add.nv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+ %t30 = add <vscale x 8 x i64> undef, undef
+ %t31 = call <vscale x 16 x i64> @llvm.vp.add.nv16i64(<vscale x 16 x i64> undef, <vscale x 16 x i64> undef, <vscale x 16 x i1> undef, i32 undef)
+ %t32 = add <vscale x 16 x i64> undef, undef
+ ret void
+}
+
+define void @abs() {
+; CHECK-LABEL: 'abs'
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %1 = call <2 x i8> @llvm.vp.abs.v2i8(<2 x i8> undef, i1 false, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = call <2 x i8> @llvm.abs.v2i8(<2 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = call <4 x i8> @llvm.vp.abs.v4i8(<4 x i8> undef, i1 false, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = call <4 x i8> @llvm.abs.v4i8(<4 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = call <8 x i8> @llvm.vp.abs.v8i8(<8 x i8> undef, i1 false, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %6 = call <8 x i8> @llvm.abs.v8i8(<8 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = call <16 x i8> @llvm.vp.abs.v16i8(<16 x i8> undef, i1 false, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = call <16 x i8> @llvm.abs.v16i8(<16 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %9 = call <2 x i64> @llvm.vp.abs.v2i64(<2 x i64> undef, i1 false, <2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = call <2 x i64> @llvm.abs.v2i64(<2 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = call <4 x i64> @llvm.vp.abs.v4i64(<4 x i64> undef, i1 false, <4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %12 = call <4 x i64> @llvm.abs.v4i64(<4 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %13 = call <8 x i64> @llvm.vp.abs.v8i64(<8 x i64> undef, i1 false, <8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %14 = call <8 x i64> @llvm.abs.v8i64(<8 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %15 = call <16 x i64> @llvm.vp.abs.v16i64(<16 x i64> undef, i1 false, <16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %16 = call <16 x i64> @llvm.abs.v16i64(<16 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %17 = call <vscale x 2 x i8> @llvm.vp.abs.nxv2i8(<vscale x 2 x i8> undef, i1 false, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %18 = call <vscale x 2 x i8> @llvm.abs.nxv2i8(<vscale x 2 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = call <vscale x 4 x i8> @llvm.vp.abs.nxv4i8(<vscale x 4 x i8> undef, i1 false, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %20 = call <vscale x 4 x i8> @llvm.abs.nxv4i8(<vscale x 4 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %21 = call <vscale x 8 x i8> @llvm.vp.abs.nxv8i8(<vscale x 8 x i8> undef, i1 false, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 8 x i8> @llvm.abs.nxv8i8(<vscale x 8 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 16 x i8> @llvm.vp.abs.nxv16i8(<vscale x 16 x i8> undef, i1 false, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %24 = call <vscale x 16 x i8> @llvm.abs.nxv16i8(<vscale x 16 x i8> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %25 = call <vscale x 2 x i64> @llvm.vp.abs.nxv2i64(<vscale x 2 x i64> undef, i1 false, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %26 = call <vscale x 2 x i64> @llvm.abs.nxv2i64(<vscale x 2 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %27 = call <vscale x 4 x i64> @llvm.vp.abs.nxv4i64(<vscale x 4 x i64> undef, i1 false, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %28 = call <vscale x 4 x i64> @llvm.abs.nxv4i64(<vscale x 4 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 8 x i64> @llvm.vp.abs.nxv8i64(<vscale x 8 x i64> undef, i1 false, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %30 = call <vscale x 8 x i64> @llvm.abs.nxv8i64(<vscale x 8 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %31 = call <vscale x 16 x i64> @llvm.vp.abs.nxv16i64(<vscale x 16 x i64> undef, i1 false, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %32 = call <vscale x 16 x i64> @llvm.abs.nxv16i64(<vscale x 16 x i64> undef, i1 false)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+ call <2 x i8> @llvm.vp.abs.v2i8(<2 x i8> undef, i1 0, <2 x i1> undef, i32 undef)
+ call <2 x i8> @llvm.abs.v2i8(<2 x i8> undef, i1 0)
+ call <4 x i8> @llvm.vp.abs.v4i8(<4 x i8> undef, i1 0, <4 x i1> undef, i32 undef)
+ call <4 x i8> @llvm.abs.v4i8(<4 x i8> undef, i1 0)
+ call <8 x i8> @llvm.vp.abs.v8i8(<8 x i8> undef, i1 0, <8 x ...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
bc35af3 to 4b8583b
…rparts
On RISCV, only a few VPIntrinsics have their cost modeled by the VectorIntrinsicCostTable. Even so, none of those entries consider LMUL. All other VPIntrinsics do not have meaningful modeling.
This patch models the cost of a VPIntrinsic as the cost of its non-VP counterpart. It is possible that the VP Intrinsic is cheaper than the non-VP version depending on VL. On RISCV, this may be due to two reasons (if the instruction is part of a loop):
1. A smaller VL can be used on the last iteration of the loop.
2. The VP instruction may avoid a scalar remainder loop.
I have left this as a TODO since I think this change puts us on the right path of modeling the cost of a VPInstruction, and it isn't entirely clear to me how much of a discount we should give to a known VL<VLMAX or what to do when VL is unknown at compile time.
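The two loop-related reasons given in the commit message can be illustrated with a small scalar simulation. This is a minimal sketch, not LLVM or RVV code: `kVLMax`, `vpAdd`, and `addStripMined` are hypothetical names chosen for illustration, and the "vector" add is modeled in software. Each iteration picks VL as the smaller of the maximum vector length and the number of remaining elements, so the final iteration can run with a smaller VL and no separate scalar remainder loop is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical maximum vector length (in elements) for this illustration.
constexpr std::size_t kVLMax = 8;

// Software model of a VL-predicated add: only the first `vl` lanes are
// processed, which is why a smaller VL could plausibly cost less.
static void vpAdd(const int *a, const int *b, int *out, std::size_t vl) {
  for (std::size_t lane = 0; lane < vl; ++lane)
    out[lane] = a[lane] + b[lane];
}

// Strip-mined loop with no scalar remainder loop. Returns the VL that was
// used on each iteration so the caller can inspect the tail behaviour.
std::vector<std::size_t> addStripMined(const std::vector<int> &a,
                                       const std::vector<int> &b,
                                       std::vector<int> &out) {
  assert(a.size() == b.size() && a.size() == out.size());
  std::vector<std::size_t> vls;
  for (std::size_t i = 0; i < a.size();) {
    // VL shrinks on the last iteration when fewer than kVLMax elements remain.
    std::size_t vl = std::min(kVLMax, a.size() - i);
    vpAdd(&a[i], &b[i], &out[i], vl);
    vls.push_back(vl);
    i += vl;
  }
  return vls;
}
```

With 19 elements and `kVLMax` set to 8, the loop runs three times with VL values 8, 8, and 3. How much cheaper the last iteration should be modeled is exactly the open question the TODO leaves unanswered.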
c545c35 to b1a51d6
LGTM. I'll await approval from others.
After this, it seems reasonable to pass VL to TTI cost function.
Pinging for additional approval.
respond to craigs comments
LGTM
Hi there - we bisected a downstream compiler crash to this change (on x86, I think due to the common codegen change). Here is the stack trace:
We're going to revert locally. Especially given that this is impacting a target outside of RISCV, may I push a revert? We can try to generate a better repro asynchronously.
Sure thing. Can you please show me how I can reproduce this upstream?
…p counterparts (llvm#67178)" This reverts commit fc865c2. Breaks x86 test.
bug.ll: https://gist.github.com/Groverkss/4893d082222ea96dd6ac238cf8697cbd This reproduces the bug for me.
…vp counterparts
This was reverted in commit 0abaf3c (llvm#67178). This version of the patch includes a fix for the crash, which was caused by vp-reductions having an extra start value argument that their non-vp counterparts do not have.
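The argument mismatch behind the revert can be sketched with plain argument lists. This is an illustrative model only, assuming the operand layouts documented in the LLVM LangRef: `llvm.vp.reduce.add` takes (start value, vector, mask, vector length), while `llvm.vector.reduce.add` takes only the vector. The names `ArgList`, `dropMaskAndEVL`, and `toNonVPReduceArgs` are hypothetical and do not exist in LLVM; the actual fix is in #68752.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Operand names stand in for LLVM Values/Types in this toy model.
using ArgList = std::vector<std::string>;

// What the original patch did for every VP intrinsic: drop the trailing
// mask and explicit-vector-length operands.
ArgList dropMaskAndEVL(const ArgList &vpArgs) {
  assert(vpArgs.size() >= 2 && "a VP call carries a mask and a vector length");
  return ArgList(vpArgs.begin(), vpArgs.end() - 2);
}

// For a reduction whose non-VP form has no start value (for example an
// integer add reduction), the leading start operand must be dropped too.
ArgList toNonVPReduceArgs(const ArgList &vpArgs) {
  ArgList args = dropMaskAndEVL(vpArgs);
  assert(!args.empty() && "a VP reduction carries a start value");
  args.erase(args.begin());
  return args;
}
```

Applying only `dropMaskAndEVL` to a vp reduction leaves two operands where the non-VP reduction expects one, which matches the kind of mismatch the recommit describes. Non-VP floating-point `fadd` and `fmul` reductions do take a start value, so a real implementation would need to treat those differently.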
@stellaraccident @Groverkss thank you for pointing out the bug and providing me a way to reproduce upstream. I have posted #68752 which recommits this here with the bug fix included.
We have a lot of code in RISCVTTIImpl::getIntrinsicInstrCost for vp intrinsics that just forwards the cost to the underlying non-vp cost function. However, I also just noticed that there is generic code in BasicTTIImpl's getIntrinsicInstrCost that does the same thing, added in llvm#67178. The only difference is that BasicTTIImpl doesn't yet handle it for type-based costing. There doesn't seem to be any reason that it can't, since it's just inspecting the argument types. This shuffles the VP costing up to handle both regular and type-based costing, and begins to deduplicate the VP-specific costing in RISCVTTIImpl by moving it into BasicTTIImpl.h. It's not NFC since it picks up a couple of VP nodes that had slipped through the cracks. Future PRs can begin to move more of the code from RISCVTTIImpl to BasicTTIImpl.
We have a lot of code in RISCVTTIImpl::getIntrinsicInstrCost for vp intrinsics that just forwards the cost to the underlying non-vp cost function. However, I also just noticed that there is generic code in BasicTTIImpl's getIntrinsicInstrCost that does the same thing, added in #67178. The only difference is that BasicTTIImpl doesn't yet handle it for type-based costing. There doesn't seem to be any reason that it can't, since it's just inspecting the argument types. This shuffles the VP costing up to handle both regular and type-based costing, which allows us to deduplicate some of the VP-specific costing in RISCVTTIImpl by delegating it to BasicTTIImpl.h. More of those nodes can be moved over to BasicTTIImpl.h later. It's not NFC since it picks up a couple of VP nodes that had slipped through the cracks. Future PRs can begin to move more of the code from RISCVTTIImpl to BasicTTIImpl.
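The observation that forwarding needs only the argument types can be sketched with a toy model. This is not LLVM's API: `CostQuery` and `forwardToNonVP` are hypothetical stand-ins, with an empty `args` list modeling a type-based query, and the sketch ignores special cases such as reductions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a cost query; types are always known, argument values
// are present only for a regular (non-type-based) query.
struct CostQuery {
  std::string id;
  std::vector<std::string> argTypes;
  std::vector<std::string> args;
  bool isTypeBased() const { return args.empty(); }
};

// Builds the query for the non-VP counterpart by stripping the trailing
// mask and vector-length entries. Only the types are required, so the same
// path serves both regular and type-based costing.
CostQuery forwardToNonVP(const CostQuery &vp, const std::string &nonVPId) {
  assert(vp.argTypes.size() >= 2 && "expected mask and vector-length types");
  CostQuery out;
  out.id = nonVPId;
  out.argTypes.assign(vp.argTypes.begin(), vp.argTypes.end() - 2);
  if (!vp.isTypeBased()) {
    assert(vp.args.size() == vp.argTypes.size());
    out.args.assign(vp.args.begin(), vp.args.end() - 2);
  }
  return out;
}
```

A type-based query for `vp.abs` with types (`<2 x i8>`, `i1`, `<2 x i1>`, `i32`) forwards to an `abs` query with types (`<2 x i8>`, `i1`) and stays type-based, which is the behaviour the commit message argues for.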