Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. #77770

Merged
merged 2 commits into from
Feb 26, 2024

Conversation

Stylie777
Copy link
Contributor

@Stylie777 Stylie777 commented Jan 11, 2024

When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illeagal behaviour for these intructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour.

This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM Architecture. Doing this will ensure early-clobber restraints are followed when using the ARM Architecture. Making the pass target independent will also open up possibility that support other architectures can be added in the future.

@Stylie777 Stylie777 requested a review from ostannard January 11, 2024 13:50
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-backend-arm

Author: Jack Styles (Stylie777)

Changes

When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illeagal behaviour for these intructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour.

This patch takes a similar approach to the RISC-V pass added as part of #3b8c0b3 to fix issue 50157.


Full diff: https://github.com/llvm/llvm-project/pull/77770.diff

8 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARM.h (+4)
  • (modified) llvm/lib/Target/ARM/ARMAsmPrinter.cpp (+3)
  • (added) llvm/lib/Target/ARM/ARMInitUndef.cpp (+242)
  • (modified) llvm/lib/Target/ARM/ARMInstrInfo.td (+8)
  • (modified) llvm/lib/Target/ARM/ARMTargetMachine.cpp (+10)
  • (modified) llvm/lib/Target/ARM/CMakeLists.txt (+1)
  • (modified) llvm/test/CodeGen/ARM/O3-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll (+11)
diff --git a/llvm/lib/Target/ARM/ARM.h b/llvm/lib/Target/ARM/ARM.h
index b96e0182298524..8f38ee370ab5ee 100644
--- a/llvm/lib/Target/ARM/ARM.h
+++ b/llvm/lib/Target/ARM/ARM.h
@@ -57,6 +57,7 @@ FunctionPass *createARMSLSHardeningPass();
 FunctionPass *createARMIndirectThunks();
 Pass *createMVELaneInterleavingPass();
 FunctionPass *createARMFixCortexA57AES1742098Pass();
+FunctionPass *createARMInitUndefPass();
 
 void LowerARMMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,
                                   ARMAsmPrinter &AP);
@@ -67,6 +68,7 @@ void initializeARMConstantIslandsPass(PassRegistry &);
 void initializeARMDAGToDAGISelPass(PassRegistry &);
 void initializeARMExpandPseudoPass(PassRegistry &);
 void initializeARMFixCortexA57AES1742098Pass(PassRegistry &);
+void initializeARMInitUndefPass(PassRegistry &);
 void initializeARMLoadStoreOptPass(PassRegistry &);
 void initializeARMLowOverheadLoopsPass(PassRegistry &);
 void initializeARMParallelDSPPass(PassRegistry &);
@@ -80,6 +82,8 @@ void initializeMVEVPTBlockPass(PassRegistry &);
 void initializeThumb2ITBlockPass(PassRegistry &);
 void initializeThumb2SizeReducePass(PassRegistry &);
 
+extern char &ARMInitUndefPass;
+
 } // end namespace llvm
 
 #endif // LLVM_LIB_TARGET_ARM_ARM_H
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 15cda9b9432d5f..8e39e51a28e3e3 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -2409,6 +2409,9 @@ void ARMAsmPrinter::emitInstruction(const MachineInstr *MI) {
   case ARM::SEH_EpilogEnd:
     ATS.emitARMWinCFIEpilogEnd();
     return;
+
+  case ARM::PseudoARMInitUndef:
+    return;
   }
 
   MCInst TmpInst;
diff --git a/llvm/lib/Target/ARM/ARMInitUndef.cpp b/llvm/lib/Target/ARM/ARMInitUndef.cpp
new file mode 100644
index 00000000000000..815abcc2f43660
--- /dev/null
+++ b/llvm/lib/Target/ARM/ARMInitUndef.cpp
@@ -0,0 +1,242 @@
+//===- ARMInitUndef.cpp - Initialize undef vector value to pseudo -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass runs to allow for Undef values to be given a definition in
+// early-clobber instructions. This can occur when using higher levels of
+// optimisations. Before, undef values were ignored and it would result in
+// early-clobber instructions using the same registeres for the output as one of
+// the inputs. This is an illegal operation according to the ARM Architecture
+// Reference Manual. This pass will check for early-clobber instructions and
+// give any undef values a defined Pseudo value to ensure that the early-clobber
+// rules are followed.
+//
+// Example: Without this pass, for the vhcadd instruction the following would
+// be generated: `vhcadd.s32 q0, q0, q0, #270`. This is an illegal instruction
+// as the output register and 2nd input register cannot match. By using this
+// pass, and using the Pseudo instruction, the following will be generated.
+// `vhcadd.s32 q0, q1, q2, #270`. This is allowed, as the output register and qd
+// input register are different.
+//===----------------------------------------------------------------------===//
+
+#include "ARM.h"
+#include "ARMBaseRegisterInfo.h"
+#include "ARMSubtarget.h"
+#include "llvm/ADT/bit.h"
+#include "llvm/CodeGen/DetectDeadLanes.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineOperand.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/Register.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/CodeGen/TargetOpcodes.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Pass.h"
+#include "llvm/PassRegistry.h"
+#include "llvm/Support/Debug.h"
+#include <cstddef>
+#include <optional>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "arm-init-undef"
+#define ARM_INIT_UNDEF_NAME "ARM init undef pass"
+
+namespace {
+class ARMInitUndef : public MachineFunctionPass {
+  const TargetInstrInfo *TII;
+  MachineRegisterInfo *MRI;
+  const ARMSubtarget *ST;
+  const TargetRegisterInfo *TRI;
+
+  SmallSet<Register, 8> NewRegs;
+
+public:
+  static char ID;
+
+  ARMInitUndef() : MachineFunctionPass(ID) {
+    initializeARMInitUndefPass(*PassRegistry::getPassRegistry());
+  }
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  StringRef getPassName() const override { return ARM_INIT_UNDEF_NAME; }
+
+private:
+  bool handleImplicitDef(MachineBasicBlock &MBB,
+                         MachineBasicBlock::iterator &Inst);
+  bool isVectorRegClass(const Register R);
+  const TargetRegisterClass *
+  getVRLargestSuperClass(const TargetRegisterClass *RC) const;
+  bool processBasicBlock(MachineFunction &MF, MachineBasicBlock &MBB,
+                         const DeadLaneDetector &DLD);
+};
+} // end anonymous namespace
+
+char ARMInitUndef::ID = 0;
+INITIALIZE_PASS(ARMInitUndef, DEBUG_TYPE, ARM_INIT_UNDEF_NAME, false, false)
+char &llvm::ARMInitUndefPass = ARMInitUndef::ID;
+
+static bool isEarlyClobberMI(MachineInstr &MI) {
+  return llvm::any_of(MI.defs(), [](const MachineOperand &DefMO) {
+    return DefMO.isReg() && DefMO.isEarlyClobber();
+  });
+}
+
+static unsigned getUndefInitOpcode(unsigned RegClassID) {
+  if (RegClassID == ARM::MQPRRegClass.getID())
+    return ARM::PseudoARMInitUndef;
+
+  llvm_unreachable("Unexpected register class.");
+}
+
+/* handleImplicitDef is used to apply the definition to any undef values. This
+ * is only done for instructions that are early-clobber, and are not tied to
+ * other instructions. It will cycle through every MachineBasicBlock iterator
+ * that is an IMPLICIT_DEF then check for if it is early-clobber, not tied and
+ * is used. If all these are true, then the value is applied. This is then
+ * changed to be defined at ARM_INIT_UNDEF_PSEUDO
+ */
+bool ARMInitUndef::handleImplicitDef(MachineBasicBlock &MBB,
+                                     MachineBasicBlock::iterator &Inst) {
+  bool Changed = false;
+  Register LastReg;
+
+  while (Inst->getOpcode() == TargetOpcode::IMPLICIT_DEF &&
+         Inst->getOperand(0).getReg() != LastReg) {
+
+    bool HasOtherUse = false;
+    SmallVector<MachineOperand *, 1> UseMOs;
+    LastReg = Inst->getOperand(0).getReg();
+
+    Register Reg = Inst->getOperand(0).getReg();
+    if (!Reg.isVirtual())
+      continue;
+
+    for (MachineOperand &MO : MRI->reg_operands(Reg)) {
+      LLVM_DEBUG(dbgs() << "Register: " << MO.getReg() << "\n");
+      LLVM_DEBUG(dbgs() << "MO " << MO << " is EarlyClobber "
+                        << isEarlyClobberMI(*MO.getParent()) << "\n");
+      if (isEarlyClobberMI(*MO.getParent())) {
+        LLVM_DEBUG(dbgs() << "MO " << &MO << " is use " << MO.isUse()
+                          << " MO is tied " << MO.isTied() << "\n");
+        if (MO.isUse() && !MO.isTied()) {
+          LLVM_DEBUG(dbgs() << "MO " << MO << " added\n");
+          UseMOs.push_back(&MO);
+        } else
+          HasOtherUse = true;
+      }
+    }
+    LLVM_DEBUG(dbgs() << "UseMOs " << &UseMOs << " is empty " << UseMOs.empty()
+                      << "\n");
+    if (UseMOs.empty())
+      continue;
+
+    LLVM_DEBUG(
+        dbgs() << "Emitting PseudoARMInitUndef for implicit vector register "
+               << Reg << '\n'
+               << '\n');
+
+    const TargetRegisterClass *TargetRegClass =
+        getVRLargestSuperClass(MRI->getRegClass(Reg));
+    unsigned Opcode = getUndefInitOpcode(TargetRegClass->getID());
+
+    Register NewDest = Reg;
+    if (HasOtherUse) {
+      NewDest = MRI->createVirtualRegister(TargetRegClass);
+      NewRegs.insert(NewDest);
+    }
+    BuildMI(MBB, Inst, Inst->getDebugLoc(), TII->get(Opcode), NewDest);
+
+    if (!HasOtherUse)
+      Inst = MBB.erase(Inst);
+
+    for (auto *MO : UseMOs) {
+      MO->setReg(NewDest);
+      MO->setIsUndef(false);
+    }
+
+    Changed = true;
+  }
+  return Changed;
+}
+
+bool ARMInitUndef::isVectorRegClass(Register R) {
+  return ARM::MQPRRegClass.hasSubClassEq(MRI->getRegClass(R));
+}
+
+const TargetRegisterClass *
+ARMInitUndef::getVRLargestSuperClass(const TargetRegisterClass *RC) const {
+  if (ARM::MQPRRegClass.hasSubClassEq(RC))
+    return &ARM::MQPRRegClass;
+  return RC;
+}
+
+/* This will process a BasicBlock within the MachineFunction. This will take
+ * each iterator and determine if there is an Implicit Def that needs dealing
+ * with. This also deals with implicit def's that are tied to an operand, to
+ * ensure that the correct undef values are given a definition.
+ */
+bool ARMInitUndef::processBasicBlock(MachineFunction &MF,
+                                     MachineBasicBlock &MBB,
+                                     const DeadLaneDetector &DLD) {
+  bool Changed = false;
+
+  for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end(); I++) {
+    MachineInstr &MI = *I;
+
+    unsigned UseOpIdx;
+    if (MI.getNumDefs() != 0 && MI.isRegTiedToUseOperand(0, &UseOpIdx)) {
+      MachineOperand &UseMO = MI.getOperand(UseOpIdx);
+      if (UseMO.getReg() == ARM::NoRegister) {
+        const TargetRegisterClass *RC =
+            TII->getRegClass(MI.getDesc(), UseOpIdx, TRI, MF);
+        Register NewDest = MRI->createVirtualRegister(RC);
+
+        NewRegs.insert(NewDest);
+        BuildMI(MBB, I, I->getDebugLoc(), TII->get(TargetOpcode::IMPLICIT_DEF),
+                NewDest);
+        UseMO.setReg(NewDest);
+        Changed = true;
+      }
+    }
+
+    if (MI.isImplicitDef()) {
+      auto DstReg = MI.getOperand(0).getReg();
+      if (DstReg.isVirtual() && isVectorRegClass(DstReg))
+        Changed |= handleImplicitDef(MBB, I);
+    }
+  }
+  return Changed;
+}
+
+bool ARMInitUndef::runOnMachineFunction(MachineFunction &MF) {
+  ST = &MF.getSubtarget<ARMSubtarget>();
+  MRI = &MF.getRegInfo();
+  TII = ST->getInstrInfo();
+  TRI = MRI->getTargetRegisterInfo();
+
+  bool Changed = false;
+  DeadLaneDetector DLD(MRI, TRI);
+  DLD.computeSubRegisterLaneBitInfo();
+
+  for (MachineBasicBlock &BB : MF) {
+    Changed |= processBasicBlock(MF, BB, DLD);
+  }
+
+  return Changed;
+}
+
+FunctionPass *llvm::createARMInitUndefPass() { return new ARMInitUndef(); }
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 812b5730875d5d..2259c4fd391fcc 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -6532,3 +6532,11 @@ let isPseudo = 1 in {
   let isTerminator = 1 in
   def SEH_EpilogEnd : PseudoInst<(outs), (ins), NoItinerary, []>, Sched<[]>;
 }
+
+//===----------------------------------------------------------------------===//
+// Pseudo Instructions for use when early-clobber is defined and Greedy Register
+// Allocation is used. This ensures the constraint is used properly.
+//===----------------------------------------------------------------------===//
+let isCodeGenOnly = 1 in {
+  def PseudoARMInitUndef :  PseudoInst<(outs MQPR:$vd), (ins), NoItinerary, []>;
+}
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
index a99773691df123..2498a0b6a6a18f 100644
--- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
@@ -41,6 +41,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Pass.h"
+#include "llvm/PassRegistry.h"
 #include "llvm/Support/CodeGen.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -111,6 +112,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTarget() {
   initializeMVELaneInterleavingPass(Registry);
   initializeARMFixCortexA57AES1742098Pass(Registry);
   initializeARMDAGToDAGISelPass(Registry);
+  initializeARMInitUndefPass(Registry);
 }
 
 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
@@ -384,6 +386,7 @@ class ARMPassConfig : public TargetPassConfig {
   void addPreSched2() override;
   void addPreEmitPass() override;
   void addPreEmitPass2() override;
+  void addOptimizedRegAlloc() override;
 
   std::unique_ptr<CSEConfigBase> getCSEConfig() const override;
 };
@@ -621,6 +624,13 @@ void ARMPassConfig::addPreEmitPass2() {
   }
 }
 
+void ARMPassConfig::addOptimizedRegAlloc() {
+  if (getOptimizeRegAlloc())
+    insertPass(&DetectDeadLanesID, &ARMInitUndefPass);
+
+  TargetPassConfig::addOptimizedRegAlloc();
+}
+
 yaml::MachineFunctionInfo *
 ARMBaseTargetMachine::createDefaultFuncInfoYAML() const {
   return new yaml::ARMFunctionInfo();
diff --git a/llvm/lib/Target/ARM/CMakeLists.txt b/llvm/lib/Target/ARM/CMakeLists.txt
index 3d6af28b437538..d52a42adb325b3 100644
--- a/llvm/lib/Target/ARM/CMakeLists.txt
+++ b/llvm/lib/Target/ARM/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(ARMCodeGen
   ARMFixCortexA57AES1742098Pass.cpp
   ARMFrameLowering.cpp
   ARMHazardRecognizer.cpp
+  ARMInitUndef.cpp
   ARMInstructionSelector.cpp
   ARMISelDAGToDAG.cpp
   ARMISelLowering.cpp
diff --git a/llvm/test/CodeGen/ARM/O3-pipeline.ll b/llvm/test/CodeGen/ARM/O3-pipeline.ll
index 5e565970fc3a86..310a82b26893e0 100644
--- a/llvm/test/CodeGen/ARM/O3-pipeline.ll
+++ b/llvm/test/CodeGen/ARM/O3-pipeline.ll
@@ -113,6 +113,7 @@
 ; CHECK-NEXT:      ARM pre- register allocation load / store optimization pass
 ; CHECK-NEXT:      ARM A15 S->D optimizer
 ; CHECK-NEXT:      Detect Dead Lanes
+; CHECK-NEXT:      ARM init undef pass
 ; CHECK-NEXT:      Process Implicit Definitions
 ; CHECK-NEXT:      Remove unreachable machine basic blocks
 ; CHECK-NEXT:      Live Variable Analysis
diff --git a/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll b/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll
index 9bb24fc61ccef3..02234c63725360 100644
--- a/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll
@@ -699,6 +699,17 @@ entry:
   ret <4 x i32> %0
 }
 
+define arm_aapcs_vfpcc <4 x i32> @test_vhcaddq_rot270_s32_undef() {
+; CHECK-LABEL: test_vhcaddq_rot270_s32_undef:
+; CHECK:       @ %bb.0: @ %entry
+; CHECK-NEXT: vhcadd.s32 q{{[0-9]+}}, q{{[0-9]+}}, q{{[0-9]+}}, #270
+; CHECK-NOT:  vhcadd.s32 q[[REG:[0-9]+]], q{{[0-9]+}}, q[[REG]], #270
+; CHECK-NEXT:    bx lr
+entry:
+  %0 = tail call <4 x i32> @llvm.arm.mve.vcaddq.v4i32(i32 0, i32 1, <4 x i32> undef, <4 x i32> undef)
+  ret <4 x i32> %0
+}
+
 define arm_aapcs_vfpcc <16 x i8> @test_vhcaddq_rot90_x_s8(<16 x i8> %a, <16 x i8> %b, i16 zeroext %p) {
 ; CHECK-LABEL: test_vhcaddq_rot90_x_s8:
 ; CHECK:       @ %bb.0: @ %entry

@jthackray jthackray self-requested a review January 11, 2024 13:55
@Stylie777 Stylie777 self-assigned this Jan 11, 2024
@tmatheson-arm
Copy link
Contributor

How similar is this to the RISC-V pass? Can it be made one generic pass that any backends could use? The problem does not seem specific to these two architectures.

@Stylie777
Copy link
Contributor Author

How similar is this to the RISC-V pass? Can it be made one generic pass that any backends could use? The problem does not seem specific to these two architectures.

@tmatheson-arm This would require some investigation into how this would work - I can look into this

@Stylie777
Copy link
Contributor Author

@tmatheson-arm to answer your question, the passes are similar to one another in some aspects and different in others. The logic employed is very similar, with how it checks for early-clobber and undef values. The differences come in how it applies the Pseudo instruction. For this solution with ARM, we have one instruction which takes on the form of a ARM::MQPRRegClass type. RISC V has multiple different ways of allocating the pseudo instructions, their version uses four different instructions that can be used depending on the situation.

That difference is my concern in making this a generic pass, and how it would deal with the different architectures and register configurations.

@ostannard
Copy link
Collaborator

To make this pass target-independent, you could move the target-specific bits (getVRLargestSuperClass, isVectorRegClass, getUndefInitOpcode, ...) into TargetRegisterInfo and TargetInstrInfo, which already have a lot of virtual functions for querying target-specific information from target-independent classes.

@Stylie777 Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from 401f228 to 7148a24 Compare January 24, 2024 09:53
@Stylie777
Copy link
Contributor Author

I have updated this PR to now be a refactor of the RISC-V pass to make it target independent, followed by adding support for the ARM Architecture.

Copy link

github-actions bot commented Jan 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Stylie777 Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from 7148a24 to c0f8599 Compare January 24, 2024 11:21
@Stylie777 Stylie777 changed the title [ARM] Add pass for handling undef early-clobber values [CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. Jan 24, 2024
@topperc topperc requested a review from BeMg January 24, 2024 18:09
@Stylie777 Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from c0f8599 to fa1beb5 Compare January 30, 2024 13:26
llvm/include/llvm/CodeGen/TargetInstrInfo.h Show resolved Hide resolved
llvm/include/llvm/CodeGen/TargetInstrInfo.h Outdated Show resolved Hide resolved
llvm/include/llvm/CodeGen/TargetSubtargetInfo.h Outdated Show resolved Hide resolved
llvm/lib/CodeGen/InitUndef.cpp Outdated Show resolved Hide resolved
llvm/lib/CodeGen/InitUndef.cpp Outdated Show resolved Hide resolved
@Stylie777 Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch 2 times, most recently from 183f53b to 055a8f7 Compare February 5, 2024 14:30
@Stylie777 Stylie777 requested review from ostannard and removed request for tmatheson-arm February 6, 2024 13:23
@Stylie777 Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch 3 times, most recently from 6e8b907 to 5549e7b Compare February 13, 2024 08:24
/// should be a Pseudo Instruction for the different register
/// classes for the different register types that are introduced.
virtual const TargetRegisterClass *
getTargetRegisterClass(const TargetRegisterClass *RC) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name doesn't make much sense, and the comment doesn't clarify things. How about getLargestSuperClass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to getLargestSuperClass

/// the Init Undef Pass. By default, it is assumed that it will not support
/// the pass, with architecture specific overrides providing the information
/// where they are implemented. This was originally used in RISC-V's Init
/// Undef pass but has been moved to be a virtual function when the pass was
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You don't need to talk about the history here, this should just describe the current state of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the last sentence of this comment as this then removes the history.

if (ARM::DPR_VFP2RegClass.hasSubClassEq(RC))
return &ARM::DPR_VFP2RegClass;
if (ARM::GPRRegClass.hasSubClassEq(RC))
return &ARM::DPR_VFP2RegClass;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed.

@@ -278,6 +278,14 @@ class ARMSubtarget : public ARMGenSubtargetInfo {
return &InstrInfo->getRegisterInfo();
}

/// Returns true as the ARM Architecture is supported by the Init Undef Pass.
/// We want to enable this for MVE and NEON instructions, however this can be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any NEON instructions which this is needed for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are currently no NEON instructions that use easy-clobber. There are however early-clobber instructions that might get missed by checking for HasMVEIntegerOps. Changed this to true as it should run regardless, and all the required register types have been implemented.

llvm/test/CodeGen/Thumb2/mve-vmull-splat.ll Show resolved Hide resolved
Currently this pass is designed for RISC-V only, however this is
not the only architecture with the bug reported in issue #50157.
We can convert the exisiting pass to be generic, using some of the
existing Parent classes rather than RISC-V specific classes to
bring the same functionality to other Architectures.

The pass has been refactored, removing the RISC-V specific functions
and data-types and replacing them with datatypes that will support
all architectures and virtual functions in the respecitive classes
that allow support for the pass to be added. By default, this pass
will not run on on all architectures, only those that have the
`supportsInitUndef` function impletmented. This commit will only
refactor the exisiting code and add the pass in as a CodeGen pass rather
than target specific. To add support for other architectures, this should
be done in new, following commits.

This will allow for the pass to be used by any architecture. With the
correct overriding functions, other architectures can be supported
to provide the same functionality as was added to fix issue that
was reported in Issue #50157.

This is still a temporary measure, and a better more permenant fix
should be found, but for the time being this will allow for the
correct early-clobber contraint to be followed when defined in
vector instructions.
For assembly instructions within the ARM Architecture, certain
instructions have early-clobber restraints that are defined
for the instruction. However, when using the Greedy register
allocator this is ignored. To get around this, we can use
the Init Undef pass to assign a Pseudo instruction for the
registers that are early-clobber to ensure the restraint
is followed.

This adds in support for this using a new Pseudo instruction,
`PseudoARMInitUndef` which is used to ensure early-clobber
restrains are followed. The relevant overriding functions
have also been provided to ensure the architecture is
supported by the pass and the required information can be
passed to ensure that early-clobber restrains are respected.
@Stylie777 Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from 5549e7b to 87a95f5 Compare February 21, 2024 16:26
Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Stylie777 Stylie777 merged commit 2823340 into main Feb 26, 2024
5 checks passed
@Stylie777 Stylie777 deleted the users/stylie777/add_arm_init_undef_pass branch February 26, 2024 12:12
nikic added a commit to nikic/llvm-project that referenced this pull request Sep 12, 2024
The InitUndef pass works around a register allocation issue, where
undef operands can be allocated to the same register as early-clobber
result operands. This may lead to ISA constraint violations, where
certain input and output registers are not allowed to overlap.

Originally this pass was implemented for RISCV, and then extended
to ARM in llvm#77770. I've since removed the target-specific parts of
the pass in llvm#106744 and llvm#107885. This PR now enables the pass for
all targets.

The motivating case is the one in arm64-ldxr-stxr.ll for the AArch64
target, where we were previously incorrectly allocating a stxp input
and output to the same register.
nikic added a commit that referenced this pull request Sep 16, 2024
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.

Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.

The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants