[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. #77770

Stylie777 · 2024-01-11T13:50:47Z

When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illeagal behaviour for these intructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour.

This patch takes the existing RISC-V patch and makes it target independent, then adds support for the ARM Architecture. Doing this will ensure early-clobber restraints are followed when using the ARM Architecture. Making the pass target independent will also open up possibility that support other architectures can be added in the future.

github-actions · 2024-01-11T13:51:07Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-01-11T13:51:40Z

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-loongarch
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-backend-arm

Author: Jack Styles (Stylie777)

Changes

When using Greedy Register Allocation, there are times where early-clobber values are ignored, and assigned the same register. This is illeagal behaviour for these intructions. To get around this, using Pseudo instructions for early-clobber registers gives them a definition and allows Greedy to assign them to a different register. This then meets the ARM Architecture Reference Manual and matches the defined behaviour.

This patch takes a similar approach to the RISC-V pass added as part of #3b8c0b3 to fix issue 50157.

Full diff: https://github.com/llvm/llvm-project/pull/77770.diff

8 Files Affected:

(modified) llvm/lib/Target/ARM/ARM.h (+4)
(modified) llvm/lib/Target/ARM/ARMAsmPrinter.cpp (+3)
(added) llvm/lib/Target/ARM/ARMInitUndef.cpp (+242)
(modified) llvm/lib/Target/ARM/ARMInstrInfo.td (+8)
(modified) llvm/lib/Target/ARM/ARMTargetMachine.cpp (+10)
(modified) llvm/lib/Target/ARM/CMakeLists.txt (+1)
(modified) llvm/test/CodeGen/ARM/O3-pipeline.ll (+1)
(modified) llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll (+11)

diff --git a/llvm/lib/Target/ARM/ARM.h b/llvm/lib/Target/ARM/ARM.h
index b96e0182298524..8f38ee370ab5ee 100644
--- a/llvm/lib/Target/ARM/ARM.h
+++ b/llvm/lib/Target/ARM/ARM.h
@@ -57,6 +57,7 @@ FunctionPass *createARMSLSHardeningPass();
 FunctionPass *createARMIndirectThunks();
 Pass *createMVELaneInterleavingPass();
 FunctionPass *createARMFixCortexA57AES1742098Pass();
+FunctionPass *createARMInitUndefPass();
 
 void LowerARMMachineInstrToMCInst(const MachineInstr *MI, MCInst &OutMI,
                                   ARMAsmPrinter &AP);
@@ -67,6 +68,7 @@ void initializeARMConstantIslandsPass(PassRegistry &);
 void initializeARMDAGToDAGISelPass(PassRegistry &);
 void initializeARMExpandPseudoPass(PassRegistry &);
 void initializeARMFixCortexA57AES1742098Pass(PassRegistry &);
+void initializeARMInitUndefPass(PassRegistry &);
 void initializeARMLoadStoreOptPass(PassRegistry &);
 void initializeARMLowOverheadLoopsPass(PassRegistry &);
 void initializeARMParallelDSPPass(PassRegistry &);
@@ -80,6 +82,8 @@ void initializeMVEVPTBlockPass(PassRegistry &);
 void initializeThumb2ITBlockPass(PassRegistry &);
 void initializeThumb2SizeReducePass(PassRegistry &);
 
+extern char &ARMInitUndefPass;
+
 } // end namespace llvm
 
 #endif // LLVM_LIB_TARGET_ARM_ARM_H
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 15cda9b9432d5f..8e39e51a28e3e3 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -2409,6 +2409,9 @@ void ARMAsmPrinter::emitInstruction(const MachineInstr *MI) {
   case ARM::SEH_EpilogEnd:
     ATS.emitARMWinCFIEpilogEnd();
     return;
+
+  case ARM::PseudoARMInitUndef:
+    return;
   }
 
   MCInst TmpInst;
diff --git a/llvm/lib/Target/ARM/ARMInitUndef.cpp b/llvm/lib/Target/ARM/ARMInitUndef.cpp
new file mode 100644
index 00000000000000..815abcc2f43660
--- /dev/null
+++ b/llvm/lib/Target/ARM/ARMInitUndef.cpp
@@ -0,0 +1,242 @@
+//===- ARMInitUndef.cpp - Initialize undef vector value to pseudo -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass runs to allow for Undef values to be given a definition in
+// early-clobber instructions. This can occur when using higher levels of
+// optimisations. Before, undef values were ignored and it would result in
+// early-clobber instructions using the same registeres for the output as one of
+// the inputs. This is an illegal operation according to the ARM Architecture
+// Reference Manual. This pass will check for early-clobber instructions and
+// give any undef values a defined Pseudo value to ensure that the early-clobber
+// rules are followed.
+//
+// Example: Without this pass, for the vhcadd instruction the following would
+// be generated: `vhcadd.s32 q0, q0, q0, #270`. This is an illegal instruction
+// as the output register and 2nd input register cannot match. By using this
+// pass, and using the Pseudo instruction, the following will be generated.
+// `vhcadd.s32 q0, q1, q2, #270`. This is allowed, as the output register and qd
+// input register are different.
+//===----------------------------------------------------------------------===//
+
+#include "ARM.h"
+#include "ARMBaseRegisterInfo.h"
+#include "ARMSubtarget.h"
+#include "llvm/ADT/bit.h"
+#include "llvm/CodeGen/DetectDeadLanes.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineOperand.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/Register.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/CodeGen/TargetOpcodes.h"
+#include "llvm/CodeGen/TargetRegisterInfo.h"
+#include "llvm/Pass.h"
+#include "llvm/PassRegistry.h"
+#include "llvm/Support/Debug.h"
+#include <cstddef>
+#include <optional>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "arm-init-undef"
+#define ARM_INIT_UNDEF_NAME "ARM init undef pass"
+
+namespace {
+class ARMInitUndef : public MachineFunctionPass {
+  const TargetInstrInfo *TII;
+  MachineRegisterInfo *MRI;
+  const ARMSubtarget *ST;
+  const TargetRegisterInfo *TRI;
+
+  SmallSet<Register, 8> NewRegs;
+
+public:
+  static char ID;
+
+  ARMInitUndef() : MachineFunctionPass(ID) {
+    initializeARMInitUndefPass(*PassRegistry::getPassRegistry());
+  }
+  bool runOnMachineFunction(MachineFunction &MF) override;
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  StringRef getPassName() const override { return ARM_INIT_UNDEF_NAME; }
+
+private:
+  bool handleImplicitDef(MachineBasicBlock &MBB,
+                         MachineBasicBlock::iterator &Inst);
+  bool isVectorRegClass(const Register R);
+  const TargetRegisterClass *
+  getVRLargestSuperClass(const TargetRegisterClass *RC) const;
+  bool processBasicBlock(MachineFunction &MF, MachineBasicBlock &MBB,
+                         const DeadLaneDetector &DLD);
+};
+} // end anonymous namespace
+
+char ARMInitUndef::ID = 0;
+INITIALIZE_PASS(ARMInitUndef, DEBUG_TYPE, ARM_INIT_UNDEF_NAME, false, false)
+char &llvm::ARMInitUndefPass = ARMInitUndef::ID;
+
+static bool isEarlyClobberMI(MachineInstr &MI) {
+  return llvm::any_of(MI.defs(), [](const MachineOperand &DefMO) {
+    return DefMO.isReg() && DefMO.isEarlyClobber();
+  });
+}
+
+static unsigned getUndefInitOpcode(unsigned RegClassID) {
+  if (RegClassID == ARM::MQPRRegClass.getID())
+    return ARM::PseudoARMInitUndef;
+
+  llvm_unreachable("Unexpected register class.");
+}
+
+/* handleImplicitDef is used to apply the definition to any undef values. This
+ * is only done for instructions that are early-clobber, and are not tied to
+ * other instructions. It will cycle through every MachineBasicBlock iterator
+ * that is an IMPLICIT_DEF then check for if it is early-clobber, not tied and
+ * is used. If all these are true, then the value is applied. This is then
+ * changed to be defined at ARM_INIT_UNDEF_PSEUDO
+ */
+bool ARMInitUndef::handleImplicitDef(MachineBasicBlock &MBB,
+                                     MachineBasicBlock::iterator &Inst) {
+  bool Changed = false;
+  Register LastReg;
+
+  while (Inst->getOpcode() == TargetOpcode::IMPLICIT_DEF &&
+         Inst->getOperand(0).getReg() != LastReg) {
+
+    bool HasOtherUse = false;
+    SmallVector<MachineOperand *, 1> UseMOs;
+    LastReg = Inst->getOperand(0).getReg();
+
+    Register Reg = Inst->getOperand(0).getReg();
+    if (!Reg.isVirtual())
+      continue;
+
+    for (MachineOperand &MO : MRI->reg_operands(Reg)) {
+      LLVM_DEBUG(dbgs() << "Register: " << MO.getReg() << "\n");
+      LLVM_DEBUG(dbgs() << "MO " << MO << " is EarlyClobber "
+                        << isEarlyClobberMI(*MO.getParent()) << "\n");
+      if (isEarlyClobberMI(*MO.getParent())) {
+        LLVM_DEBUG(dbgs() << "MO " << &MO << " is use " << MO.isUse()
+                          << " MO is tied " << MO.isTied() << "\n");
+        if (MO.isUse() && !MO.isTied()) {
+          LLVM_DEBUG(dbgs() << "MO " << MO << " added\n");
+          UseMOs.push_back(&MO);
+        } else
+          HasOtherUse = true;
+      }
+    }
+    LLVM_DEBUG(dbgs() << "UseMOs " << &UseMOs << " is empty " << UseMOs.empty()
+                      << "\n");
+    if (UseMOs.empty())
+      continue;
+
+    LLVM_DEBUG(
+        dbgs() << "Emitting PseudoARMInitUndef for implicit vector register "
+               << Reg << '\n'
+               << '\n');
+
+    const TargetRegisterClass *TargetRegClass =
+        getVRLargestSuperClass(MRI->getRegClass(Reg));
+    unsigned Opcode = getUndefInitOpcode(TargetRegClass->getID());
+
+    Register NewDest = Reg;
+    if (HasOtherUse) {
+      NewDest = MRI->createVirtualRegister(TargetRegClass);
+      NewRegs.insert(NewDest);
+    }
+    BuildMI(MBB, Inst, Inst->getDebugLoc(), TII->get(Opcode), NewDest);
+
+    if (!HasOtherUse)
+      Inst = MBB.erase(Inst);
+
+    for (auto *MO : UseMOs) {
+      MO->setReg(NewDest);
+      MO->setIsUndef(false);
+    }
+
+    Changed = true;
+  }
+  return Changed;
+}
+
+bool ARMInitUndef::isVectorRegClass(Register R) {
+  return ARM::MQPRRegClass.hasSubClassEq(MRI->getRegClass(R));
+}
+
+const TargetRegisterClass *
+ARMInitUndef::getVRLargestSuperClass(const TargetRegisterClass *RC) const {
+  if (ARM::MQPRRegClass.hasSubClassEq(RC))
+    return &ARM::MQPRRegClass;
+  return RC;
+}
+
+/* This will process a BasicBlock within the MachineFunction. This will take
+ * each iterator and determine if there is an Implicit Def that needs dealing
+ * with. This also deals with implicit def's that are tied to an operand, to
+ * ensure that the correct undef values are given a definition.
+ */
+bool ARMInitUndef::processBasicBlock(MachineFunction &MF,
+                                     MachineBasicBlock &MBB,
+                                     const DeadLaneDetector &DLD) {
+  bool Changed = false;
+
+  for (MachineBasicBlock::iterator I = MBB.begin(); I != MBB.end(); I++) {
+    MachineInstr &MI = *I;
+
+    unsigned UseOpIdx;
+    if (MI.getNumDefs() != 0 && MI.isRegTiedToUseOperand(0, &UseOpIdx)) {
+      MachineOperand &UseMO = MI.getOperand(UseOpIdx);
+      if (UseMO.getReg() == ARM::NoRegister) {
+        const TargetRegisterClass *RC =
+            TII->getRegClass(MI.getDesc(), UseOpIdx, TRI, MF);
+        Register NewDest = MRI->createVirtualRegister(RC);
+
+        NewRegs.insert(NewDest);
+        BuildMI(MBB, I, I->getDebugLoc(), TII->get(TargetOpcode::IMPLICIT_DEF),
+                NewDest);
+        UseMO.setReg(NewDest);
+        Changed = true;
+      }
+    }
+
+    if (MI.isImplicitDef()) {
+      auto DstReg = MI.getOperand(0).getReg();
+      if (DstReg.isVirtual() && isVectorRegClass(DstReg))
+        Changed |= handleImplicitDef(MBB, I);
+    }
+  }
+  return Changed;
+}
+
+bool ARMInitUndef::runOnMachineFunction(MachineFunction &MF) {
+  ST = &MF.getSubtarget<ARMSubtarget>();
+  MRI = &MF.getRegInfo();
+  TII = ST->getInstrInfo();
+  TRI = MRI->getTargetRegisterInfo();
+
+  bool Changed = false;
+  DeadLaneDetector DLD(MRI, TRI);
+  DLD.computeSubRegisterLaneBitInfo();
+
+  for (MachineBasicBlock &BB : MF) {
+    Changed |= processBasicBlock(MF, BB, DLD);
+  }
+
+  return Changed;
+}
+
+FunctionPass *llvm::createARMInitUndefPass() { return new ARMInitUndef(); }
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 812b5730875d5d..2259c4fd391fcc 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -6532,3 +6532,11 @@ let isPseudo = 1 in {
   let isTerminator = 1 in
   def SEH_EpilogEnd : PseudoInst<(outs), (ins), NoItinerary, []>, Sched<[]>;
 }
+
+//===----------------------------------------------------------------------===//
+// Pseudo Instructions for use when early-clobber is defined and Greedy Register
+// Allocation is used. This ensures the constraint is used properly.
+//===----------------------------------------------------------------------===//
+let isCodeGenOnly = 1 in {
+  def PseudoARMInitUndef :  PseudoInst<(outs MQPR:$vd), (ins), NoItinerary, []>;
+}
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
index a99773691df123..2498a0b6a6a18f 100644
--- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
@@ -41,6 +41,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Pass.h"
+#include "llvm/PassRegistry.h"
 #include "llvm/Support/CodeGen.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -111,6 +112,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTarget() {
   initializeMVELaneInterleavingPass(Registry);
   initializeARMFixCortexA57AES1742098Pass(Registry);
   initializeARMDAGToDAGISelPass(Registry);
+  initializeARMInitUndefPass(Registry);
 }
 
 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
@@ -384,6 +386,7 @@ class ARMPassConfig : public TargetPassConfig {
   void addPreSched2() override;
   void addPreEmitPass() override;
   void addPreEmitPass2() override;
+  void addOptimizedRegAlloc() override;
 
   std::unique_ptr<CSEConfigBase> getCSEConfig() const override;
 };
@@ -621,6 +624,13 @@ void ARMPassConfig::addPreEmitPass2() {
   }
 }
 
+void ARMPassConfig::addOptimizedRegAlloc() {
+  if (getOptimizeRegAlloc())
+    insertPass(&DetectDeadLanesID, &ARMInitUndefPass);
+
+  TargetPassConfig::addOptimizedRegAlloc();
+}
+
 yaml::MachineFunctionInfo *
 ARMBaseTargetMachine::createDefaultFuncInfoYAML() const {
   return new yaml::ARMFunctionInfo();
diff --git a/llvm/lib/Target/ARM/CMakeLists.txt b/llvm/lib/Target/ARM/CMakeLists.txt
index 3d6af28b437538..d52a42adb325b3 100644
--- a/llvm/lib/Target/ARM/CMakeLists.txt
+++ b/llvm/lib/Target/ARM/CMakeLists.txt
@@ -35,6 +35,7 @@ add_llvm_target(ARMCodeGen
   ARMFixCortexA57AES1742098Pass.cpp
   ARMFrameLowering.cpp
   ARMHazardRecognizer.cpp
+  ARMInitUndef.cpp
   ARMInstructionSelector.cpp
   ARMISelDAGToDAG.cpp
   ARMISelLowering.cpp
diff --git a/llvm/test/CodeGen/ARM/O3-pipeline.ll b/llvm/test/CodeGen/ARM/O3-pipeline.ll
index 5e565970fc3a86..310a82b26893e0 100644
--- a/llvm/test/CodeGen/ARM/O3-pipeline.ll
+++ b/llvm/test/CodeGen/ARM/O3-pipeline.ll
@@ -113,6 +113,7 @@
 ; CHECK-NEXT:      ARM pre- register allocation load / store optimization pass
 ; CHECK-NEXT:      ARM A15 S->D optimizer
 ; CHECK-NEXT:      Detect Dead Lanes
+; CHECK-NEXT:      ARM init undef pass
 ; CHECK-NEXT:      Process Implicit Definitions
 ; CHECK-NEXT:      Remove unreachable machine basic blocks
 ; CHECK-NEXT:      Live Variable Analysis
diff --git a/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll b/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll
index 9bb24fc61ccef3..02234c63725360 100644
--- a/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll
+++ b/llvm/test/CodeGen/Thumb2/mve-intrinsics/vcaddq.ll
@@ -699,6 +699,17 @@ entry:
   ret <4 x i32> %0
 }
 
+define arm_aapcs_vfpcc <4 x i32> @test_vhcaddq_rot270_s32_undef() {
+; CHECK-LABEL: test_vhcaddq_rot270_s32_undef:
+; CHECK:       @ %bb.0: @ %entry
+; CHECK-NEXT: vhcadd.s32 q{{[0-9]+}}, q{{[0-9]+}}, q{{[0-9]+}}, #270
+; CHECK-NOT:  vhcadd.s32 q[[REG:[0-9]+]], q{{[0-9]+}}, q[[REG]], #270
+; CHECK-NEXT:    bx lr
+entry:
+  %0 = tail call <4 x i32> @llvm.arm.mve.vcaddq.v4i32(i32 0, i32 1, <4 x i32> undef, <4 x i32> undef)
+  ret <4 x i32> %0
+}
+
 define arm_aapcs_vfpcc <16 x i8> @test_vhcaddq_rot90_x_s8(<16 x i8> %a, <16 x i8> %b, i16 zeroext %p) {
 ; CHECK-LABEL: test_vhcaddq_rot90_x_s8:
 ; CHECK:       @ %bb.0: @ %entry

llvm/lib/Target/ARM/ARMInitUndef.cpp

tmatheson-arm · 2024-01-11T16:39:46Z

How similar is this to the RISC-V pass? Can it be made one generic pass that any backends could use? The problem does not seem specific to these two architectures.

Stylie777 · 2024-01-11T17:39:23Z

How similar is this to the RISC-V pass? Can it be made one generic pass that any backends could use? The problem does not seem specific to these two architectures.

@tmatheson-arm This would require some investigation into how this would work - I can look into this

Stylie777 · 2024-01-12T08:31:34Z

@tmatheson-arm to answer your question, the passes are similar to one another in some aspects and different in others. The logic employed is very similar, with how it checks for early-clobber and undef values. The differences come in how it applies the Pseudo instruction. For this solution with ARM, we have one instruction which takes on the form of a ARM::MQPRRegClass type. RISC V has multiple different ways of allocating the pseudo instructions, their version uses four different instructions that can be used depending on the situation.

That difference is my concern in making this a generic pass, and how it would deal with the different architectures and register configurations.

ostannard · 2024-01-18T10:21:13Z

To make this pass target-independent, you could move the target-specific bits (getVRLargestSuperClass, isVectorRegClass, getUndefInitOpcode, ...) into TargetRegisterInfo and TargetInstrInfo, which already have a lot of virtual functions for querying target-specific information from target-independent classes.

Stylie777 · 2024-01-24T09:55:29Z

I have updated this PR to now be a refactor of the RISC-V pass to make it target independent, followed by adding support for the ARM Architecture.

github-actions · 2024-01-24T09:56:21Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/include/llvm/CodeGen/TargetInstrInfo.h

llvm/include/llvm/CodeGen/TargetSubtargetInfo.h

llvm/lib/CodeGen/InitUndef.cpp

llvm/include/llvm/CodeGen/CodeGenPassBuilder.h

ostannard · 2024-02-02T14:44:26Z

llvm/include/llvm/CodeGen/TargetRegisterInfo.h

+  /// should be a Pseudo Instruction for the different register
+  /// classes for the different register types that are introduced.
+  virtual const TargetRegisterClass *
+  getTargetRegisterClass(const TargetRegisterClass *RC) const {


This name doesn't make much sense, and the comment doesn't clarify things. How about getLargestSuperClass?

Changed to getLargestSuperClass

ostannard · 2024-02-13T10:04:11Z

llvm/include/llvm/CodeGen/TargetSubtargetInfo.h

+  /// the Init Undef Pass. By default, it is assumed that it will not support
+  /// the pass, with architecture specific overrides providing the information
+  /// where they are implemented. This was originally used in RISC-V's Init
+  /// Undef pass but has been moved to be a virtual function when the pass was


Nit: You don't need to talk about the history here, this should just describe the current state of the code.

I have removed the last sentence of this comment as this then removes the history.

ostannard · 2024-02-13T10:07:40Z

llvm/lib/Target/ARM/ARMBaseRegisterInfo.h

+    if (ARM::DPR_VFP2RegClass.hasSubClassEq(RC))
+      return &ARM::DPR_VFP2RegClass;
+    if (ARM::GPRRegClass.hasSubClassEq(RC))
+      return &ARM::DPR_VFP2RegClass;


Copy-paste error?

Yes, fixed.

ostannard · 2024-02-13T10:08:25Z

llvm/lib/Target/ARM/ARMSubtarget.h

@@ -278,6 +278,14 @@ class ARMSubtarget : public ARMGenSubtargetInfo {
    return &InstrInfo->getRegisterInfo();
  }

+  /// Returns true as the ARM Architecture is supported by the Init Undef Pass.
+  /// We want to enable this for MVE and NEON instructions, however this can be


Are there any NEON instructions which this is needed for?

There are currently no NEON instructions that use easy-clobber. There are however early-clobber instructions that might get missed by checking for HasMVEIntegerOps. Changed this to true as it should run regardless, and all the required register types have been implemented.

llvm/test/CodeGen/Thumb2/mve-vmull-splat.ll

Currently this pass is designed for RISC-V only, however this is not the only architecture with the bug reported in issue #50157. We can convert the exisiting pass to be generic, using some of the existing Parent classes rather than RISC-V specific classes to bring the same functionality to other Architectures. The pass has been refactored, removing the RISC-V specific functions and data-types and replacing them with datatypes that will support all architectures and virtual functions in the respecitive classes that allow support for the pass to be added. By default, this pass will not run on on all architectures, only those that have the `supportsInitUndef` function impletmented. This commit will only refactor the exisiting code and add the pass in as a CodeGen pass rather than target specific. To add support for other architectures, this should be done in new, following commits. This will allow for the pass to be used by any architecture. With the correct overriding functions, other architectures can be supported to provide the same functionality as was added to fix issue that was reported in Issue #50157. This is still a temporary measure, and a better more permenant fix should be found, but for the time being this will allow for the correct early-clobber contraint to be followed when defined in vector instructions.

For assembly instructions within the ARM Architecture, certain instructions have early-clobber restraints that are defined for the instruction. However, when using the Greedy register allocator this is ignored. To get around this, we can use the Init Undef pass to assign a Pseudo instruction for the registers that are early-clobber to ensure the restraint is followed. This adds in support for this using a new Pseudo instruction, `PseudoARMInitUndef` which is used to ensure early-clobber restrains are followed. The relevant overriding functions have also been provided to ensure the architecture is supported by the pass and the required information can be passed to ensure that early-clobber restrains are respected.

ostannard

LGTM, thanks!

The InitUndef pass works around a register allocation issue, where undef operands can be allocated to the same register as early-clobber result operands. This may lead to ISA constraint violations, where certain input and output registers are not allowed to overlap. Originally this pass was implemented for RISCV, and then extended to ARM in llvm#77770. I've since removed the target-specific parts of the pass in llvm#106744 and llvm#107885. This PR now enables the pass for all targets. The motivating case is the one in arm64-ldxr-stxr.ll for the AArch64 target, where we were previously incorrectly allocating a stxp input and output to the same register.

The InitUndef pass works around a register allocation issue, where undef operands can be allocated to the same register as early-clobber result operands. This may lead to ISA constraint violations, where certain input and output registers are not allowed to overlap. Originally this pass was implemented for RISCV, and then extended to ARM in #77770. I've since removed the target-specific parts of the pass in #106744 and #107885. This PR reduces the pass to use a single requiresDisjointEarlyClobberAndUndef() target hook and enables it by default. The hook is disabled for AMDGPU, because overlapping early-clobber and undef operands are known to be safe for that target, and we get significant codegen diffs otherwise. The motivating case is the one in arm64-ldxr-stxr.ll, where we were previously incorrectly allocating a stxp input and output to the same register.

Stylie777 requested a review from ostannard January 11, 2024 13:50

llvmbot added the backend:ARM label Jan 11, 2024

jthackray self-requested a review January 11, 2024 13:55

jthackray reviewed Jan 11, 2024

View reviewed changes

llvm/lib/Target/ARM/ARMInitUndef.cpp Outdated Show resolved Hide resolved

Stylie777 self-assigned this Jan 11, 2024

Stylie777 requested a review from tmatheson-arm January 12, 2024 07:51

Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from 401f228 to 7148a24 Compare January 24, 2024 09:53

llvmbot added the backend:RISC-V label Jan 24, 2024

Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from 7148a24 to c0f8599 Compare January 24, 2024 11:21

Stylie777 changed the title ~~[ARM] Add pass for handling undef early-clobber values~~ [CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. Jan 24, 2024

topperc requested a review from BeMg January 24, 2024 18:09

Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from c0f8599 to fa1beb5 Compare January 30, 2024 13:26

llvmbot added backend:AArch64 backend:AMDGPU backend:X86 backend:loongarch labels Jan 30, 2024

ostannard reviewed Jan 30, 2024

View reviewed changes

arichardson reviewed Jan 31, 2024

View reviewed changes

llvm/include/llvm/CodeGen/CodeGenPassBuilder.h Outdated Show resolved Hide resolved

Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch 2 times, most recently from 183f53b to 055a8f7 Compare February 5, 2024 14:30

Stylie777 requested review from ostannard and removed request for tmatheson-arm February 6, 2024 13:23

Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch 3 times, most recently from 6e8b907 to 5549e7b Compare February 13, 2024 08:24

ostannard reviewed Feb 13, 2024

View reviewed changes

Stylie777 added 2 commits February 14, 2024 15:57

Stylie777 force-pushed the users/stylie777/add_arm_init_undef_pass branch from 5549e7b to 87a95f5 Compare February 21, 2024 16:26

llvmbot added the backend:PowerPC label Feb 21, 2024

ostannard approved these changes Feb 26, 2024

View reviewed changes

Stylie777 merged commit 2823340 into main Feb 26, 2024
5 checks passed

Stylie777 deleted the users/stylie777/add_arm_init_undef_pass branch February 26, 2024 12:12

Stylie777 mentioned this pull request Apr 16, 2024

[AArch64][CodeGen] Fix illegal register aliasing bug for mops instrs #88869

Merged

davemgreen mentioned this pull request Aug 28, 2024

[AArch64] Overlapping registers allocated to input and lateout constraints #106380

Closed

nikic mentioned this pull request Sep 12, 2024

[InitUndef] Enable the InitUndef pass on non-AMDGPU targets #108353

Merged

Stylie777 mentioned this pull request Sep 18, 2024

[InitUndef] Also handle inline asm #108951

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. #77770

[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. #77770

Stylie777 commented Jan 11, 2024 •

edited

Loading

github-actions bot commented Jan 11, 2024

llvmbot commented Jan 11, 2024 •

edited

Loading

tmatheson-arm commented Jan 11, 2024

Stylie777 commented Jan 11, 2024

Stylie777 commented Jan 12, 2024

ostannard commented Jan 18, 2024

Stylie777 commented Jan 24, 2024

github-actions bot commented Jan 24, 2024 •

edited

Loading

ostannard Feb 2, 2024

Stylie777 Feb 14, 2024

ostannard Feb 13, 2024

Stylie777 Feb 14, 2024

ostannard Feb 13, 2024

Stylie777 Feb 14, 2024

ostannard Feb 13, 2024

Stylie777 Feb 14, 2024

ostannard left a comment

[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. #77770

[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. #77770

Conversation

Stylie777 commented Jan 11, 2024 • edited Loading

github-actions bot commented Jan 11, 2024

llvmbot commented Jan 11, 2024 • edited Loading

tmatheson-arm commented Jan 11, 2024

Stylie777 commented Jan 11, 2024

Stylie777 commented Jan 12, 2024

ostannard commented Jan 18, 2024

Stylie777 commented Jan 24, 2024

github-actions bot commented Jan 24, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ostannard left a comment

Choose a reason for hiding this comment

Stylie777 commented Jan 11, 2024 •

edited

Loading

llvmbot commented Jan 11, 2024 •

edited

Loading

github-actions bot commented Jan 24, 2024 •

edited

Loading