Skip to content

[AMDGPU] Optionally Use GCNRPTrackers during scheduling #93090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 9, 2024

Conversation

jrbyrnes
Copy link
Contributor

@jrbyrnes jrbyrnes commented May 22, 2024

This is part of a series of PRs which enable using the AMDGPU/GCN RPTrackers during scheduling. I've split them up to (hopefully) make reviewing easier. For context see #88797 . Since this is the final PR in the series, any high level comments should go here.

This PR adds the scheduling changes to: maintain the GCNRPTrackers during scheduling, and use them when making per-instruction scheduling decisions.

This PR is for commits beginning at fbd0e54 and depends on #93088 and #93089

Copy link

github-actions bot commented May 22, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@jrbyrnes jrbyrnes force-pushed the GCNTrackerRefactorRebase1 branch from 37e7242 to 1c4a370 Compare June 14, 2024 22:06
@jrbyrnes
Copy link
Contributor Author

Address review comments + rebase on top of d890dda

@jrbyrnes jrbyrnes force-pushed the GCNTrackerRefactorRebase1 branch from 46205fe to 5ce0ab5 Compare August 13, 2024 19:22
@jrbyrnes
Copy link
Contributor Author

Latest adds RP speculation by using the new interface in the GCNTrackers -- the review for this implementation, which has a more detailed git history for review purposes, is #93088

Rebased for 825dbbb

@jrbyrnes jrbyrnes marked this pull request as ready for review August 13, 2024 21:30
MaxPressure = max(MaxPressure, CurPressure);
for (const RegisterMaskPair &P : DeadDefs) {
Register Reg = P.RegUnit;
if (!Reg.isVirtual())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead defs increase pressure of physregs too?

Copy link
Contributor Author

@jrbyrnes jrbyrnes Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the functions used to maintain the current state of pressure / live regs (i.e. recede, and advance) ignore PhysRegs. So, if we try to track PhysRegs in our speculation queries, we will run into inconsistency issues. I think enabling PhysReg tracking should be a separate PR, perhaps required to turn on the trackers by default?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently have a patch stuck for over a year due to the pressure tracker brokenly handling dead defs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this #76416 ?

Comment on lines +340 to +339
if (MO.isUndef())
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No isUndef check, it is redundant with the readsReg case for uses and broken of the def of subregister case

@@ -414,6 +508,64 @@ void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
assert(CurPressure == getRegPressure(*MRI, LiveRegs));
}

void GCNUpwardRPTracker::bumpUpwardPressure(const MachineInstr *MI,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really clear what this method does to the RP tracker object - modifies CurPressure and MaxPressure? What is the state of the object after the method call? From your code I only see one call against a temp object which may indicate that the state of a temp becomes undefined after a call. If so why don't make a non-method function returning max pressure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really clear what this method does to the RP tracker object - modifies CurPressure and MaxPressure? What is the state of the object after the method call?

This method modifies CurPressure and MaxPressure w.r.t the uses/defs in MI. It is meant to be used for RP Speculation, so MI is not required to be the next instruction in the LIS. Due to this requirement, we can't use recede which uses the implied program order in LIS
to calculate live lane masks.

From your code I only see one call against a temp object which may indicate that the state of a temp becomes undefined after a call. If so why don't make a non-method function returning max pressure?

Since we are calculting the impact on RP for a SchedCandidate, we don't want to commit the new pressure state to the tracker (hence why we use the temp object). However, calculating RP w.r.t some instruction feels like the responsibility of the tracker, and keeping it as a method is consistent with the generic implementation.

CurPressure.inc(Reg, LiveAfter, LiveBefore, *MRI);
}
MaxPressure = max(MaxPressure, CurPressure);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall algorithm is really cumbersome. Why do you need findUseBetween?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need findUseBetween?

For bumpUpwardPressure, we actually don't. Thanks for pointing this out. I used in an attempt to correct for a subreg liveness resulting in copying the generic implementation. This was caused by RegisterOperands::adjustLaneLiveness. This method sets the LaneMask of a use to the live lanes of the register at the position of the MI in the LIS. This adds lanes that are live at the MI, but not necessarily used by the MI. findUseBetween was meant to correct this, but this was not obvious and possibly incorrect. I've removed this and the call to and replaced adjustLaneLiveness with adjustDefLaneLiveness (for def lane handling).

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Sep 9, 2024

I spent a while looking RP speculation in the face of dead lanes in superreg defs. In the current code (as propagated from the generic trackers), we basically just ignore these lanes when calcuting pressure impacts for speculative RP. However, this is inaccurate. For example, if we have an instruction which defines a vreg_128 and but only subregs 1 and 2 are used, then bumpDownwardPressure will speculate that this vgpr def increase VGPR pressure by 2. Since those dead regs are not needed after the instruction, this RP change is consistent with RAs view of conflicts for positions after the instruction. However, during the instruction (before the dead slot), we temporarily increase VGPR pressure by 4 to provide registers for those dead lanes. However, this may mislead the scheduler, particularly for bottom up scheduling in which defs reduce RP. Encoding this into RPTracker's CurPressure doesn't seem to make much sense, as this is used to model the pressure after the instruction (whether top-down or bottom-up direction). So, I think we should probably introduce a separate GCNRegPressure field in the trackers to model any temporary increases to RP, and the scheduler can use this to more accurately set heuristics. That said, this should probably be addressed in separate PR.

@jrbyrnes
Copy link
Contributor Author

Ping

@llvmbot
Copy link
Member

llvmbot commented Sep 24, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jeffrey Byrnes (jrbyrnes)

Changes

This is part of a series of PRs which enable using the AMDGPU/GCN RPTrackers during scheduling. I've split them up to (hopefully) make reviewing easier. For context see #88797 . Since this is the final PR in the series, any high level comments should go here.

This PR adds the scheduling changes to: maintain the GCNRPTrackers during scheduling, and use them when making per-instruction scheduling decisions.

This PR is for commits beginning at fbd0e54 and depends on #93088 and #93089


Patch is 38.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93090.diff

5 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.cpp (+275-24)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.h (+80-25)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+131-23)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+58-2)
diff --git a/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp b/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
index 061b0515031b1b..085eb8e37e3cd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
@@ -480,7 +480,7 @@ void GCNIterativeScheduler::scheduleLegacyMaxOccupancy(
   LLVM_DEBUG(dbgs() << "Scheduling using default scheduler, "
                        "target occupancy = "
                     << TgtOcc << '\n');
-  GCNMaxOccupancySchedStrategy LStrgy(Context);
+  GCNMaxOccupancySchedStrategy LStrgy(Context, /*IsLegacyScheduler=*/true);
   unsigned FinalOccupancy = std::min(Occ, MFI->getOccupancy());
 
   for (int I = 0; I < NumPasses; ++I) {
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index c83af729f501fe..46bb3365a32337 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -288,6 +288,102 @@ collectVirtualRegUses(SmallVectorImpl<RegisterMaskPair> &RegMaskPairs,
   }
 }
 
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+static LaneBitmask getRegLanes(ArrayRef<RegisterMaskPair> RegUnits,
+                               Register RegUnit) {
+  auto I = llvm::find_if(RegUnits, [RegUnit](const RegisterMaskPair Other) {
+    return Other.RegUnit == RegUnit;
+  });
+  if (I == RegUnits.end())
+    return LaneBitmask::getNone();
+  return I->LaneMask;
+}
+
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+static LaneBitmask getLanesWithProperty(
+    const LiveIntervals &LIS, const MachineRegisterInfo &MRI,
+    bool TrackLaneMasks, Register RegUnit, SlotIndex Pos,
+    LaneBitmask SafeDefault,
+    function_ref<bool(const LiveRange &LR, SlotIndex Pos)> Property) {
+  if (RegUnit.isVirtual()) {
+    const LiveInterval &LI = LIS.getInterval(RegUnit);
+    LaneBitmask Result;
+    if (TrackLaneMasks && LI.hasSubRanges()) {
+      for (const LiveInterval::SubRange &SR : LI.subranges()) {
+        if (Property(SR, Pos))
+          Result |= SR.LaneMask;
+      }
+    } else if (Property(LI, Pos)) {
+      Result = TrackLaneMasks ? MRI.getMaxLaneMaskForVReg(RegUnit)
+                              : LaneBitmask::getAll();
+    }
+
+    return Result;
+  }
+
+  const LiveRange *LR = LIS.getCachedRegUnit(RegUnit);
+  if (LR == nullptr)
+    return SafeDefault;
+  return Property(*LR, Pos) ? LaneBitmask::getAll() : LaneBitmask::getNone();
+}
+
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+/// Helper to find a vreg use between two indices [PriorUseIdx, NextUseIdx).
+/// The query starts with a lane bitmask which gets lanes/bits removed for every
+/// use we find.
+static LaneBitmask findUseBetween(unsigned Reg, LaneBitmask LastUseMask,
+                                  SlotIndex PriorUseIdx, SlotIndex NextUseIdx,
+                                  const MachineRegisterInfo &MRI,
+                                  const SIRegisterInfo *TRI,
+                                  const LiveIntervals *LIS,
+                                  bool Upward = false) {
+  for (const MachineOperand &MO : MRI.use_nodbg_operands(Reg)) {
+    if (MO.isUndef())
+      continue;
+    const MachineInstr *MI = MO.getParent();
+    SlotIndex InstSlot = LIS->getInstructionIndex(*MI).getRegSlot();
+    bool InRange = Upward ? (InstSlot > PriorUseIdx && InstSlot <= NextUseIdx)
+                          : (InstSlot >= PriorUseIdx && InstSlot < NextUseIdx);
+    if (InRange) {
+      unsigned SubRegIdx = MO.getSubReg();
+      LaneBitmask UseMask = TRI->getSubRegIndexLaneMask(SubRegIdx);
+      LastUseMask &= ~UseMask;
+      if (LastUseMask.none())
+        return LaneBitmask::getNone();
+    }
+  }
+  return LastUseMask;
+}
+
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+static LaneBitmask getLiveLanesAt(const LiveIntervals &LIS,
+                                  const MachineRegisterInfo &MRI,
+                                  bool TrackLaneMasks, Register RegUnit,
+                                  SlotIndex Pos) {
+  return getLanesWithProperty(
+      LIS, MRI, TrackLaneMasks, RegUnit, Pos, LaneBitmask::getAll(),
+      [](const LiveRange &LR, SlotIndex Pos) { return LR.liveAt(Pos); });
+}
+
+// Copy/paste from RegisterPressure.cpp (RegisterOperands::adjustLaneLiveness)
+static void adjustDefLaneLiveness(SmallVectorImpl<RegisterMaskPair> &Defs,
+                                  SlotIndex &Pos, const LiveIntervals &LIS,
+                                  const MachineRegisterInfo &MRI) {
+  for (auto *I = Defs.begin(); I != Defs.end();) {
+    LaneBitmask LiveAfter =
+        getLiveLanesAt(LIS, MRI, true, I->RegUnit, Pos.getDeadSlot());
+    // If the def is all that is live after the instruction, then in case
+    // of a subregister def we need a read-undef flag.
+    LaneBitmask ActualDef = I->LaneMask & LiveAfter;
+    if (ActualDef.none()) {
+      I = Defs.erase(I);
+    } else {
+      I->LaneMask = ActualDef;
+      ++I;
+    }
+  }
+}
+
 ///////////////////////////////////////////////////////////////////////////////
 // GCNRPTracker
 
@@ -343,17 +439,41 @@ void GCNRPTracker::reset(const MachineInstr &MI,
   MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
 }
 
-////////////////////////////////////////////////////////////////////////////////
-// GCNUpwardRPTracker
-
-void GCNUpwardRPTracker::reset(const MachineRegisterInfo &MRI_,
-                               const LiveRegSet &LiveRegs_) {
+void GCNRPTracker::reset(const MachineRegisterInfo &MRI_,
+                         const LiveRegSet &LiveRegs_) {
   MRI = &MRI_;
   LiveRegs = LiveRegs_;
   LastTrackedMI = nullptr;
   MaxPressure = CurPressure = getRegPressure(MRI_, LiveRegs_);
 }
 
+void GCNRPTracker::bumpDeadDefs(ArrayRef<RegisterMaskPair> DeadDefs) {
+  GCNRegPressure TempPressure = CurPressure;
+  for (const RegisterMaskPair &P : DeadDefs) {
+    Register Reg = P.RegUnit;
+    if (!Reg.isVirtual())
+      continue;
+    LaneBitmask LiveMask = LiveRegs[Reg];
+    LaneBitmask BumpedMask = LiveMask | P.LaneMask;
+    CurPressure.inc(Reg, LiveMask, BumpedMask, *MRI);
+  }
+  MaxPressure = max(MaxPressure, CurPressure);
+  CurPressure = TempPressure;
+}
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+LaneBitmask GCNRPTracker::getLastUsedLanes(Register RegUnit,
+                                           SlotIndex Pos) const {
+  return getLanesWithProperty(
+      LIS, *MRI, true, RegUnit, Pos.getBaseIndex(), LaneBitmask::getNone(),
+      [](const LiveRange &LR, SlotIndex Pos) {
+        const LiveRange::Segment *S = LR.getSegmentContaining(Pos);
+        return S != nullptr && S->end == Pos.getRegSlot();
+      });
+}
+
+////////////////////////////////////////////////////////////////////////////////
+// GCNUpwardRPTracker
+
 void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
   assert(MRI && "call reset first");
 
@@ -414,6 +534,49 @@ void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
   assert(CurPressure == getRegPressure(*MRI, LiveRegs));
 }
 
+void GCNUpwardRPTracker::bumpUpwardPressure(const MachineInstr *MI,
+                                            const SIRegisterInfo *TRI) {
+  assert(!MI->isDebugOrPseudoInstr() && "Expect a nondebug instruction.");
+
+  SlotIndex SlotIdx = LIS.getInstructionIndex(*MI).getRegSlot();
+
+  // Account for register pressure similar to RegPressureTracker::recede().
+  RegisterOperands RegOpers;
+
+  RegOpers.collect(*MI, *TRI, *MRI, true, /*IgnoreDead=*/true);
+  assert(RegOpers.DeadDefs.empty());
+  adjustDefLaneLiveness(RegOpers.Defs, SlotIdx, LIS, *MRI);
+  RegOpers.detectDeadDefs(*MI, LIS);
+
+  // Boost max pressure for all dead defs together.
+  // Since CurrSetPressure and MaxSetPressure
+  bumpDeadDefs(RegOpers.DeadDefs);
+
+  // Kill liveness at live defs.
+  for (const RegisterMaskPair &P : RegOpers.Defs) {
+    Register Reg = P.RegUnit;
+    if (!Reg.isVirtual())
+      continue;
+    LaneBitmask LiveAfter = LiveRegs[Reg];
+    LaneBitmask UseLanes = getRegLanes(RegOpers.Uses, Reg);
+    LaneBitmask DefLanes = P.LaneMask;
+    LaneBitmask LiveBefore = (LiveAfter & ~DefLanes) | UseLanes;
+
+    CurPressure.inc(Reg, LiveAfter, LiveAfter & LiveBefore, *MRI);
+    MaxPressure = max(MaxPressure, CurPressure);
+  }
+  // Generate liveness for uses.
+  for (const RegisterMaskPair &P : RegOpers.Uses) {
+    Register Reg = P.RegUnit;
+    if (!Reg.isVirtual())
+      continue;
+    LaneBitmask LiveAfter = LiveRegs[Reg];
+    LaneBitmask LiveBefore = LiveAfter | P.LaneMask;
+    CurPressure.inc(Reg, LiveAfter, LiveBefore, *MRI);
+  }
+  MaxPressure = max(MaxPressure, CurPressure);
+}
+
 ////////////////////////////////////////////////////////////////////////////////
 // GCNDownwardRPTracker
 
@@ -430,28 +593,44 @@ bool GCNDownwardRPTracker::reset(const MachineInstr &MI,
   return true;
 }
 
-bool GCNDownwardRPTracker::advanceBeforeNext() {
+bool GCNDownwardRPTracker::advanceBeforeNext(MachineInstr *MI,
+                                             bool UseInternalIterator,
+                                             LiveIntervals *TheLIS) {
   assert(MRI && "call reset first");
-  if (!LastTrackedMI)
-    return NextMI == MBBEnd;
-
-  assert(NextMI == MBBEnd || !NextMI->isDebugInstr());
+  SlotIndex SI;
+  const LiveIntervals *CurrLIS;
+  const MachineInstr *CurrMI;
+  if (UseInternalIterator) {
+    if (!LastTrackedMI)
+      return NextMI == MBBEnd;
+
+    assert(NextMI == MBBEnd || !NextMI->isDebugInstr());
+    CurrLIS = &LIS;
+    CurrMI = LastTrackedMI;
+
+    SI = NextMI == MBBEnd
+             ? CurrLIS->getInstructionIndex(*LastTrackedMI).getDeadSlot()
+             : CurrLIS->getInstructionIndex(*NextMI).getBaseIndex();
+  } else { //! UseInternalIterator
+    CurrLIS = TheLIS;
+    SI = CurrLIS->getInstructionIndex(*MI).getBaseIndex();
+    CurrMI = MI;
+  }
 
-  SlotIndex SI = NextMI == MBBEnd
-                     ? LIS.getInstructionIndex(*LastTrackedMI).getDeadSlot()
-                     : LIS.getInstructionIndex(*NextMI).getBaseIndex();
   assert(SI.isValid());
 
   // Remove dead registers or mask bits.
   SmallSet<Register, 8> SeenRegs;
-  for (auto &MO : LastTrackedMI->operands()) {
+  for (auto &MO : CurrMI->operands()) {
     if (!MO.isReg() || !MO.getReg().isVirtual())
       continue;
     if (MO.isUse() && !MO.readsReg())
       continue;
+    if (!UseInternalIterator && MO.isDef())
+      continue;
     if (!SeenRegs.insert(MO.getReg()).second)
       continue;
-    const LiveInterval &LI = LIS.getInterval(MO.getReg());
+    const LiveInterval &LI = CurrLIS->getInterval(MO.getReg());
     if (LI.hasSubRanges()) {
       auto It = LiveRegs.end();
       for (const auto &S : LI.subranges()) {
@@ -481,15 +660,22 @@ bool GCNDownwardRPTracker::advanceBeforeNext() {
 
   LastTrackedMI = nullptr;
 
-  return NextMI == MBBEnd;
+  return UseInternalIterator && (NextMI == MBBEnd);
 }
 
-void GCNDownwardRPTracker::advanceToNext() {
-  LastTrackedMI = &*NextMI++;
-  NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
+void GCNDownwardRPTracker::advanceToNext(MachineInstr *MI,
+                                         bool UseInternalIterator) {
+  if (UseInternalIterator) {
+    LastTrackedMI = &*NextMI++;
+    NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
+  } else {
+    LastTrackedMI = MI;
+  }
+
+  const MachineInstr *CurrMI = LastTrackedMI;
 
   // Add new registers or mask bits.
-  for (const auto &MO : LastTrackedMI->all_defs()) {
+  for (const auto &MO : CurrMI->all_defs()) {
     Register Reg = MO.getReg();
     if (!Reg.isVirtual())
       continue;
@@ -502,11 +688,17 @@ void GCNDownwardRPTracker::advanceToNext() {
   MaxPressure = max(MaxPressure, CurPressure);
 }
 
-bool GCNDownwardRPTracker::advance() {
-  if (NextMI == MBBEnd)
+bool GCNDownwardRPTracker::advance(MachineInstr *MI, bool UseInternalIterator,
+                                   LiveIntervals *TheLIS) {
+  if (UseInternalIterator && NextMI == MBBEnd)
     return false;
-  advanceBeforeNext();
-  advanceToNext();
+
+  advanceBeforeNext(MI, UseInternalIterator, TheLIS);
+  advanceToNext(MI, UseInternalIterator);
+  if (!UseInternalIterator) {
+    // We must remove any dead def lanes from the current RP
+    advanceBeforeNext(MI, true, TheLIS);
+  }
   return true;
 }
 
@@ -548,6 +740,65 @@ Printable llvm::reportMismatch(const GCNRPTracker::LiveRegSet &LISLR,
   });
 }
 
+void GCNDownwardRPTracker::bumpDownwardPressure(const MachineInstr *MI,
+                                                const SIRegisterInfo *TRI) {
+  assert(!MI->isDebugOrPseudoInstr() && "Expect a nondebug instruction.");
+
+  SlotIndex SlotIdx;
+  SlotIdx = LIS.getInstructionIndex(*MI).getRegSlot();
+
+  // Account for register pressure similar to RegPressureTracker::recede().
+  RegisterOperands RegOpers;
+  RegOpers.collect(*MI, *TRI, *MRI, true, /*IgnoreDead=*/false);
+  RegOpers.adjustLaneLiveness(LIS, *MRI, SlotIdx);
+
+  for (const RegisterMaskPair &Use : RegOpers.Uses) {
+    Register Reg = Use.RegUnit;
+    if (!Reg.isVirtual())
+      continue;
+    LaneBitmask LastUseMask = getLastUsedLanes(Reg, SlotIdx);
+    if (LastUseMask.none())
+      continue;
+    // The LastUseMask is queried from the liveness information of instruction
+    // which may be further down the schedule. Some lanes may actually not be
+    // last uses for the current position.
+    // FIXME: allow the caller to pass in the list of vreg uses that remain
+    // to be bottom-scheduled to avoid searching uses at each query.
+    SlotIndex CurrIdx;
+    const MachineBasicBlock *MBB = MI->getParent();
+    MachineBasicBlock::const_iterator IdxPos = skipDebugInstructionsForward(
+        LastTrackedMI ? LastTrackedMI : MBB->begin(), MBB->end());
+    if (IdxPos == MBB->end()) {
+      CurrIdx = LIS.getMBBEndIdx(MBB);
+    } else {
+      CurrIdx = LIS.getInstructionIndex(*IdxPos).getRegSlot();
+    }
+
+    LastUseMask =
+        findUseBetween(Reg, LastUseMask, CurrIdx, SlotIdx, *MRI, TRI, &LIS);
+    if (LastUseMask.none())
+      continue;
+
+    LaneBitmask LiveMask = LiveRegs[Reg];
+    LaneBitmask NewMask = LiveMask & ~LastUseMask;
+    CurPressure.inc(Reg, LiveMask, NewMask, *MRI);
+  }
+
+  // Generate liveness for defs.
+  for (const RegisterMaskPair &Def : RegOpers.Defs) {
+    Register Reg = Def.RegUnit;
+    if (!Reg.isVirtual())
+      continue;
+    LaneBitmask LiveMask = LiveRegs[Reg];
+    LaneBitmask NewMask = LiveMask | Def.LaneMask;
+    CurPressure.inc(Reg, LiveMask, NewMask, *MRI);
+  }
+  MaxPressure = max(MaxPressure, CurPressure);
+
+  // Boost pressure for all dead defs together.
+  bumpDeadDefs(RegOpers.DeadDefs);
+}
+
 bool GCNUpwardRPTracker::isValid() const {
   const auto &SI = LIS.getInstructionIndex(*LastTrackedMI).getBaseIndex();
   const auto LISLR = llvm::getLiveRegs(SI, LIS, *MRI);
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.h b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
index 54dc1972d27619..463da472bb69ff 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.h
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
@@ -19,6 +19,7 @@
 
 #include "GCNSubtarget.h"
 #include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/RegisterPressure.h"
 #include <algorithm>
 
 namespace llvm {
@@ -149,6 +150,9 @@ inline GCNRegPressure operator-(const GCNRegPressure &P1,
   return Diff;
 }
 
+///////////////////////////////////////////////////////////////////////////////
+// GCNRPTracker
+
 class GCNRPTracker {
 public:
   using LiveRegSet = DenseMap<unsigned, LaneBitmask>;
@@ -165,7 +169,14 @@ class GCNRPTracker {
   void reset(const MachineInstr &MI, const LiveRegSet *LiveRegsCopy,
              bool After);
 
+  /// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+  void bumpDeadDefs(ArrayRef<RegisterMaskPair> DeadDefs);
+
+  LaneBitmask getLastUsedLanes(Register RegUnit, SlotIndex Pos) const;
+
 public:
+  // reset tracker and set live register set to the specified value.
+  void reset(const MachineRegisterInfo &MRI_, const LiveRegSet &LiveRegs_);
   // live regs for the current state
   const decltype(LiveRegs) &getLiveRegs() const { return LiveRegs; }
   const MachineInstr *getLastTrackedMI() const { return LastTrackedMI; }
@@ -182,34 +193,45 @@ class GCNRPTracker {
 GCNRPTracker::LiveRegSet getLiveRegs(SlotIndex SI, const LiveIntervals &LIS,
                                      const MachineRegisterInfo &MRI);
 
+////////////////////////////////////////////////////////////////////////////////
+// GCNUpwardRPTracker
+
 class GCNUpwardRPTracker : public GCNRPTracker {
 public:
   GCNUpwardRPTracker(const LiveIntervals &LIS_) : GCNRPTracker(LIS_) {}
 
-  // reset tracker and set live register set to the specified value.
-  void reset(const MachineRegisterInfo &MRI_, const LiveRegSet &LiveRegs_);
+  using GCNRPTracker::reset;
 
-  // reset tracker at the specified slot index.
+  /// reset tracker at the specified slot index \p SI.
   void reset(const MachineRegisterInfo &MRI, SlotIndex SI) {
-    reset(MRI, llvm::getLiveRegs(SI, LIS, MRI));
+    GCNRPTracker::reset(MRI, llvm::getLiveRegs(SI, LIS, MRI));
   }
 
-  // reset tracker to the end of the MBB.
+  /// reset tracker to the end of the \p MBB.
   void reset(const MachineBasicBlock &MBB) {
     reset(MBB.getParent()->getRegInfo(),
           LIS.getSlotIndexes()->getMBBEndIdx(&MBB));
   }
 
-  // reset tracker to the point just after MI (in program order).
+  /// reset tracker to the point just after \p MI (in program order).
   void reset(const MachineInstr &MI) {
     reset(MI.getMF()->getRegInfo(), LIS.getInstructionIndex(MI).getDeadSlot());
   }
 
-  // move to the state just before the MI (in program order).
+  /// Move to the state of RP just before the \p MI . If \p UseInternalIterator
+  /// is set, also update the internal iterators. Setting \p UseInternalIterator
+  /// to false allows for an externally managed iterator / program order.
   void recede(const MachineInstr &MI);
 
-  // checks whether the tracker's state after receding MI corresponds
-  // to reported by LIS.
+  /// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+  /// Calculate the impact \p MI will have on CurPressure and MaxPressure. This
+  /// does not rely on the implicit program ordering in the LiveIntervals to
+  /// support RP Speculation. It leaves the state of pressure inconsistent with
+  /// the current position
+  void bumpUpwardPressure(const MachineInstr *MI, const SIRegisterInfo *TRI);
+
+  /// \p returns whether the tracker's state after receding MI corresponds
+  /// to reported by LIS.
   bool isValid() const;
 
   const GCNRegPressure &getMaxPressure() const { return MaxPressure; }
@@ -223,6 +245,9 @@ class GCNUpwardRPTracker : public GCNRPTracker {
   }
 };
 
+////////////////////////////////////////////////////////////////////////////////
+// GCNDownwardRPTracker
+
 class GCNDownwardRPTracker : public GCNRPTracker {
   // Last position of reset or advanceBeforeNext
   MachineBasicBlock::const_iterator NextMI;
@@ -232,37 +257,67 @@ class GCNDownwardRPTracker : public GCNRPTracker {
 public:
   GCNDownwardRPTracker(const LiveIntervals &LIS_) : GCNRPTracker(LIS_) {}
 
+  using GCNRPTracker::reset;
+
   MachineBasicBlock::const_iterator getNext() const { return NextMI; }
 
-  // Return MaxPressure and clear it.
+  /// \p return MaxPressure and clear it.
   GCNRegPressure moveMaxPressure() {
     auto Res = MaxPressure;
     MaxPressure.clear();
     return Res;
   }
 
-  // Reset tracker to the point before the MI
-  // filling live regs upon this point using LIS.
-  // Returns false if block is empty except debug values.
+  /// Reset tracker to the point before the \p MI
+  /// filling \p LiveRegs upon this point using LIS.
+  /// \p returns false if block is empty except debug values.
   bool reset(const MachineInstr &MI, const LiveRegSet *LiveRegs = nullptr);
 
-  // Move to the state right before the next MI or after the end of MBB.
-  // Returns false if reached end of the block.
-  bool advanceBeforeNext();
-
-  // Move to the state at the MI, advanceBeforeNext has to be called first.
-  void advanceToNext();
-
-  // Move to the state at the next MI. Returns false if reached end of block.
-  bool advance();
-
-  // Advance instructions until before End.
+  /// Move to the state right before the next MI or after the end of MBB.
+  /// \p returns false if reached end of the block.
+  /// If \p UseInternalIterator is true...
[truncated]

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Sep 27, 2024

Added several tests based on the more significant lit changes.

Since this is a change which impacts scheduling heuristics, the changes are a mix of regressions and improvements. I've analyzed and included all the significant regressions, and added comments / fixmes to explain the situation. The causes are: 1. not tracking PhysRegs, 2. the scheduler improving RP in initial stage and not triggering high-RP stage, and 3. the scheduler being overly sensitive to minor changes in RP, and not being willing to incur a small penalty to enable scheduling of RP reducing instructions.

These will be addressed in a follow-up PR which enables the GCN RP Tracker features by default.

I've also included several tests where things go as expected -- more accurate RP Tracking leads to reductions in RP and, ultimately, improving occupancy

@jrbyrnes jrbyrnes force-pushed the GCNTrackerRefactorRebase1 branch from acbca50 to bf61d05 Compare October 3, 2024 22:17
@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 3, 2024

Force push to rebase + solve merge conflicts

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 7, 2024

Fix recede to handle LIS program order violations + delete bumpUpwardPressure

@jrbyrnes jrbyrnes force-pushed the GCNTrackerRefactorRebase1 branch from bb3bcc7 to 35ab173 Compare October 8, 2024 17:44
@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 8, 2024

Force push to rebase for 5cb6b15

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 8, 2024

Latest passes psdb with GCNTrackers enabled

@vpykhtin
Copy link
Contributor

vpykhtin commented Oct 9, 2024

No more objections from my side given that the patch is under a CL flag.

@jrbyrnes jrbyrnes force-pushed the GCNTrackerRefactorRebase1 branch from aa74786 to 276193f Compare October 9, 2024 16:43
@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 9, 2024

Having issues with CLI, force-push to land via web UI

@jrbyrnes jrbyrnes merged commit 17bc959 into llvm:main Oct 9, 2024
5 of 7 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 9, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/6882

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
80.048 [40/12/3930] Linking CXX executable bin/sancov
80.066 [40/11/3931] Linking CXX executable bin/llvm-objdump
80.067 [39/11/3932] Linking CXX executable bin/llvm-nm
80.068 [39/10/3933] Generating ../../bin/llvm-otool
80.080 [39/9/3934] Linking CXX executable bin/llvm-debuginfo-analyzer
80.114 [39/8/3935] Linking CXX executable bin/llvm-cfi-verify
80.181 [39/7/3936] Linking CXX executable bin/llvm-profgen
80.396 [39/6/3937] Linking CXX executable bin/llvm-jitlink
86.163 [39/5/3938] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNRegPressure.cpp.o
88.857 [39/4/3939] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/clang++ -DEXPENSIVE_CHECKS -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GLIBCXX_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/1/llvm-clang-x86_64-expensive-checks-debian/build/lib/Target/AMDGPU -I/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/lib/Target/AMDGPU -I/b/1/llvm-clang-x86_64-expensive-checks-debian/build/include -I/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/include -U_GLIBCXX_DEBUG -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o -c /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:238:26: error: use of undeclared identifier 'TheTracker'
                         TheTracker, UpwardTracker, DAG, SRI);
                         ^
1 error generated.
89.082 [39/3/3940] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/SIFormMemoryClauses.cpp.o
89.534 [39/2/3941] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNIterativeScheduler.cpp.o
104.244 [39/1/3942] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetMachine.cpp.o
ninja: build stopped: subcommand failed.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Dec 12, 2024
This adds the ability to use the GCNRPTrackers during scheduling. These trackers have several advantages over the generic trackers: 1. global live-thru trackers, 2. subregister based RP deltas, and 3. flexible vreg -> PressureSet mappings.

This feature is off-by-default to ease with the roll-out process. In particular, when using the optional trackers, the scheduler will still maintain the generic trackers leading to unnecessary compile time.

(cherry picked from commit 17bc959)
Change-Id: Ia45099682b69df7113b7736f052e2bde6f926ca4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants