-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[AMDGPU] Optionally Use GCNRPTrackers during scheduling #93090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
11b43a3
to
7b6eb7e
Compare
7b6eb7e
to
37e7242
Compare
37e7242
to
1c4a370
Compare
Address review comments + rebase on top of d890dda |
46205fe
to
5ce0ab5
Compare
MaxPressure = max(MaxPressure, CurPressure); | ||
for (const RegisterMaskPair &P : DeadDefs) { | ||
Register Reg = P.RegUnit; | ||
if (!Reg.isVirtual()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead defs increase pressure of physregs too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the functions used to maintain the current state of pressure / live regs (i.e. recede, and advance) ignore PhysRegs. So, if we try to track PhysRegs in our speculation queries, we will run into inconsistency issues. I think enabling PhysReg tracking should be a separate PR, perhaps required to turn on the trackers by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently have a patch stuck for over a year due to the pressure tracker brokenly handling dead defs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this #76416 ?
if (MO.isUndef()) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No isUndef check, it is redundant with the readsReg case for uses and broken of the def of subregister case
@@ -414,6 +508,64 @@ void GCNUpwardRPTracker::recede(const MachineInstr &MI) { | |||
assert(CurPressure == getRegPressure(*MRI, LiveRegs)); | |||
} | |||
|
|||
void GCNUpwardRPTracker::bumpUpwardPressure(const MachineInstr *MI, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really clear what this method does to the RP tracker object - modifies CurPressure and MaxPressure? What is the state of the object after the method call? From your code I only see one call against a temp object which may indicate that the state of a temp becomes undefined after a call. If so why don't make a non-method function returning max pressure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really clear what this method does to the RP tracker object - modifies CurPressure and MaxPressure? What is the state of the object after the method call?
This method modifies CurPressure and MaxPressure w.r.t the uses/defs in MI. It is meant to be used for RP Speculation, so MI is not required to be the next instruction in the LIS. Due to this requirement, we can't use recede which uses the implied program order in LIS
to calculate live lane masks.
From your code I only see one call against a temp object which may indicate that the state of a temp becomes undefined after a call. If so why don't make a non-method function returning max pressure?
Since we are calculting the impact on RP for a SchedCandidate, we don't want to commit the new pressure state to the tracker (hence why we use the temp object). However, calculating RP w.r.t some instruction feels like the responsibility of the tracker, and keeping it as a method is consistent with the generic implementation.
CurPressure.inc(Reg, LiveAfter, LiveBefore, *MRI); | ||
} | ||
MaxPressure = max(MaxPressure, CurPressure); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall algorithm is really cumbersome. Why do you need findUseBetween?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need findUseBetween?
For bumpUpwardPressure, we actually don't. Thanks for pointing this out. I used in an attempt to correct for a subreg liveness resulting in copying the generic implementation. This was caused by RegisterOperands::adjustLaneLiveness. This method sets the LaneMask of a use to the live lanes of the register at the position of the MI in the LIS. This adds lanes that are live at the MI, but not necessarily used by the MI. findUseBetween was meant to correct this, but this was not obvious and possibly incorrect. I've removed this and the call to and replaced adjustLaneLiveness with adjustDefLaneLiveness (for def lane handling).
I spent a while looking RP speculation in the face of dead lanes in superreg defs. In the current code (as propagated from the generic trackers), we basically just ignore these lanes when calcuting pressure impacts for speculative RP. However, this is inaccurate. For example, if we have an instruction which defines a vreg_128 and but only subregs 1 and 2 are used, then bumpDownwardPressure will speculate that this vgpr def increase VGPR pressure by 2. Since those dead regs are not needed after the instruction, this RP change is consistent with RAs view of conflicts for positions after the instruction. However, during the instruction (before the dead slot), we temporarily increase VGPR pressure by 4 to provide registers for those dead lanes. However, this may mislead the scheduler, particularly for bottom up scheduling in which defs reduce RP. Encoding this into RPTracker's CurPressure doesn't seem to make much sense, as this is used to model the pressure after the instruction (whether top-down or bottom-up direction). So, I think we should probably introduce a separate GCNRegPressure field in the trackers to model any temporary increases to RP, and the scheduler can use this to more accurately set heuristics. That said, this should probably be addressed in separate PR. |
Ping |
@llvm/pr-subscribers-backend-amdgpu Author: Jeffrey Byrnes (jrbyrnes) ChangesThis is part of a series of PRs which enable using the AMDGPU/GCN RPTrackers during scheduling. I've split them up to (hopefully) make reviewing easier. For context see #88797 . Since this is the final PR in the series, any high level comments should go here. This PR adds the scheduling changes to: maintain the GCNRPTrackers during scheduling, and use them when making per-instruction scheduling decisions. This PR is for commits beginning at fbd0e54 and depends on #93088 and #93089 Patch is 38.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93090.diff 5 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp b/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
index 061b0515031b1b..085eb8e37e3cd2 100644
--- a/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNIterativeScheduler.cpp
@@ -480,7 +480,7 @@ void GCNIterativeScheduler::scheduleLegacyMaxOccupancy(
LLVM_DEBUG(dbgs() << "Scheduling using default scheduler, "
"target occupancy = "
<< TgtOcc << '\n');
- GCNMaxOccupancySchedStrategy LStrgy(Context);
+ GCNMaxOccupancySchedStrategy LStrgy(Context, /*IsLegacyScheduler=*/true);
unsigned FinalOccupancy = std::min(Occ, MFI->getOccupancy());
for (int I = 0; I < NumPasses; ++I) {
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index c83af729f501fe..46bb3365a32337 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -288,6 +288,102 @@ collectVirtualRegUses(SmallVectorImpl<RegisterMaskPair> &RegMaskPairs,
}
}
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+static LaneBitmask getRegLanes(ArrayRef<RegisterMaskPair> RegUnits,
+ Register RegUnit) {
+ auto I = llvm::find_if(RegUnits, [RegUnit](const RegisterMaskPair Other) {
+ return Other.RegUnit == RegUnit;
+ });
+ if (I == RegUnits.end())
+ return LaneBitmask::getNone();
+ return I->LaneMask;
+}
+
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+static LaneBitmask getLanesWithProperty(
+ const LiveIntervals &LIS, const MachineRegisterInfo &MRI,
+ bool TrackLaneMasks, Register RegUnit, SlotIndex Pos,
+ LaneBitmask SafeDefault,
+ function_ref<bool(const LiveRange &LR, SlotIndex Pos)> Property) {
+ if (RegUnit.isVirtual()) {
+ const LiveInterval &LI = LIS.getInterval(RegUnit);
+ LaneBitmask Result;
+ if (TrackLaneMasks && LI.hasSubRanges()) {
+ for (const LiveInterval::SubRange &SR : LI.subranges()) {
+ if (Property(SR, Pos))
+ Result |= SR.LaneMask;
+ }
+ } else if (Property(LI, Pos)) {
+ Result = TrackLaneMasks ? MRI.getMaxLaneMaskForVReg(RegUnit)
+ : LaneBitmask::getAll();
+ }
+
+ return Result;
+ }
+
+ const LiveRange *LR = LIS.getCachedRegUnit(RegUnit);
+ if (LR == nullptr)
+ return SafeDefault;
+ return Property(*LR, Pos) ? LaneBitmask::getAll() : LaneBitmask::getNone();
+}
+
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+/// Helper to find a vreg use between two indices [PriorUseIdx, NextUseIdx).
+/// The query starts with a lane bitmask which gets lanes/bits removed for every
+/// use we find.
+static LaneBitmask findUseBetween(unsigned Reg, LaneBitmask LastUseMask,
+ SlotIndex PriorUseIdx, SlotIndex NextUseIdx,
+ const MachineRegisterInfo &MRI,
+ const SIRegisterInfo *TRI,
+ const LiveIntervals *LIS,
+ bool Upward = false) {
+ for (const MachineOperand &MO : MRI.use_nodbg_operands(Reg)) {
+ if (MO.isUndef())
+ continue;
+ const MachineInstr *MI = MO.getParent();
+ SlotIndex InstSlot = LIS->getInstructionIndex(*MI).getRegSlot();
+ bool InRange = Upward ? (InstSlot > PriorUseIdx && InstSlot <= NextUseIdx)
+ : (InstSlot >= PriorUseIdx && InstSlot < NextUseIdx);
+ if (InRange) {
+ unsigned SubRegIdx = MO.getSubReg();
+ LaneBitmask UseMask = TRI->getSubRegIndexLaneMask(SubRegIdx);
+ LastUseMask &= ~UseMask;
+ if (LastUseMask.none())
+ return LaneBitmask::getNone();
+ }
+ }
+ return LastUseMask;
+}
+
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+static LaneBitmask getLiveLanesAt(const LiveIntervals &LIS,
+ const MachineRegisterInfo &MRI,
+ bool TrackLaneMasks, Register RegUnit,
+ SlotIndex Pos) {
+ return getLanesWithProperty(
+ LIS, MRI, TrackLaneMasks, RegUnit, Pos, LaneBitmask::getAll(),
+ [](const LiveRange &LR, SlotIndex Pos) { return LR.liveAt(Pos); });
+}
+
+// Copy/paste from RegisterPressure.cpp (RegisterOperands::adjustLaneLiveness)
+static void adjustDefLaneLiveness(SmallVectorImpl<RegisterMaskPair> &Defs,
+ SlotIndex &Pos, const LiveIntervals &LIS,
+ const MachineRegisterInfo &MRI) {
+ for (auto *I = Defs.begin(); I != Defs.end();) {
+ LaneBitmask LiveAfter =
+ getLiveLanesAt(LIS, MRI, true, I->RegUnit, Pos.getDeadSlot());
+ // If the def is all that is live after the instruction, then in case
+ // of a subregister def we need a read-undef flag.
+ LaneBitmask ActualDef = I->LaneMask & LiveAfter;
+ if (ActualDef.none()) {
+ I = Defs.erase(I);
+ } else {
+ I->LaneMask = ActualDef;
+ ++I;
+ }
+ }
+}
+
///////////////////////////////////////////////////////////////////////////////
// GCNRPTracker
@@ -343,17 +439,41 @@ void GCNRPTracker::reset(const MachineInstr &MI,
MaxPressure = CurPressure = getRegPressure(*MRI, LiveRegs);
}
-////////////////////////////////////////////////////////////////////////////////
-// GCNUpwardRPTracker
-
-void GCNUpwardRPTracker::reset(const MachineRegisterInfo &MRI_,
- const LiveRegSet &LiveRegs_) {
+void GCNRPTracker::reset(const MachineRegisterInfo &MRI_,
+ const LiveRegSet &LiveRegs_) {
MRI = &MRI_;
LiveRegs = LiveRegs_;
LastTrackedMI = nullptr;
MaxPressure = CurPressure = getRegPressure(MRI_, LiveRegs_);
}
+void GCNRPTracker::bumpDeadDefs(ArrayRef<RegisterMaskPair> DeadDefs) {
+ GCNRegPressure TempPressure = CurPressure;
+ for (const RegisterMaskPair &P : DeadDefs) {
+ Register Reg = P.RegUnit;
+ if (!Reg.isVirtual())
+ continue;
+ LaneBitmask LiveMask = LiveRegs[Reg];
+ LaneBitmask BumpedMask = LiveMask | P.LaneMask;
+ CurPressure.inc(Reg, LiveMask, BumpedMask, *MRI);
+ }
+ MaxPressure = max(MaxPressure, CurPressure);
+ CurPressure = TempPressure;
+}
+/// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+LaneBitmask GCNRPTracker::getLastUsedLanes(Register RegUnit,
+ SlotIndex Pos) const {
+ return getLanesWithProperty(
+ LIS, *MRI, true, RegUnit, Pos.getBaseIndex(), LaneBitmask::getNone(),
+ [](const LiveRange &LR, SlotIndex Pos) {
+ const LiveRange::Segment *S = LR.getSegmentContaining(Pos);
+ return S != nullptr && S->end == Pos.getRegSlot();
+ });
+}
+
+////////////////////////////////////////////////////////////////////////////////
+// GCNUpwardRPTracker
+
void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
assert(MRI && "call reset first");
@@ -414,6 +534,49 @@ void GCNUpwardRPTracker::recede(const MachineInstr &MI) {
assert(CurPressure == getRegPressure(*MRI, LiveRegs));
}
+void GCNUpwardRPTracker::bumpUpwardPressure(const MachineInstr *MI,
+ const SIRegisterInfo *TRI) {
+ assert(!MI->isDebugOrPseudoInstr() && "Expect a nondebug instruction.");
+
+ SlotIndex SlotIdx = LIS.getInstructionIndex(*MI).getRegSlot();
+
+ // Account for register pressure similar to RegPressureTracker::recede().
+ RegisterOperands RegOpers;
+
+ RegOpers.collect(*MI, *TRI, *MRI, true, /*IgnoreDead=*/true);
+ assert(RegOpers.DeadDefs.empty());
+ adjustDefLaneLiveness(RegOpers.Defs, SlotIdx, LIS, *MRI);
+ RegOpers.detectDeadDefs(*MI, LIS);
+
+ // Boost max pressure for all dead defs together.
+ // Since CurrSetPressure and MaxSetPressure
+ bumpDeadDefs(RegOpers.DeadDefs);
+
+ // Kill liveness at live defs.
+ for (const RegisterMaskPair &P : RegOpers.Defs) {
+ Register Reg = P.RegUnit;
+ if (!Reg.isVirtual())
+ continue;
+ LaneBitmask LiveAfter = LiveRegs[Reg];
+ LaneBitmask UseLanes = getRegLanes(RegOpers.Uses, Reg);
+ LaneBitmask DefLanes = P.LaneMask;
+ LaneBitmask LiveBefore = (LiveAfter & ~DefLanes) | UseLanes;
+
+ CurPressure.inc(Reg, LiveAfter, LiveAfter & LiveBefore, *MRI);
+ MaxPressure = max(MaxPressure, CurPressure);
+ }
+ // Generate liveness for uses.
+ for (const RegisterMaskPair &P : RegOpers.Uses) {
+ Register Reg = P.RegUnit;
+ if (!Reg.isVirtual())
+ continue;
+ LaneBitmask LiveAfter = LiveRegs[Reg];
+ LaneBitmask LiveBefore = LiveAfter | P.LaneMask;
+ CurPressure.inc(Reg, LiveAfter, LiveBefore, *MRI);
+ }
+ MaxPressure = max(MaxPressure, CurPressure);
+}
+
////////////////////////////////////////////////////////////////////////////////
// GCNDownwardRPTracker
@@ -430,28 +593,44 @@ bool GCNDownwardRPTracker::reset(const MachineInstr &MI,
return true;
}
-bool GCNDownwardRPTracker::advanceBeforeNext() {
+bool GCNDownwardRPTracker::advanceBeforeNext(MachineInstr *MI,
+ bool UseInternalIterator,
+ LiveIntervals *TheLIS) {
assert(MRI && "call reset first");
- if (!LastTrackedMI)
- return NextMI == MBBEnd;
-
- assert(NextMI == MBBEnd || !NextMI->isDebugInstr());
+ SlotIndex SI;
+ const LiveIntervals *CurrLIS;
+ const MachineInstr *CurrMI;
+ if (UseInternalIterator) {
+ if (!LastTrackedMI)
+ return NextMI == MBBEnd;
+
+ assert(NextMI == MBBEnd || !NextMI->isDebugInstr());
+ CurrLIS = &LIS;
+ CurrMI = LastTrackedMI;
+
+ SI = NextMI == MBBEnd
+ ? CurrLIS->getInstructionIndex(*LastTrackedMI).getDeadSlot()
+ : CurrLIS->getInstructionIndex(*NextMI).getBaseIndex();
+ } else { //! UseInternalIterator
+ CurrLIS = TheLIS;
+ SI = CurrLIS->getInstructionIndex(*MI).getBaseIndex();
+ CurrMI = MI;
+ }
- SlotIndex SI = NextMI == MBBEnd
- ? LIS.getInstructionIndex(*LastTrackedMI).getDeadSlot()
- : LIS.getInstructionIndex(*NextMI).getBaseIndex();
assert(SI.isValid());
// Remove dead registers or mask bits.
SmallSet<Register, 8> SeenRegs;
- for (auto &MO : LastTrackedMI->operands()) {
+ for (auto &MO : CurrMI->operands()) {
if (!MO.isReg() || !MO.getReg().isVirtual())
continue;
if (MO.isUse() && !MO.readsReg())
continue;
+ if (!UseInternalIterator && MO.isDef())
+ continue;
if (!SeenRegs.insert(MO.getReg()).second)
continue;
- const LiveInterval &LI = LIS.getInterval(MO.getReg());
+ const LiveInterval &LI = CurrLIS->getInterval(MO.getReg());
if (LI.hasSubRanges()) {
auto It = LiveRegs.end();
for (const auto &S : LI.subranges()) {
@@ -481,15 +660,22 @@ bool GCNDownwardRPTracker::advanceBeforeNext() {
LastTrackedMI = nullptr;
- return NextMI == MBBEnd;
+ return UseInternalIterator && (NextMI == MBBEnd);
}
-void GCNDownwardRPTracker::advanceToNext() {
- LastTrackedMI = &*NextMI++;
- NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
+void GCNDownwardRPTracker::advanceToNext(MachineInstr *MI,
+ bool UseInternalIterator) {
+ if (UseInternalIterator) {
+ LastTrackedMI = &*NextMI++;
+ NextMI = skipDebugInstructionsForward(NextMI, MBBEnd);
+ } else {
+ LastTrackedMI = MI;
+ }
+
+ const MachineInstr *CurrMI = LastTrackedMI;
// Add new registers or mask bits.
- for (const auto &MO : LastTrackedMI->all_defs()) {
+ for (const auto &MO : CurrMI->all_defs()) {
Register Reg = MO.getReg();
if (!Reg.isVirtual())
continue;
@@ -502,11 +688,17 @@ void GCNDownwardRPTracker::advanceToNext() {
MaxPressure = max(MaxPressure, CurPressure);
}
-bool GCNDownwardRPTracker::advance() {
- if (NextMI == MBBEnd)
+bool GCNDownwardRPTracker::advance(MachineInstr *MI, bool UseInternalIterator,
+ LiveIntervals *TheLIS) {
+ if (UseInternalIterator && NextMI == MBBEnd)
return false;
- advanceBeforeNext();
- advanceToNext();
+
+ advanceBeforeNext(MI, UseInternalIterator, TheLIS);
+ advanceToNext(MI, UseInternalIterator);
+ if (!UseInternalIterator) {
+ // We must remove any dead def lanes from the current RP
+ advanceBeforeNext(MI, true, TheLIS);
+ }
return true;
}
@@ -548,6 +740,65 @@ Printable llvm::reportMismatch(const GCNRPTracker::LiveRegSet &LISLR,
});
}
+void GCNDownwardRPTracker::bumpDownwardPressure(const MachineInstr *MI,
+ const SIRegisterInfo *TRI) {
+ assert(!MI->isDebugOrPseudoInstr() && "Expect a nondebug instruction.");
+
+ SlotIndex SlotIdx;
+ SlotIdx = LIS.getInstructionIndex(*MI).getRegSlot();
+
+ // Account for register pressure similar to RegPressureTracker::recede().
+ RegisterOperands RegOpers;
+ RegOpers.collect(*MI, *TRI, *MRI, true, /*IgnoreDead=*/false);
+ RegOpers.adjustLaneLiveness(LIS, *MRI, SlotIdx);
+
+ for (const RegisterMaskPair &Use : RegOpers.Uses) {
+ Register Reg = Use.RegUnit;
+ if (!Reg.isVirtual())
+ continue;
+ LaneBitmask LastUseMask = getLastUsedLanes(Reg, SlotIdx);
+ if (LastUseMask.none())
+ continue;
+ // The LastUseMask is queried from the liveness information of instruction
+ // which may be further down the schedule. Some lanes may actually not be
+ // last uses for the current position.
+ // FIXME: allow the caller to pass in the list of vreg uses that remain
+ // to be bottom-scheduled to avoid searching uses at each query.
+ SlotIndex CurrIdx;
+ const MachineBasicBlock *MBB = MI->getParent();
+ MachineBasicBlock::const_iterator IdxPos = skipDebugInstructionsForward(
+ LastTrackedMI ? LastTrackedMI : MBB->begin(), MBB->end());
+ if (IdxPos == MBB->end()) {
+ CurrIdx = LIS.getMBBEndIdx(MBB);
+ } else {
+ CurrIdx = LIS.getInstructionIndex(*IdxPos).getRegSlot();
+ }
+
+ LastUseMask =
+ findUseBetween(Reg, LastUseMask, CurrIdx, SlotIdx, *MRI, TRI, &LIS);
+ if (LastUseMask.none())
+ continue;
+
+ LaneBitmask LiveMask = LiveRegs[Reg];
+ LaneBitmask NewMask = LiveMask & ~LastUseMask;
+ CurPressure.inc(Reg, LiveMask, NewMask, *MRI);
+ }
+
+ // Generate liveness for defs.
+ for (const RegisterMaskPair &Def : RegOpers.Defs) {
+ Register Reg = Def.RegUnit;
+ if (!Reg.isVirtual())
+ continue;
+ LaneBitmask LiveMask = LiveRegs[Reg];
+ LaneBitmask NewMask = LiveMask | Def.LaneMask;
+ CurPressure.inc(Reg, LiveMask, NewMask, *MRI);
+ }
+ MaxPressure = max(MaxPressure, CurPressure);
+
+ // Boost pressure for all dead defs together.
+ bumpDeadDefs(RegOpers.DeadDefs);
+}
+
bool GCNUpwardRPTracker::isValid() const {
const auto &SI = LIS.getInstructionIndex(*LastTrackedMI).getBaseIndex();
const auto LISLR = llvm::getLiveRegs(SI, LIS, *MRI);
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.h b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
index 54dc1972d27619..463da472bb69ff 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.h
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
@@ -19,6 +19,7 @@
#include "GCNSubtarget.h"
#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/RegisterPressure.h"
#include <algorithm>
namespace llvm {
@@ -149,6 +150,9 @@ inline GCNRegPressure operator-(const GCNRegPressure &P1,
return Diff;
}
+///////////////////////////////////////////////////////////////////////////////
+// GCNRPTracker
+
class GCNRPTracker {
public:
using LiveRegSet = DenseMap<unsigned, LaneBitmask>;
@@ -165,7 +169,14 @@ class GCNRPTracker {
void reset(const MachineInstr &MI, const LiveRegSet *LiveRegsCopy,
bool After);
+ /// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+ void bumpDeadDefs(ArrayRef<RegisterMaskPair> DeadDefs);
+
+ LaneBitmask getLastUsedLanes(Register RegUnit, SlotIndex Pos) const;
+
public:
+ // reset tracker and set live register set to the specified value.
+ void reset(const MachineRegisterInfo &MRI_, const LiveRegSet &LiveRegs_);
// live regs for the current state
const decltype(LiveRegs) &getLiveRegs() const { return LiveRegs; }
const MachineInstr *getLastTrackedMI() const { return LastTrackedMI; }
@@ -182,34 +193,45 @@ class GCNRPTracker {
GCNRPTracker::LiveRegSet getLiveRegs(SlotIndex SI, const LiveIntervals &LIS,
const MachineRegisterInfo &MRI);
+////////////////////////////////////////////////////////////////////////////////
+// GCNUpwardRPTracker
+
class GCNUpwardRPTracker : public GCNRPTracker {
public:
GCNUpwardRPTracker(const LiveIntervals &LIS_) : GCNRPTracker(LIS_) {}
- // reset tracker and set live register set to the specified value.
- void reset(const MachineRegisterInfo &MRI_, const LiveRegSet &LiveRegs_);
+ using GCNRPTracker::reset;
- // reset tracker at the specified slot index.
+ /// reset tracker at the specified slot index \p SI.
void reset(const MachineRegisterInfo &MRI, SlotIndex SI) {
- reset(MRI, llvm::getLiveRegs(SI, LIS, MRI));
+ GCNRPTracker::reset(MRI, llvm::getLiveRegs(SI, LIS, MRI));
}
- // reset tracker to the end of the MBB.
+ /// reset tracker to the end of the \p MBB.
void reset(const MachineBasicBlock &MBB) {
reset(MBB.getParent()->getRegInfo(),
LIS.getSlotIndexes()->getMBBEndIdx(&MBB));
}
- // reset tracker to the point just after MI (in program order).
+ /// reset tracker to the point just after \p MI (in program order).
void reset(const MachineInstr &MI) {
reset(MI.getMF()->getRegInfo(), LIS.getInstructionIndex(MI).getDeadSlot());
}
- // move to the state just before the MI (in program order).
+ /// Move to the state of RP just before the \p MI . If \p UseInternalIterator
+ /// is set, also update the internal iterators. Setting \p UseInternalIterator
+ /// to false allows for an externally managed iterator / program order.
void recede(const MachineInstr &MI);
- // checks whether the tracker's state after receding MI corresponds
- // to reported by LIS.
+ /// Mostly copy/paste from CodeGen/RegisterPressure.cpp
+ /// Calculate the impact \p MI will have on CurPressure and MaxPressure. This
+ /// does not rely on the implicit program ordering in the LiveIntervals to
+ /// support RP Speculation. It leaves the state of pressure inconsistent with
+ /// the current position
+ void bumpUpwardPressure(const MachineInstr *MI, const SIRegisterInfo *TRI);
+
+ /// \p returns whether the tracker's state after receding MI corresponds
+ /// to reported by LIS.
bool isValid() const;
const GCNRegPressure &getMaxPressure() const { return MaxPressure; }
@@ -223,6 +245,9 @@ class GCNUpwardRPTracker : public GCNRPTracker {
}
};
+////////////////////////////////////////////////////////////////////////////////
+// GCNDownwardRPTracker
+
class GCNDownwardRPTracker : public GCNRPTracker {
// Last position of reset or advanceBeforeNext
MachineBasicBlock::const_iterator NextMI;
@@ -232,37 +257,67 @@ class GCNDownwardRPTracker : public GCNRPTracker {
public:
GCNDownwardRPTracker(const LiveIntervals &LIS_) : GCNRPTracker(LIS_) {}
+ using GCNRPTracker::reset;
+
MachineBasicBlock::const_iterator getNext() const { return NextMI; }
- // Return MaxPressure and clear it.
+ /// \p return MaxPressure and clear it.
GCNRegPressure moveMaxPressure() {
auto Res = MaxPressure;
MaxPressure.clear();
return Res;
}
- // Reset tracker to the point before the MI
- // filling live regs upon this point using LIS.
- // Returns false if block is empty except debug values.
+ /// Reset tracker to the point before the \p MI
+ /// filling \p LiveRegs upon this point using LIS.
+ /// \p returns false if block is empty except debug values.
bool reset(const MachineInstr &MI, const LiveRegSet *LiveRegs = nullptr);
- // Move to the state right before the next MI or after the end of MBB.
- // Returns false if reached end of the block.
- bool advanceBeforeNext();
-
- // Move to the state at the MI, advanceBeforeNext has to be called first.
- void advanceToNext();
-
- // Move to the state at the next MI. Returns false if reached end of block.
- bool advance();
-
- // Advance instructions until before End.
+ /// Move to the state right before the next MI or after the end of MBB.
+ /// \p returns false if reached end of the block.
+ /// If \p UseInternalIterator is true...
[truncated]
|
Added several tests based on the more significant lit changes. Since this is a change which impacts scheduling heuristics, the changes are a mix of regressions and improvements. I've analyzed and included all the significant regressions, and added comments / fixmes to explain the situation. The causes are: 1. not tracking PhysRegs, 2. the scheduler improving RP in initial stage and not triggering high-RP stage, and 3. the scheduler being overly sensitive to minor changes in RP, and not being willing to incur a small penalty to enable scheduling of RP reducing instructions. These will be addressed in a follow-up PR which enables the GCN RP Tracker features by default. I've also included several tests where things go as expected -- more accurate RP Tracking leads to reductions in RP and, ultimately, improving occupancy |
acbca50
to
bf61d05
Compare
Force push to rebase + solve merge conflicts |
Fix recede to handle LIS program order violations + delete bumpUpwardPressure |
bb3bcc7
to
35ab173
Compare
Force push to rebase for 5cb6b15 |
Latest passes psdb with GCNTrackers enabled |
No more objections from my side given that the patch is under a CL flag. |
aa74786
to
276193f
Compare
Having issues with CLI, force-push to land via web UI |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/6882 Here is the relevant piece of the build log for the reference
|
This adds the ability to use the GCNRPTrackers during scheduling. These trackers have several advantages over the generic trackers: 1. global live-thru trackers, 2. subregister based RP deltas, and 3. flexible vreg -> PressureSet mappings. This feature is off-by-default to ease with the roll-out process. In particular, when using the optional trackers, the scheduler will still maintain the generic trackers leading to unnecessary compile time. (cherry picked from commit 17bc959) Change-Id: Ia45099682b69df7113b7736f052e2bde6f926ca4
This is part of a series of PRs which enable using the AMDGPU/GCN RPTrackers during scheduling. I've split them up to (hopefully) make reviewing easier. For context see #88797 . Since this is the final PR in the series, any high level comments should go here.
This PR adds the scheduling changes to: maintain the GCNRPTrackers during scheduling, and use them when making per-instruction scheduling decisions.
This PR is for commits beginning at fbd0e54 and depends on #93088 and #93089