Skip to content

[AMDGPU] NFC: Add BBLiveOutMap & LiveOut Cache #93089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

jrbyrnes
Copy link
Contributor

This is part of a series of PRs which enable using the GCNRPTrackers during scheduling. I've split them up to (hopefully) make reviewing easier. For context see #88797 .

This PR uses getLiveRegMap to calculate BBLiveOutMap, which ultimately provides the per-region liveouts to the RPTrackers.

Change-Id: I63cfd44e635cc4bee0e6780ca43b692c46e940b7
@llvmbot
Copy link
Member

llvmbot commented May 22, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Jeffrey Byrnes (jrbyrnes)

Changes

This is part of a series of PRs which enable using the GCNRPTrackers during scheduling. I've split them up to (hopefully) make reviewing easier. For context see #88797 .

This PR uses getLiveRegMap to calculate BBLiveOutMap, which ultimately provides the per-region liveouts to the RPTrackers.


Full diff: https://github.com/llvm/llvm-project/pull/93089.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+55-1)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+7)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 94d93390d0916..a4d05f62a7f74 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -58,6 +58,11 @@ static cl::opt<bool>
                         "Wave Limited (amdgpu-limit-wave-threshold)."),
                cl::init(false));
 
+static cl::opt<bool> GCNTrackers(
+    "amdgpu-use-amdgpu-trackers", cl::Hidden,
+    cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
+    cl::init(false));
+
 const unsigned ScheduleMetrics::ScaleFactor = 100;
 
 GCNSchedStrategy::GCNSchedStrategy(const MachineSchedContext *C)
@@ -526,6 +531,19 @@ GCNScheduleDAGMILive::getRealRegPressure(unsigned RegionIdx) const {
   return RPTracker.moveMaxPressure();
 }
 
+static MachineInstr *getLastMIForRegion(MachineBasicBlock::iterator RegionBegin,
+                                        MachineBasicBlock::iterator RegionEnd) {
+  MachineInstr *LastMI;
+  auto *BB = RegionBegin->getParent();
+  if (RegionEnd != BB->end() && !RegionEnd->isDebugInstr())
+    LastMI = &*RegionEnd;
+  else if (RegionEnd == BB->end())
+    LastMI = &*prev_nodbg(RegionEnd, RegionBegin);
+  else
+    LastMI = &*skipDebugInstructionsBackward(RegionEnd, RegionBegin);
+  return LastMI;
+}
+
 void GCNScheduleDAGMILive::computeBlockPressure(unsigned RegionIdx,
                                                 const MachineBasicBlock *MBB) {
   GCNDownwardRPTracker RPTracker(*LIS);
@@ -597,6 +615,16 @@ void GCNScheduleDAGMILive::computeBlockPressure(unsigned RegionIdx,
     RPTracker.advanceBeforeNext();
     MBBLiveIns[OnlySucc] = RPTracker.moveLiveRegs();
   }
+
+  if (GCNTrackers) {
+    assert(LiveOuts.size() == Regions.size());
+    for (unsigned RegionIdx = 0; RegionIdx < Regions.size(); RegionIdx++) {
+      auto RegionBegin = Regions[RegionIdx].first;
+      auto RegionEnd = Regions[RegionIdx].second;
+      MachineInstr *LastMI = getLastMIForRegion(RegionBegin, RegionEnd);
+      LiveOuts[RegionIdx] = BBLiveOutMap.lookup(LastMI);
+    }
+  }
 }
 
 DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet>
@@ -616,11 +644,24 @@ GCNScheduleDAGMILive::getBBLiveInMap() const {
   return getLiveRegMap(BBStarters, false /*After*/, *LIS);
 }
 
+DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet>
+GCNScheduleDAGMILive::getBBLiveOutMap() const {
+  assert(!Regions.empty());
+  std::vector<MachineInstr *> BBEnders;
+  BBEnders.reserve(Regions.size());
+  auto I = Regions.rbegin(), E = Regions.rend();
+  for (; I != E; I++)
+    BBEnders.push_back(getLastMIForRegion(I->first, I->second));
+
+  return getLiveRegMap(BBEnders, true /*After*/, *LIS);
+}
+
 void GCNScheduleDAGMILive::finalizeSchedule() {
   // Start actual scheduling here. This function is called by the base
   // MachineScheduler after all regions have been recorded by
   // GCNScheduleDAGMILive::schedule().
   LiveIns.resize(Regions.size());
+  LiveOuts.resize(Regions.size());
   Pressure.resize(Regions.size());
   RescheduleRegions.resize(Regions.size());
   RegionsWithHighRP.resize(Regions.size());
@@ -639,8 +680,12 @@ void GCNScheduleDAGMILive::finalizeSchedule() {
 void GCNScheduleDAGMILive::runSchedStages() {
   LLVM_DEBUG(dbgs() << "All regions recorded, starting actual scheduling.\n");
 
-  if (!Regions.empty())
+  if (!Regions.empty()) {
     BBLiveInMap = getBBLiveInMap();
+    if (GCNTrackers) {
+      BBLiveOutMap = getBBLiveOutMap();
+    }
+  }
 
   GCNSchedStrategy &S = static_cast<GCNSchedStrategy &>(*SchedImpl);
   while (S.advanceStage()) {
@@ -1499,6 +1544,15 @@ bool PreRARematStage::sinkTriviallyRematInsts(const GCNSubtarget &ST,
   DAG.Regions = NewRegions;
   DAG.RescheduleRegions = NewRescheduleRegions;
 
+  if (GCNTrackers) {
+    DAG.BBLiveOutMap = DAG.getBBLiveOutMap();
+    auto I = DAG.Regions.begin(), E = DAG.Regions.end();
+    for (; I != E; I++) {
+      MachineInstr *LastMI = getLastMIForRegion(I->first, I->second);
+      DAG.LiveOuts.push_back(DAG.BBLiveOutMap.lookup(LastMI));
+    }
+  }
+
   SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
   MFI.increaseOccupancy(MF, ++DAG.MinOccupancy);
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 2084aae4128ff..243bb7f0c094d 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -205,6 +205,9 @@ class GCNScheduleDAGMILive final : public ScheduleDAGMILive {
   // Region live-in cache.
   SmallVector<GCNRPTracker::LiveRegSet, 32> LiveIns;
 
+  // Region live-out cache.
+  SmallVector<GCNRPTracker::LiveRegSet, 32> LiveOuts;
+
   // Region pressure cache.
   SmallVector<GCNRegPressure, 32> Pressure;
 
@@ -215,6 +218,10 @@ class GCNScheduleDAGMILive final : public ScheduleDAGMILive {
 
   DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet> getBBLiveInMap() const;
 
+  DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet> BBLiveOutMap;
+
+  DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet> getBBLiveOutMap() const;
+
   // Return current region pressure.
   GCNRegPressure getRealRegPressure(unsigned RegionIdx) const;
 

Change-Id: Iaeaa9bc5b037d78ab965c3bc1778d424e37eb546
Copy link

github-actions bot commented May 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Change-Id: I8418e9dd9571feb8cdbb32623f21ecb2ff41aa9e
@vpykhtin
Copy link
Contributor

vpykhtin commented May 24, 2024

You've added two maps containing the same value but with different key. Should we only keep LiveOuts indexed by the region index? If not then maybe it worth to add a level of indirection and add a map of LastMIForRegion to RegionIdx or vice-versa? LiveRegSet is relatively heavy to copy and store and I would avoid that (this is a deep copy not the pointer copy).

Change-Id: I4f8fa1415194aeda4cc6318c6213e9c71a2d50dc
@jrbyrnes
Copy link
Contributor Author

Should we only keep LiveOuts indexed by the region index? If not then maybe it worth to add a level of indirection and add a map of LastMIForRegion to RegionIdx or vice-versa?

Okay, I've added the wrapper to add a layer of indirection to support the indexing that we want (by RegionIdx). The reason why we need this is because scheduling may change the last MI per region. Thus, we cannot assume that iterators in the Regions vector will be valid Map keys for our LiveRegs across different scheduling stages. The RegionPressureMap constructs a map of LiveRegs based on the iterators of the original program order, and a map of RegionIdx to these original iterators. Using this layer of indirection we can index the LiveRegs by RegionIdx.

For this version of the feature, the indirection approach is the appropriate one. In #88797 , I redefined getLiveRegMap to construct a map by the RegionIdx, but that introduced too many changes.

Comment on lines +169 to +171
// The live in/out pressure as indexed by the first or last MI in the region
// before scheduling.
DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet> BBLiveRegMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried this is making the pressure tracking even more complicated and expensive. LiveRegSet already felt wrong, and this adds another level on top of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is adding an instance of DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet> . However, this PR is not introducing this new level above GCNRPTracker::LiveRegSet, as it already exists in our code

DenseMap<MachineInstr *, GCNRPTracker::LiveRegSet> BBLiveInMap;
, so I'm not sure that that particular class is unduly complicating our tracking. Also, this cache is used to avoid recalculation of the liveouts across stages. Obviously, there are some costs to creating the map, but I assume it is making things less expensive in general.

Perhaps more importantly, I don't think there is an alternative to this besides deleting the cache. Implementing a LIS->getLiveRegsAt(SlotIndex) with cached results would probably be counterproductive due to the maintenance required for each call to SlotIndex insertMachineInstrInMaps during scheduling. We could introduce and cache GCNPressureDiffs (instead of the new trackers) which would use our more accurate RP costs to encode the PDiff. But this is ultimately just a different way to build the cache, and I'm not sure this is worth a complete redesign (especially since there may be issues with inconsistent state between tracker and pdiffs).

Change-Id: Ic703ad43fa7fd49afcb1d17c30325eb0dd7a8355
Change-Id: I90da0419eec73e7166e7b156796462a8ce6d5497
Change-Id: I90da0419eec73e7166e7b156796462a8ce6d5497
@jrbyrnes
Copy link
Contributor Author

Thanks for the review.

It is to be landed atomically in #93090

@jrbyrnes
Copy link
Contributor Author

jrbyrnes commented Oct 9, 2024

Landed with 17bc959

@jrbyrnes jrbyrnes closed this Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants