Skip to content

[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses #111551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 28, 2025

Conversation

SLTozer
Copy link
Contributor

@SLTozer SLTozer commented Oct 8, 2024

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.

@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2024

@llvm/pr-subscribers-backend-x86

Author: Stephen Tozer (SLTozer)

Changes

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.


Full diff: https://github.com/llvm/llvm-project/pull/111551.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp (+36-24)
  • (added) llvm/test/CodeGen/X86/fake-use-remove-loads.mir (+129)
diff --git a/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp b/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
index ef7a58670c3ac5..0670e1b8871286 100644
--- a/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
+++ b/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
@@ -94,11 +94,13 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
 
     for (MachineInstr &MI : make_early_inc_range(reverse(*MBB))) {
       if (MI.isFakeUse()) {
-        for (const MachineOperand &MO : MI.operands()) {
-          // Track the Fake Uses that use this register so that we can delete
-          // them if we delete the corresponding load.
-          if (MO.isReg())
-            RegFakeUses[MO.getReg()].push_back(&MI);
+        const MachineOperand &FakeUseOp = MI.getOperand(0);
+        // Track the Fake Uses that use this register so that we can delete
+        // them if we delete the corresponding load.
+        if (FakeUseOp.isReg()) {
+          assert(FakeUseOp.getReg().isPhysical() &&
+                 "VReg seen in function with NoVRegs set?");
+          RegFakeUses[FakeUseOp.getReg()].push_back(&MI);
         }
         // Do not record FAKE_USE uses in LivePhysRegs so that we can recognize
         // otherwise-unused loads.
@@ -113,27 +115,34 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
         // Don't delete live physreg defs, or any reserved register defs.
         if (!LivePhysRegs.available(Reg) || MRI->isReserved(Reg))
           continue;
-        // There should be an exact match between the loaded register and the
-        // FAKE_USE use. If not, this is a load that is unused by anything? It
-        // should probably be deleted, but that's outside of this pass' scope.
-        if (RegFakeUses.contains(Reg)) {
-          LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
-          // It is possible that some DBG_VALUE instructions refer to this
-          // instruction. They will be deleted in the live debug variable
-          // analysis.
-          MI.eraseFromParent();
-          AnyChanges = true;
-          ++NumLoadsDeleted;
-          // Each FAKE_USE now appears to be a fake use of the previous value
-          // of the loaded register; delete them to avoid incorrectly
-          // interpreting them as such.
-          for (MachineInstr *FakeUse : RegFakeUses[Reg]) {
+        // There should typically be an exact match between the loaded register
+        // and the FAKE_USE, but sometimes regalloc will choose to load a larger
+        // value than is needed. Therefore, as long as the load isn't used by
+        // anything except at least one FAKE_USE, we will delete it. If it isn't
+        // used by any fake uses, it should still be safe to delete but we
+        // choose to ignore it so that this pass has no side effects unrelated
+        // to fake uses.
+        bool HasFakeUse = false;
+        for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg)) {
+          if (!RegFakeUses.contains(SubReg))
+            continue;
+          HasFakeUse = true;
+          for (MachineInstr *FakeUse : RegFakeUses[SubReg]) {
             LLVM_DEBUG(dbgs()
                        << "RemoveLoadsIntoFakeUses: DELETING: " << *FakeUse);
             FakeUse->eraseFromParent();
           }
-          NumFakeUsesDeleted += RegFakeUses[Reg].size();
-          RegFakeUses[Reg].clear();
+          NumFakeUsesDeleted += RegFakeUses[SubReg].size();
+          RegFakeUses[SubReg].clear();
+        }
+        if (HasFakeUse) {
+          LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
+          // Since this load only exists to restore a spilled register and we
+          // haven't, run LiveDebugValues yet, there shouldn't be any DBG_VALUEs
+          // for this load; otherwise, deleting this would be incorrect.
+          MI.eraseFromParent();
+          AnyChanges = true;
+          ++NumLoadsDeleted;
         }
         continue;
       }
@@ -147,8 +156,11 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
             Register Reg = MO.getReg();
             assert(Reg.isPhysical() &&
                    "VReg seen in function with NoVRegs set?");
-            for (MCRegUnit Unit : TRI->regunits(Reg))
-              RegFakeUses.erase(Unit);
+            // We clear RegFakeUses for this register and all subregisters,
+            // because any such FAKE_USE encountered prior applies only to this
+            // instruction.
+            for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg))
+              RegFakeUses.erase(SubReg);
           }
         }
       }
diff --git a/llvm/test/CodeGen/X86/fake-use-remove-loads.mir b/llvm/test/CodeGen/X86/fake-use-remove-loads.mir
new file mode 100644
index 00000000000000..a8edbed5b28fe5
--- /dev/null
+++ b/llvm/test/CodeGen/X86/fake-use-remove-loads.mir
@@ -0,0 +1,129 @@
+# Ensure that loads into FAKE_USEs are correctly removed by the
+# remove-loads-into-fake-uses pass.
+# RUN: llc -run-pass remove-loads-into-fake-uses -mtriple=x86_64-unknown-linux -debug-only=remove-loads-into-fake-uses 2>&1 -o - %s | FileCheck %s --implicit-check-not=DELETING
+# REQUIRES: asserts
+#
+## We ensure that the load into the FAKE_USE is removed, along with the FAKE_USE
+## itself, even when the FAKE_USE is for a subregister of the move, and that we
+## correctly handle situations where FAKE_USE has additional `killed` operands
+## added by other passes.
+
+# CHECK: DELETING: FAKE_USE renamable $eax
+# CHECK: DELETING: renamable $rax = MOV64rm $rbp
+
+## Also verify that the store to the stack slot still exists.
+
+# CHECK-LABEL: bb.5:
+# CHECK: MOV64mi32 $rbp, 1, $noreg, -48, $noreg, 0 :: (store (s64) into %stack.0)
+# CHECK-LABEL: bb.6:
+
+
+--- |
+  define void @_ZN1g1jEv(ptr %this, i1 %cmp36, ptr %ref.tmp5) {
+  entry:
+    ret void
+  }
+
+...
+---
+name:            _ZN1g1jEv
+alignment:       16
+tracksRegLiveness: true
+noPhis:          true
+noVRegs:         true
+hasFakeUses:     true
+tracksDebugUserValues: true
+liveins:
+  - { reg: '$rdi', virtual-reg: '' }
+  - { reg: '$esi', virtual-reg: '' }
+  - { reg: '$rdx', virtual-reg: '' }
+frameInfo:
+  isCalleeSavedInfoValid: true
+stack:
+  - { id: 0, name: '', type: spill-slot, offset: -64, size: 8, alignment: 8, 
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true, 
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+body:             |
+  bb.0:
+    successors: %bb.2(0x80000000)
+    liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $rbx
+  
+    frame-setup PUSH64r killed $rbp, implicit-def $rsp, implicit $rsp
+    frame-setup CFI_INSTRUCTION def_cfa_offset 16
+    frame-setup CFI_INSTRUCTION offset $rbp, -16
+    $rbp = frame-setup MOV64rr $rsp
+    frame-setup CFI_INSTRUCTION def_cfa_register $rbp
+    frame-setup PUSH64r killed $r15, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $r14, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $r13, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $r12, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $rbx, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
+    CFI_INSTRUCTION offset $rbx, -56
+    CFI_INSTRUCTION offset $r12, -48
+    CFI_INSTRUCTION offset $r13, -40
+    CFI_INSTRUCTION offset $r14, -32
+    CFI_INSTRUCTION offset $r15, -24
+    $rbx = MOV64rr $rdx
+    $r14d = MOV32rr $esi
+    $r15 = MOV64rr $rdi
+    renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
+    renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
+    JMP_1 %bb.2
+  
+  bb.1:
+    successors: %bb.2(0x783e0f0f), %bb.6(0x07c1f0f1)
+    liveins: $rbx, $r12, $r15, $r13d, $r14d
+  
+    renamable $rax = MOV64rm $rbp, 1, $noreg, -48, $noreg :: (load (s64) from %stack.0)
+    FAKE_USE renamable $eax, implicit killed $rax
+    renamable $eax = MOV32ri 1, implicit-def $rax
+    TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
+    JCC_1 %bb.6, 9, implicit $eflags
+  
+  bb.2:
+    successors: %bb.3(0x80000000)
+    liveins: $rax, $rbx, $r12, $r15, $r14d
+  
+    MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)
+    renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
+  
+  bb.3:
+    successors: %bb.4(0x04000000), %bb.3(0x7c000000)
+    liveins: $eax, $rbx, $r12, $r15, $r14d
+  
+    $r13d = MOV32rr killed $eax
+    $rdi = MOV64rr $r15
+    CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
+    dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg :: (volatile load (s32) from %ir.ref.tmp5)
+    renamable $eax = MOV32ri 1
+    TEST8ri renamable $r14b, 1, implicit-def $eflags
+    JCC_1 %bb.3, 4, implicit $eflags
+  
+  bb.4:
+    successors: %bb.5(0x40000000), %bb.1(0x40000000)
+    liveins: $eflags, $rbx, $r12, $r15, $r13d, $r14d
+  
+    JCC_1 %bb.1, 4, implicit $eflags
+  
+  bb.5:
+    successors: %bb.1(0x80000000)
+    liveins: $rbx, $r12, $r15, $r13d, $r14d
+  
+    renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
+    MOV64mi32 $rbp, 1, $noreg, -48, $noreg, 0 :: (store (s64) into %stack.0)
+    CALL64r killed renamable $rax, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def dead $eax
+    JMP_1 %bb.1
+  
+  bb.6:
+    $rsp = frame-destroy ADD64ri32 $rsp, 8, implicit-def dead $eflags
+    $rbx = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r12 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r13 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r14 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r15 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $rbp = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    frame-destroy CFI_INSTRUCTION def_cfa $rsp, 8
+    RET64
+
+...

const MachineOperand &FakeUseOp = MI.getOperand(0);
// Track the Fake Uses that use this register so that we can delete
// them if we delete the corresponding load.
if (FakeUseOp.isReg()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it even valid for this to be a non-register operand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - although there's no real meaning to non-register fake uses, at some points (that I don't recall off the top of my head) the operands can get replaced; it'd be equally valid to also just delete fake uses that have non-reg operands (and at any other point where we're examining fake uses).

Comment on lines 101 to 102
assert(FakeUseOp.getReg().isPhysical() &&
"VReg seen in function with NoVRegs set?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bother with the assertion, the verifier should catch it based on the properties. Probably should relax this anyway, for wasm

// choose to ignore it so that this pass has no side effects unrelated
// to fake uses.
bool HasFakeUse = false;
for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to consider sub registers. Do everything in terms of register units, and liveness queries on the instruction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the patch to use regunits - it requires adding an entry for each register unit used by each FAKE_USE in the map, and an extra SmallDenseSet, but I imagine it's more efficient this way than iterating over all subregisters.

Comment on lines +118 to +124
// There should typically be an exact match between the loaded register
// and the FAKE_USE, but sometimes regalloc will choose to load a larger
// value than is needed. Therefore, as long as the load isn't used by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like it's just an optimization bug. Do you have an example?

Copy link
Contributor Author

@SLTozer SLTozer Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can post a full file if you want, but to summarize with snippets:
Pre-regalloc we have:

bb.5.for.end16:
  %2:gr32 = COPY killed %19:gr32
  FAKE_USE killed %2:gr32
  ...

After some merging, including %2 merging with %19, we get:

544B	bb.5.for.end16:
576B	  FAKE_USE %20.sub_32bit:gr64_with_sub_8bit

Then, after the following update:

selectOrSplit GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi  weight:5.118239e-02 w=5.118239e-02
RS_Spill Cascade 0
Inline spilling GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi  weight:5.118239e-02
From original %14
	cannot remat for 576e	FAKE_USE %20.sub_32bit:gr64_with_sub_8bit
Merged spilled regs: SS#0 [120r,432B:0)[472r,576r:0) 0@x  weight:0.000000e+00
spillAroundUses %20
	merged orig valno 1: SS#0 [112B,432B:0)[472r,576r:0) 0@x  weight:0.000000e+00
Checking redundant spills for 1@112B in %21 [96r,112B:0)[112B,120r:1)[592r,672B:2) 0@96r 1@112B-phi 2@592r  weight:1.348007e-01
Merged to stack int: SS#0 [112B,432B:0)[472r,576r:0) 0@x  weight:0.000000e+00
Checking redundant spills for 0@120r in %20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi  weight:5.118239e-02
	hoisted: 112B	MOV64mr %stack.0, 1, $noreg, 0, $noreg, %21:gr64_with_sub_8bit :: (store (s64) into %stack.0)
	folded:   472r	MOV64mi32 %stack.0, 1, $noreg, 0, $noreg, 0 :: (store (s64) into %stack.0)
	reload:   552r	%23:gr64_with_sub_8bit = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
	rewrite: 576r	FAKE_USE killed %23.sub_32bit:gr64_with_sub_8bit

We end up with:

544B	bb.5.for.end16:
552B	  %23:gr64 = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
576B	  FAKE_USE %23.sub_32bit:gr64

Which becomes:

bb.5.for.end16:
  renamable $rax = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
  FAKE_USE renamable $eax, implicit killed $rax

Where $rax is entirely unused after the MOV other than the FAKE_USE of $eax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)

Comment on lines 125 to 131
SmallDenseSet<MachineInstr *> FakeUsesToDelete;
for (MCRegUnit Unit : TRI->regunits(Reg)) {
if (!RegFakeUses.contains(Unit))
continue;
for (MachineInstr *FakeUse : RegFakeUses[Unit])
FakeUsesToDelete.insert(FakeUse);
RegFakeUses.erase(Unit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of manual liveness work. You're already walking backwards over the function and tracking liveness. You're erasing the instruction. Why not just erase the instruction, and then check at the reload point whether the def is dead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to erase any FAKE_USE here unless we know we're deleting the load which feeds into it; we only delete the FAKE_USEs here since leaving it around after the load is deleted would make it a fake use of the wrong value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be in some kind of liveness query function on the instruction, not inline here. Preferably using one of the MachineInstr queries

Comment on lines 104 to 105
for (MCRegUnit Unit : TRI->regunits(FakeUseOp.getReg()))
RegFakeUses[Unit].push_back(&MI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be cheaper to just maintain a vector of the fake use instructions and check the registers as needed, rather than a DenseMap entry per regunit. The regunits should be applied in the actual liveness check s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it's fairly cheap to check whether two registers have any RegUnit overlap that should be fine, since we'll need to check each load seen against each FAKE_USE seen so far in the MBB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cheaper than maintaining many map entries, that's kind of the point

@arsenm
Copy link
Contributor

arsenm commented Oct 9, 2024

Spills for these still feels like a situation we should have avoided in the first place. Is it possible to teach foldMemoryOperand to absorb the spill reference directly into the instruction, or otherwise avoid inserting the restore in the first place?

@SLTozer
Copy link
Contributor Author

SLTozer commented Oct 9, 2024

Spills for these still feels like a situation we should have avoided in the first place. Is it possible to teach foldMemoryOperand to absorb the spill reference directly into the instruction, or otherwise avoid inserting the restore in the first place?

I'm making a first pass attempt at this; if it turns out to be simple without affecting value liveness, then that solution could replace this pass, which would simplify matters! I've not worked much in regalloc though, so if there are complications I'd prefer to land this patch as a stopgap-measure to ensure that fake uses aren't handled incorrectly (potentially causing crashes) in the meanwhile.

Comment on lines +118 to +124
// There should typically be an exact match between the loaded register
// and the FAKE_USE, but sometimes regalloc will choose to load a larger
// value than is needed. Therefore, as long as the load isn't used by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)

Comment on lines 21 to 96
--- |
define void @_ZN1g1jEv(ptr %this, i1 %cmp36, ptr %ref.tmp5) {
entry:
ret void
}

...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need IR section

}

...
---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify this down to one block?

frame-destroy CFI_INSTRUCTION def_cfa $rsp, 8
RET64

...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative test with an intervening use of other sub registers

// should probably be deleted, but that's outside of this pass' scope.
if (RegFakeUses.contains(Reg)) {
// There should typically be an exact match between the loaded register
// and the FAKE_USE, but sometimes regalloc will choose to load a larger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to use MachineInstr::readsRegister and modifiesRegister queries?

SmallDenseSet<MachineInstr *> FakeUsesToDelete;
SmallVector<MachineInstr *> RemainingFakeUses;
for (MachineInstr *&FakeUse : reverse(RegFakeUses)) {
if (TRI->regsOverlap(Reg, FakeUse->getOperand(0).getReg())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakeUse->readsRegister(Reg, TRI)

// because any such FAKE_USE encountered prior is no longer relevant
// for later encountered loads.
for (MachineInstr *&FakeUse : reverse(RegFakeUses))
if (!TRI->regsOverlap(Reg, FakeUse->getOperand(0).getReg()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakeUse->readsRegister

@SLTozer
Copy link
Contributor Author

SLTozer commented Nov 5, 2024

Ping
Edit: Ah, the simplifying down to one block (and IR removal) change is outstanding Nevermind, it has been done but git showed me an incomplete diff when I went to double check the old comments!

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used LiveRegUnits, to confirm my guesswork, it's maintaining the set of live-registers as we step backwards through the code and the available method tells us whether it's in use?

Looking good, but see inline for a potential unsupported arrangement of debug-info and fake uses.

Comment on lines +134 to +141
// Since this load only exists to restore a spilled register and we
// haven't, run LiveDebugValues yet, there shouldn't be any DBG_VALUEs
// for this load; otherwise, deleting this would be incorrect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the nasty feeling that this is only true with instruction-referencing LiveDebugValues, where we're able to track the location of values through stack spills and restores. IIRC, in the location-based model LiveDebugVariables will issue DBG_VALUEs when values get re-loaded? (98% sure, we have to insert or not-insert DW_OP_deref in that model when such transitions occur). The failure mode in the location-based model would be setting a variable location to a register that is now no longer defined by a restore, i.e. pointing it at some garbage value.

Is there anything we can easily do about this? Only supporting fake-uses with instruction-referencing debug-info is an easy workaround with technical justification, but we should put reasonable effort into making this more general. There are non-x86 folks out there who're using fake-uses downstream who might be inconvenienced -- they're not immune from this failure-mode though, and would have the workaround of disabling this pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest way would be to disable this specific pass for non-instruction-referencing builds - it's a relatively small optimization, and not necessary for correctness. The alternative would be to track uses by DBG_VALUES to check that the loaded value isn't used by one; any preferences?

Comment on lines 11 to 12
# CHECK: DELETING: renamable $rax = MOV64rm $rbp
# CHECK: DELETING: FAKE_USE renamable $eax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a solid test; but there have been requests in the past that tests do not use LLVM_DEBUG outputs to check the compilers behaviour, as the (rare) situations where the debug-text-output is wrong allows regressions to slip through. I think this is testable with a more-precise checking that the relevant instructions no longer appear in the relevant block; thus we should take that approach.

(IIRC debug-output testing is acceptable when it's hard to identify/check-for properties of the program that can easily be printed during passes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Positively checking the debug output is fine, but much less preferably to concrete result checks. Negative checks of debug output is dangerous since it silently stops working if the printing ever changes

@arsenm
Copy link
Contributor

arsenm commented Dec 3, 2024

I haven't used LiveRegUnits, to confirm my guesswork, it's maintaining the set of live-registers as we step backwards through the code and the available method tells us whether it's in use?

Yes

@SLTozer
Copy link
Contributor Author

SLTozer commented Dec 12, 2024

Ping

const MachineOperand &FakeUseOp = MI.getOperand(0);
// Track the Fake Uses that use these register units so that we can
// delete them if we delete the corresponding load.
if (FakeUseOp.isReg())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking isReg twice. But you can just remove the operand checks, take all the instructions, and use readRegister and completely avoid the artificial restriction that this only have one operand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There won't be any fake uses with multiple operands - we could change that in future, but we'd need changes beyond the scope of this particular patch to enable them. We could preempt their existence, but doing so would complicate the logic further down, because currently we only need to track a list of all "live" fake uses; if we assumed that fake uses could refer to multiple registers then I think we'd need an extra data structure to keep track of which specific registers are being used by each fake use (e.g. a map, or a vector of tuples) so that we'd know which fake uses to delete. It's not a huge complication, but not worth paying for if we aren't going to use it imo - WDYT?

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This soft-LGTM, with the caveat about the correct method for testing for instr-ref and (alas) needing to test for it. (Probably just wants another RUN line?).

Comment on lines 80 to 83
// Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE
// instructions of the restored values that would become invalid.
if (debuginfoShouldUseDebugInstrRef(MF.getTarget().getTargetTriple()))
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe instead this should be using the "UseDebugInstrRef" field of MachineFunction -- observe the MachineFunction::shouldUseDebugInstrRef function, there are compositions of attributes and flags that mess up instruction referencing, thus it's enabled on a function-by-function basis.

Sad to say, but that'll mean we need a test too.

@jmorse
Copy link
Member

jmorse commented Jan 24, 2025

I suppose with non-x86 architectures not supporting instruction referencing, this means they'll be slower than the previous implementation uploaded. Un-necessary load elimination happens here and is disabled, instead of happening really late in code emission like the previous implementation. But this is fine -- when everyone supports instruction referencing everything will be faster!

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only
used by fake uses, as well as the fake use in question. There are two
existing errors with the pass however: it incorrectly examines every operand
of each FAKE_USE, when only the first is relevant (extra operands should
just be "killed" regs), and it ignores cases where the FAKE_USE register is
not an exact match for the loaded register, which is incorrect as regalloc
may choose to load a wider value than was required pre-regalloc.

This patch fixes both of these cases.
@SLTozer SLTozer force-pushed the fix-fake-use-load-removal branch from 66bb42e to 0d4a75b Compare January 24, 2025 18:19
Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but see the inline comment about the different "is this instruction referencing" calls.

@@ -79,7 +79,7 @@ INITIALIZE_PASS_END(RemoveLoadsIntoFakeUses, DEBUG_TYPE,
bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
// Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE
// instructions of the restored values that would become invalid.
if (debuginfoShouldUseDebugInstrRef(MF.getTarget().getTargetTriple()))
if (!MF.shouldUseDebugInstrRef())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should call useDebugInstrRef, which queries the bit of state that MachineFunction carries around with it. IIRC, the "should" method is testing which mode should be used given the optimisation configuration, while the bit records the actual truth of whether instr-ref is being used or not. This allows for (semi-legitimate) situations where some MIR that uses instr-ref is loaded into a configuration where we would have chosen to not use instr-ref.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And uh, that awkwardly has an effect on the test you're writing too.

Comment on lines 2 to 3
# Ensure that loads into FAKE_USEs are correctly removed by the
# remove-loads-into-fake-uses pass.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... and that if we do not use instruction referencing, no modifications are made"

Copy link
Member

@jmorse jmorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jmorse
Copy link
Member

jmorse commented Jan 28, 2025

Also looks fine, cheers for making it clearer

@SLTozer SLTozer merged commit 22687aa into llvm:main Jan 28, 2025
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants