[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses #111551

SLTozer · 2024-10-08T15:52:43Z

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.

llvmbot · 2024-10-08T15:53:20Z

@llvm/pr-subscribers-backend-x86

Author: Stephen Tozer (SLTozer)

Changes

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.

Full diff: https://github.com/llvm/llvm-project/pull/111551.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp (+36-24)
(added) llvm/test/CodeGen/X86/fake-use-remove-loads.mir (+129)

diff --git a/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp b/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
index ef7a58670c3ac5..0670e1b8871286 100644
--- a/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
+++ b/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
@@ -94,11 +94,13 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
 
     for (MachineInstr &MI : make_early_inc_range(reverse(*MBB))) {
       if (MI.isFakeUse()) {
-        for (const MachineOperand &MO : MI.operands()) {
-          // Track the Fake Uses that use this register so that we can delete
-          // them if we delete the corresponding load.
-          if (MO.isReg())
-            RegFakeUses[MO.getReg()].push_back(&MI);
+        const MachineOperand &FakeUseOp = MI.getOperand(0);
+        // Track the Fake Uses that use this register so that we can delete
+        // them if we delete the corresponding load.
+        if (FakeUseOp.isReg()) {
+          assert(FakeUseOp.getReg().isPhysical() &&
+                 "VReg seen in function with NoVRegs set?");
+          RegFakeUses[FakeUseOp.getReg()].push_back(&MI);
         }
         // Do not record FAKE_USE uses in LivePhysRegs so that we can recognize
         // otherwise-unused loads.
@@ -113,27 +115,34 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
         // Don't delete live physreg defs, or any reserved register defs.
         if (!LivePhysRegs.available(Reg) || MRI->isReserved(Reg))
           continue;
-        // There should be an exact match between the loaded register and the
-        // FAKE_USE use. If not, this is a load that is unused by anything? It
-        // should probably be deleted, but that's outside of this pass' scope.
-        if (RegFakeUses.contains(Reg)) {
-          LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
-          // It is possible that some DBG_VALUE instructions refer to this
-          // instruction. They will be deleted in the live debug variable
-          // analysis.
-          MI.eraseFromParent();
-          AnyChanges = true;
-          ++NumLoadsDeleted;
-          // Each FAKE_USE now appears to be a fake use of the previous value
-          // of the loaded register; delete them to avoid incorrectly
-          // interpreting them as such.
-          for (MachineInstr *FakeUse : RegFakeUses[Reg]) {
+        // There should typically be an exact match between the loaded register
+        // and the FAKE_USE, but sometimes regalloc will choose to load a larger
+        // value than is needed. Therefore, as long as the load isn't used by
+        // anything except at least one FAKE_USE, we will delete it. If it isn't
+        // used by any fake uses, it should still be safe to delete but we
+        // choose to ignore it so that this pass has no side effects unrelated
+        // to fake uses.
+        bool HasFakeUse = false;
+        for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg)) {
+          if (!RegFakeUses.contains(SubReg))
+            continue;
+          HasFakeUse = true;
+          for (MachineInstr *FakeUse : RegFakeUses[SubReg]) {
             LLVM_DEBUG(dbgs()
                        << "RemoveLoadsIntoFakeUses: DELETING: " << *FakeUse);
             FakeUse->eraseFromParent();
           }
-          NumFakeUsesDeleted += RegFakeUses[Reg].size();
-          RegFakeUses[Reg].clear();
+          NumFakeUsesDeleted += RegFakeUses[SubReg].size();
+          RegFakeUses[SubReg].clear();
+        }
+        if (HasFakeUse) {
+          LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
+          // Since this load only exists to restore a spilled register and we
+          // haven't, run LiveDebugValues yet, there shouldn't be any DBG_VALUEs
+          // for this load; otherwise, deleting this would be incorrect.
+          MI.eraseFromParent();
+          AnyChanges = true;
+          ++NumLoadsDeleted;
         }
         continue;
       }
@@ -147,8 +156,11 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
             Register Reg = MO.getReg();
             assert(Reg.isPhysical() &&
                    "VReg seen in function with NoVRegs set?");
-            for (MCRegUnit Unit : TRI->regunits(Reg))
-              RegFakeUses.erase(Unit);
+            // We clear RegFakeUses for this register and all subregisters,
+            // because any such FAKE_USE encountered prior applies only to this
+            // instruction.
+            for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg))
+              RegFakeUses.erase(SubReg);
           }
         }
       }
diff --git a/llvm/test/CodeGen/X86/fake-use-remove-loads.mir b/llvm/test/CodeGen/X86/fake-use-remove-loads.mir
new file mode 100644
index 00000000000000..a8edbed5b28fe5
--- /dev/null
+++ b/llvm/test/CodeGen/X86/fake-use-remove-loads.mir
@@ -0,0 +1,129 @@
+# Ensure that loads into FAKE_USEs are correctly removed by the
+# remove-loads-into-fake-uses pass.
+# RUN: llc -run-pass remove-loads-into-fake-uses -mtriple=x86_64-unknown-linux -debug-only=remove-loads-into-fake-uses 2>&1 -o - %s | FileCheck %s --implicit-check-not=DELETING
+# REQUIRES: asserts
+#
+## We ensure that the load into the FAKE_USE is removed, along with the FAKE_USE
+## itself, even when the FAKE_USE is for a subregister of the move, and that we
+## correctly handle situations where FAKE_USE has additional `killed` operands
+## added by other passes.
+
+# CHECK: DELETING: FAKE_USE renamable $eax
+# CHECK: DELETING: renamable $rax = MOV64rm $rbp
+
+## Also verify that the store to the stack slot still exists.
+
+# CHECK-LABEL: bb.5:
+# CHECK: MOV64mi32 $rbp, 1, $noreg, -48, $noreg, 0 :: (store (s64) into %stack.0)
+# CHECK-LABEL: bb.6:
+
+
+--- |
+  define void @_ZN1g1jEv(ptr %this, i1 %cmp36, ptr %ref.tmp5) {
+  entry:
+    ret void
+  }
+
+...
+---
+name:            _ZN1g1jEv
+alignment:       16
+tracksRegLiveness: true
+noPhis:          true
+noVRegs:         true
+hasFakeUses:     true
+tracksDebugUserValues: true
+liveins:
+  - { reg: '$rdi', virtual-reg: '' }
+  - { reg: '$esi', virtual-reg: '' }
+  - { reg: '$rdx', virtual-reg: '' }
+frameInfo:
+  isCalleeSavedInfoValid: true
+stack:
+  - { id: 0, name: '', type: spill-slot, offset: -64, size: 8, alignment: 8, 
+      stack-id: default, callee-saved-register: '', callee-saved-restored: true, 
+      debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+body:             |
+  bb.0:
+    successors: %bb.2(0x80000000)
+    liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $rbx
+  
+    frame-setup PUSH64r killed $rbp, implicit-def $rsp, implicit $rsp
+    frame-setup CFI_INSTRUCTION def_cfa_offset 16
+    frame-setup CFI_INSTRUCTION offset $rbp, -16
+    $rbp = frame-setup MOV64rr $rsp
+    frame-setup CFI_INSTRUCTION def_cfa_register $rbp
+    frame-setup PUSH64r killed $r15, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $r14, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $r13, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $r12, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r killed $rbx, implicit-def $rsp, implicit $rsp
+    frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
+    CFI_INSTRUCTION offset $rbx, -56
+    CFI_INSTRUCTION offset $r12, -48
+    CFI_INSTRUCTION offset $r13, -40
+    CFI_INSTRUCTION offset $r14, -32
+    CFI_INSTRUCTION offset $r15, -24
+    $rbx = MOV64rr $rdx
+    $r14d = MOV32rr $esi
+    $r15 = MOV64rr $rdi
+    renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
+    renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
+    JMP_1 %bb.2
+  
+  bb.1:
+    successors: %bb.2(0x783e0f0f), %bb.6(0x07c1f0f1)
+    liveins: $rbx, $r12, $r15, $r13d, $r14d
+  
+    renamable $rax = MOV64rm $rbp, 1, $noreg, -48, $noreg :: (load (s64) from %stack.0)
+    FAKE_USE renamable $eax, implicit killed $rax
+    renamable $eax = MOV32ri 1, implicit-def $rax
+    TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
+    JCC_1 %bb.6, 9, implicit $eflags
+  
+  bb.2:
+    successors: %bb.3(0x80000000)
+    liveins: $rax, $rbx, $r12, $r15, $r14d
+  
+    MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)
+    renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
+  
+  bb.3:
+    successors: %bb.4(0x04000000), %bb.3(0x7c000000)
+    liveins: $eax, $rbx, $r12, $r15, $r14d
+  
+    $r13d = MOV32rr killed $eax
+    $rdi = MOV64rr $r15
+    CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
+    dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg :: (volatile load (s32) from %ir.ref.tmp5)
+    renamable $eax = MOV32ri 1
+    TEST8ri renamable $r14b, 1, implicit-def $eflags
+    JCC_1 %bb.3, 4, implicit $eflags
+  
+  bb.4:
+    successors: %bb.5(0x40000000), %bb.1(0x40000000)
+    liveins: $eflags, $rbx, $r12, $r15, $r13d, $r14d
+  
+    JCC_1 %bb.1, 4, implicit $eflags
+  
+  bb.5:
+    successors: %bb.1(0x80000000)
+    liveins: $rbx, $r12, $r15, $r13d, $r14d
+  
+    renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
+    MOV64mi32 $rbp, 1, $noreg, -48, $noreg, 0 :: (store (s64) into %stack.0)
+    CALL64r killed renamable $rax, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def dead $eax
+    JMP_1 %bb.1
+  
+  bb.6:
+    $rsp = frame-destroy ADD64ri32 $rsp, 8, implicit-def dead $eflags
+    $rbx = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r12 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r13 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r14 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $r15 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    $rbp = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+    frame-destroy CFI_INSTRUCTION def_cfa $rsp, 8
+    RET64
+
+...

arsenm · 2024-10-08T15:54:41Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        const MachineOperand &FakeUseOp = MI.getOperand(0);
+        // Track the Fake Uses that use this register so that we can delete
+        // them if we delete the corresponding load.
+        if (FakeUseOp.isReg()) {


Is it even valid for this to be a non-register operand

Yes - although there's no real meaning to non-register fake uses, at some points (that I don't recall off the top of my head) the operands can get replaced; it'd be equally valid to also just delete fake uses that have non-reg operands (and at any other point where we're examining fake uses).

arsenm · 2024-10-08T15:55:47Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+          assert(FakeUseOp.getReg().isPhysical() &&
+                 "VReg seen in function with NoVRegs set?");


Don't bother with the assertion, the verifier should catch it based on the properties. Probably should relax this anyway, for wasm

arsenm · 2024-10-08T16:03:43Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        // choose to ignore it so that this pass has no side effects unrelated
+        // to fake uses.
+        bool HasFakeUse = false;
+        for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg)) {


You shouldn't need to consider sub registers. Do everything in terms of register units, and liveness queries on the instruction

Updated the patch to use regunits - it requires adding an entry for each register unit used by each FAKE_USE in the map, and an extra SmallDenseSet, but I imagine it's more efficient this way than iterating over all subregisters.

arsenm · 2024-10-08T19:28:14Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        // There should typically be an exact match between the loaded register
+        // and the FAKE_USE, but sometimes regalloc will choose to load a larger
+        // value than is needed. Therefore, as long as the load isn't used by


This sounds like it's just an optimization bug. Do you have an example?

I can post a full file if you want, but to summarize with snippets:
Pre-regalloc we have:

bb.5.for.end16: %2:gr32 = COPY killed %19:gr32 FAKE_USE killed %2:gr32 ...

After some merging, including %2 merging with %19, we get:

544B bb.5.for.end16: 576B FAKE_USE %20.sub_32bit:gr64_with_sub_8bit

Then, after the following update:

selectOrSplit GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi weight:5.118239e-02 w=5.118239e-02 RS_Spill Cascade 0 Inline spilling GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi weight:5.118239e-02 From original %14 cannot remat for 576e FAKE_USE %20.sub_32bit:gr64_with_sub_8bit Merged spilled regs: SS#0 [120r,432B:0)[472r,576r:0) 0@x weight:0.000000e+00 spillAroundUses %20 merged orig valno 1: SS#0 [112B,432B:0)[472r,576r:0) 0@x weight:0.000000e+00 Checking redundant spills for 1@112B in %21 [96r,112B:0)[112B,120r:1)[592r,672B:2) 0@96r 1@112B-phi 2@592r weight:1.348007e-01 Merged to stack int: SS#0 [112B,432B:0)[472r,576r:0) 0@x weight:0.000000e+00 Checking redundant spills for 0@120r in %20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi weight:5.118239e-02 hoisted: 112B MOV64mr %stack.0, 1, $noreg, 0, $noreg, %21:gr64_with_sub_8bit :: (store (s64) into %stack.0) folded: 472r MOV64mi32 %stack.0, 1, $noreg, 0, $noreg, 0 :: (store (s64) into %stack.0) reload: 552r %23:gr64_with_sub_8bit = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0) rewrite: 576r FAKE_USE killed %23.sub_32bit:gr64_with_sub_8bit

We end up with:

544B bb.5.for.end16: 552B %23:gr64 = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0) 576B FAKE_USE %23.sub_32bit:gr64

Which becomes:

bb.5.for.end16: renamable $rax = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0) FAKE_USE renamable $eax, implicit killed $rax

Where $rax is entirely unused after the MOV other than the FAKE_USE of $eax.

Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)

arsenm · 2024-10-08T19:29:41Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        SmallDenseSet<MachineInstr *> FakeUsesToDelete;
+        for (MCRegUnit Unit : TRI->regunits(Reg)) {
+          if (!RegFakeUses.contains(Unit))
+            continue;
+          for (MachineInstr *FakeUse : RegFakeUses[Unit])
+            FakeUsesToDelete.insert(FakeUse);
+          RegFakeUses.erase(Unit);


This is a lot of manual liveness work. You're already walking backwards over the function and tracking liveness. You're erasing the instruction. Why not just erase the instruction, and then check at the reload point whether the def is dead?

We don't want to erase any FAKE_USE here unless we know we're deleting the load which feeds into it; we only delete the FAKE_USEs here since leaving it around after the load is deleted would make it a fake use of the wrong value.

This needs to be in some kind of liveness query function on the instruction, not inline here. Preferably using one of the MachineInstr queries

arsenm · 2024-10-08T19:30:13Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+          for (MCRegUnit Unit : TRI->regunits(FakeUseOp.getReg()))
+            RegFakeUses[Unit].push_back(&MI);


Would probably be cheaper to just maintain a vector of the fake use instructions and check the registers as needed, rather than a DenseMap entry per regunit. The regunits should be applied in the actual liveness check s

As long as it's fairly cheap to check whether two registers have any RegUnit overlap that should be fine, since we'll need to check each load seen against each FAKE_USE seen so far in the MBB.

It's cheaper than maintaining many map entries, that's kind of the point

arsenm · 2024-10-09T07:15:22Z

Spills for these still feels like a situation we should have avoided in the first place. Is it possible to teach foldMemoryOperand to absorb the spill reference directly into the instruction, or otherwise avoid inserting the restore in the first place?

SLTozer · 2024-10-09T12:20:40Z

Spills for these still feels like a situation we should have avoided in the first place. Is it possible to teach foldMemoryOperand to absorb the spill reference directly into the instruction, or otherwise avoid inserting the restore in the first place?

I'm making a first pass attempt at this; if it turns out to be simple without affecting value liveness, then that solution could replace this pass, which would simplify matters! I've not worked much in regalloc though, so if there are complications I'd prefer to land this patch as a stopgap-measure to ensure that fake uses aren't handled incorrectly (potentially causing crashes) in the meanwhile.

arsenm · 2024-10-11T14:15:40Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        // There should typically be an exact match between the loaded register
+        // and the FAKE_USE, but sometimes regalloc will choose to load a larger
+        // value than is needed. Therefore, as long as the load isn't used by


Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)

arsenm · 2024-10-11T14:22:38Z

llvm/test/CodeGen/X86/fake-use-remove-loads.mir

+--- |
+  define void @_ZN1g1jEv(ptr %this, i1 %cmp36, ptr %ref.tmp5) {
+  entry:
+    ret void
+  }
+
+...


Don't need IR section

arsenm · 2024-10-11T14:22:58Z

llvm/test/CodeGen/X86/fake-use-remove-loads.mir

+  }
+
+...
+---


Simplify this down to one block?

arsenm · 2024-10-11T14:23:15Z

llvm/test/CodeGen/X86/fake-use-remove-loads.mir

+    frame-destroy CFI_INSTRUCTION def_cfa $rsp, 8
+    RET64
+
+...


Negative test with an intervening use of other sub registers

arsenm · 2024-10-11T14:23:54Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

-        // should probably be deleted, but that's outside of this pass' scope.
-        if (RegFakeUses.contains(Reg)) {
+        // There should typically be an exact match between the loaded register
+        // and the FAKE_USE, but sometimes regalloc will choose to load a larger


Try to use MachineInstr::readsRegister and modifiesRegister queries?

arsenm · 2024-10-15T19:11:23Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        SmallDenseSet<MachineInstr *> FakeUsesToDelete;
+        SmallVector<MachineInstr *> RemainingFakeUses;
+        for (MachineInstr *&FakeUse : reverse(RegFakeUses)) {
+          if (TRI->regsOverlap(Reg, FakeUse->getOperand(0).getReg())) {


FakeUse->readsRegister(Reg, TRI)

arsenm · 2024-10-15T19:11:43Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+          // because any such FAKE_USE encountered prior is no longer relevant
+          // for later encountered loads.
+          for (MachineInstr *&FakeUse : reverse(RegFakeUses))
+            if (!TRI->regsOverlap(Reg, FakeUse->getOperand(0).getReg()))


FakeUse->readsRegister

SLTozer · 2024-11-05T18:16:18Z

Ping
~~Edit: Ah, the simplifying down to one block (and IR removal) change is outstanding~~ Nevermind, it has been done but git showed me an incomplete diff when I went to double check the old comments!

jmorse

I haven't used LiveRegUnits, to confirm my guesswork, it's maintaining the set of live-registers as we step backwards through the code and the available method tells us whether it's in use?

Looking good, but see inline for a potential unsupported arrangement of debug-info and fake uses.

jmorse · 2024-12-02T12:32:26Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+          // Since this load only exists to restore a spilled register and we
+          // haven't, run LiveDebugValues yet, there shouldn't be any DBG_VALUEs
+          // for this load; otherwise, deleting this would be incorrect.


I have the nasty feeling that this is only true with instruction-referencing LiveDebugValues, where we're able to track the location of values through stack spills and restores. IIRC, in the location-based model LiveDebugVariables will issue DBG_VALUEs when values get re-loaded? (98% sure, we have to insert or not-insert DW_OP_deref in that model when such transitions occur). The failure mode in the location-based model would be setting a variable location to a register that is now no longer defined by a restore, i.e. pointing it at some garbage value.

Is there anything we can easily do about this? Only supporting fake-uses with instruction-referencing debug-info is an easy workaround with technical justification, but we should put reasonable effort into making this more general. There are non-x86 folks out there who're using fake-uses downstream who might be inconvenienced -- they're not immune from this failure-mode though, and would have the workaround of disabling this pass.

I think the easiest way would be to disable this specific pass for non-instruction-referencing builds - it's a relatively small optimization, and not necessary for correctness. The alternative would be to track uses by DBG_VALUES to check that the loaded value isn't used by one; any preferences?

jmorse · 2024-12-02T12:47:29Z

llvm/test/CodeGen/X86/fake-use-remove-loads.mir

+# CHECK: DELETING: renamable $rax = MOV64rm $rbp
+# CHECK: DELETING: FAKE_USE renamable $eax


This is a solid test; but there have been requests in the past that tests do not use LLVM_DEBUG outputs to check the compilers behaviour, as the (rare) situations where the debug-text-output is wrong allows regressions to slip through. I think this is testable with a more-precise checking that the relevant instructions no longer appear in the relevant block; thus we should take that approach.

(IIRC debug-output testing is acceptable when it's hard to identify/check-for properties of the program that can easily be printed during passes).

Positively checking the debug output is fine, but much less preferably to concrete result checks. Negative checks of debug output is dangerous since it silently stops working if the printing ever changes

arsenm · 2024-12-03T17:38:18Z

I haven't used LiveRegUnits, to confirm my guesswork, it's maintaining the set of live-registers as we step backwards through the code and the available method tells us whether it's in use?

Yes

SLTozer · 2024-12-12T12:35:49Z

Ping

arsenm · 2024-12-13T04:07:09Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+        const MachineOperand &FakeUseOp = MI.getOperand(0);
+        // Track the Fake Uses that use these register units so that we can
+        // delete them if we delete the corresponding load.
+        if (FakeUseOp.isReg())


This is checking isReg twice. But you can just remove the operand checks, take all the instructions, and use readRegister and completely avoid the artificial restriction that this only have one operand

There won't be any fake uses with multiple operands - we could change that in future, but we'd need changes beyond the scope of this particular patch to enable them. We could preempt their existence, but doing so would complicate the logic further down, because currently we only need to track a list of all "live" fake uses; if we assumed that fake uses could refer to multiple registers then I think we'd need an extra data structure to keep track of which specific registers are being used by each fake use (e.g. a map, or a vector of tuples) so that we'd know which fake uses to delete. It's not a huge complication, but not worth paying for if we aren't going to use it imo - WDYT?

jmorse

This soft-LGTM, with the caveat about the correct method for testing for instr-ref and (alas) needing to test for it. (Probably just wants another RUN line?).

jmorse · 2025-01-24T16:40:41Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

+  // Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE
+  // instructions of the restored values that would become invalid.
+  if (debuginfoShouldUseDebugInstrRef(MF.getTarget().getTargetTriple()))
+    return false;


I believe instead this should be using the "UseDebugInstrRef" field of MachineFunction -- observe the MachineFunction::shouldUseDebugInstrRef function, there are compositions of attributes and flags that mess up instruction referencing, thus it's enabled on a function-by-function basis.

Sad to say, but that'll mean we need a test too.

jmorse · 2025-01-24T17:02:38Z

I suppose with non-x86 architectures not supporting instruction referencing, this means they'll be slower than the previous implementation uploaded. Un-necessary load elimination happens here and is disabled, instead of happening really late in code emission like the previous implementation. But this is fine -- when everyone supports instruction referencing everything will be faster!

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands should just be "killed" regs), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than was required pre-regalloc. This patch fixes both of these cases.

jmorse

Looks good but see the inline comment about the different "is this instruction referencing" calls.

jmorse · 2025-01-28T11:54:12Z

llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp

@@ -79,7 +79,7 @@ INITIALIZE_PASS_END(RemoveLoadsIntoFakeUses, DEBUG_TYPE,
 bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
  // Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE
  // instructions of the restored values that would become invalid.
-  if (debuginfoShouldUseDebugInstrRef(MF.getTarget().getTargetTriple()))
+  if (!MF.shouldUseDebugInstrRef())


I think this should call useDebugInstrRef, which queries the bit of state that MachineFunction carries around with it. IIRC, the "should" method is testing which mode should be used given the optimisation configuration, while the bit records the actual truth of whether instr-ref is being used or not. This allows for (semi-legitimate) situations where some MIR that uses instr-ref is loaded into a configuration where we would have chosen to not use instr-ref.

And uh, that awkwardly has an effect on the test you're writing too.

jmorse · 2025-01-28T11:55:27Z

llvm/test/CodeGen/X86/fake-use-remove-loads.mir

 # Ensure that loads into FAKE_USEs are correctly removed by the
 # remove-loads-into-fake-uses pass.


"... and that if we do not use instruction referencing, no modifications are made"

jmorse

LGTM

jmorse · 2025-01-28T13:11:58Z

Also looks fine, cheers for making it clearer

SLTozer requested review from arsenm, jmorse, OCHyams, CarlosAlbertoEnciso and wolfy1961 October 8, 2024 15:52

SLTozer self-assigned this Oct 8, 2024

llvmbot added the backend:X86 label Oct 8, 2024

arsenm reviewed Oct 8, 2024

View reviewed changes

arsenm requested review from MatzeB, qcolombet and kparzysz October 11, 2024 14:07

arsenm reviewed Oct 11, 2024

View reviewed changes

arsenm reviewed Oct 15, 2024

View reviewed changes

jmorse reviewed Dec 2, 2024

View reviewed changes

arsenm reviewed Dec 13, 2024

View reviewed changes

jmorse reviewed Jan 24, 2025

View reviewed changes

SLTozer added 6 commits January 24, 2025 18:15

Remove assertions, handle 0-op FAKE_USES, use RegUnits

fd2c94e

Use a SmallVector of fake uses instead of a SmallDenseMap

0a8b5a6

Use 'readsRegister'

c9d1b6e

Update test to check full output

52b95c7

Turn off pass when using varloc-based LDV

6aadd23

SLTozer added 2 commits January 24, 2025 18:15

Remove unnecessary isReg check

a5fd7ee

Fix check for instr-ref enabled, update test to check disabled case

0d4a75b

SLTozer force-pushed the fix-fake-use-load-removal branch from 66bb42e to 0d4a75b Compare January 24, 2025 18:19

jmorse approved these changes Jan 28, 2025

View reviewed changes

Use instr-ref bit instead of high-level function

026fc06

jmorse approved these changes Jan 28, 2025

View reviewed changes

Update test to be more update_test_checks-friendly

92c0631

SLTozer merged commit 22687aa into llvm:main Jan 28, 2025
6 of 8 checks passed

		assert(FakeUseOp.getReg().isPhysical() &&
		"VReg seen in function with NoVRegs set?");

		for (MCRegUnit Unit : TRI->regunits(FakeUseOp.getReg()))
		RegFakeUses[Unit].push_back(&MI);

+                }
+              ...
+              ---

		# CHECK: DELETING: renamable $rax = MOV64rm $rbp
		# CHECK: DELETING: FAKE_USE renamable $eax

		# Ensure that loads into FAKE_USEs are correctly removed by the
		# remove-loads-into-fake-uses pass.

[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses #111551

[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses #111551

Uh oh!

Conversation

SLTozer commented Oct 8, 2024

Uh oh!

llvmbot commented Oct 8, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SLTozer Oct 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arsenm commented Oct 9, 2024

Uh oh!

SLTozer commented Oct 9, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SLTozer commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmorse left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arsenm commented Dec 3, 2024

Uh oh!

SLTozer commented Dec 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jmorse left a comment

Choose a reason for hiding this comment

SLTozer Oct 8, 2024 •

edited

Loading

SLTozer commented Nov 5, 2024 •

edited

Loading