-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses #111551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 Author: Stephen Tozer (SLTozer) ChangesIn the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases. Full diff: https://github.com/llvm/llvm-project/pull/111551.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp b/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
index ef7a58670c3ac5..0670e1b8871286 100644
--- a/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
+++ b/llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
@@ -94,11 +94,13 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
for (MachineInstr &MI : make_early_inc_range(reverse(*MBB))) {
if (MI.isFakeUse()) {
- for (const MachineOperand &MO : MI.operands()) {
- // Track the Fake Uses that use this register so that we can delete
- // them if we delete the corresponding load.
- if (MO.isReg())
- RegFakeUses[MO.getReg()].push_back(&MI);
+ const MachineOperand &FakeUseOp = MI.getOperand(0);
+ // Track the Fake Uses that use this register so that we can delete
+ // them if we delete the corresponding load.
+ if (FakeUseOp.isReg()) {
+ assert(FakeUseOp.getReg().isPhysical() &&
+ "VReg seen in function with NoVRegs set?");
+ RegFakeUses[FakeUseOp.getReg()].push_back(&MI);
}
// Do not record FAKE_USE uses in LivePhysRegs so that we can recognize
// otherwise-unused loads.
@@ -113,27 +115,34 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
// Don't delete live physreg defs, or any reserved register defs.
if (!LivePhysRegs.available(Reg) || MRI->isReserved(Reg))
continue;
- // There should be an exact match between the loaded register and the
- // FAKE_USE use. If not, this is a load that is unused by anything? It
- // should probably be deleted, but that's outside of this pass' scope.
- if (RegFakeUses.contains(Reg)) {
- LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
- // It is possible that some DBG_VALUE instructions refer to this
- // instruction. They will be deleted in the live debug variable
- // analysis.
- MI.eraseFromParent();
- AnyChanges = true;
- ++NumLoadsDeleted;
- // Each FAKE_USE now appears to be a fake use of the previous value
- // of the loaded register; delete them to avoid incorrectly
- // interpreting them as such.
- for (MachineInstr *FakeUse : RegFakeUses[Reg]) {
+ // There should typically be an exact match between the loaded register
+ // and the FAKE_USE, but sometimes regalloc will choose to load a larger
+ // value than is needed. Therefore, as long as the load isn't used by
+ // anything except at least one FAKE_USE, we will delete it. If it isn't
+ // used by any fake uses, it should still be safe to delete but we
+ // choose to ignore it so that this pass has no side effects unrelated
+ // to fake uses.
+ bool HasFakeUse = false;
+ for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg)) {
+ if (!RegFakeUses.contains(SubReg))
+ continue;
+ HasFakeUse = true;
+ for (MachineInstr *FakeUse : RegFakeUses[SubReg]) {
LLVM_DEBUG(dbgs()
<< "RemoveLoadsIntoFakeUses: DELETING: " << *FakeUse);
FakeUse->eraseFromParent();
}
- NumFakeUsesDeleted += RegFakeUses[Reg].size();
- RegFakeUses[Reg].clear();
+ NumFakeUsesDeleted += RegFakeUses[SubReg].size();
+ RegFakeUses[SubReg].clear();
+ }
+ if (HasFakeUse) {
+ LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
+ // Since this load only exists to restore a spilled register and we
+ // haven't, run LiveDebugValues yet, there shouldn't be any DBG_VALUEs
+ // for this load; otherwise, deleting this would be incorrect.
+ MI.eraseFromParent();
+ AnyChanges = true;
+ ++NumLoadsDeleted;
}
continue;
}
@@ -147,8 +156,11 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
Register Reg = MO.getReg();
assert(Reg.isPhysical() &&
"VReg seen in function with NoVRegs set?");
- for (MCRegUnit Unit : TRI->regunits(Reg))
- RegFakeUses.erase(Unit);
+ // We clear RegFakeUses for this register and all subregisters,
+ // because any such FAKE_USE encountered prior applies only to this
+ // instruction.
+ for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg))
+ RegFakeUses.erase(SubReg);
}
}
}
diff --git a/llvm/test/CodeGen/X86/fake-use-remove-loads.mir b/llvm/test/CodeGen/X86/fake-use-remove-loads.mir
new file mode 100644
index 00000000000000..a8edbed5b28fe5
--- /dev/null
+++ b/llvm/test/CodeGen/X86/fake-use-remove-loads.mir
@@ -0,0 +1,129 @@
+# Ensure that loads into FAKE_USEs are correctly removed by the
+# remove-loads-into-fake-uses pass.
+# RUN: llc -run-pass remove-loads-into-fake-uses -mtriple=x86_64-unknown-linux -debug-only=remove-loads-into-fake-uses 2>&1 -o - %s | FileCheck %s --implicit-check-not=DELETING
+# REQUIRES: asserts
+#
+## We ensure that the load into the FAKE_USE is removed, along with the FAKE_USE
+## itself, even when the FAKE_USE is for a subregister of the move, and that we
+## correctly handle situations where FAKE_USE has additional `killed` operands
+## added by other passes.
+
+# CHECK: DELETING: FAKE_USE renamable $eax
+# CHECK: DELETING: renamable $rax = MOV64rm $rbp
+
+## Also verify that the store to the stack slot still exists.
+
+# CHECK-LABEL: bb.5:
+# CHECK: MOV64mi32 $rbp, 1, $noreg, -48, $noreg, 0 :: (store (s64) into %stack.0)
+# CHECK-LABEL: bb.6:
+
+
+--- |
+ define void @_ZN1g1jEv(ptr %this, i1 %cmp36, ptr %ref.tmp5) {
+ entry:
+ ret void
+ }
+
+...
+---
+name: _ZN1g1jEv
+alignment: 16
+tracksRegLiveness: true
+noPhis: true
+noVRegs: true
+hasFakeUses: true
+tracksDebugUserValues: true
+liveins:
+ - { reg: '$rdi', virtual-reg: '' }
+ - { reg: '$esi', virtual-reg: '' }
+ - { reg: '$rdx', virtual-reg: '' }
+frameInfo:
+ isCalleeSavedInfoValid: true
+stack:
+ - { id: 0, name: '', type: spill-slot, offset: -64, size: 8, alignment: 8,
+ stack-id: default, callee-saved-register: '', callee-saved-restored: true,
+ debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
+body: |
+ bb.0:
+ successors: %bb.2(0x80000000)
+ liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $rbx
+
+ frame-setup PUSH64r killed $rbp, implicit-def $rsp, implicit $rsp
+ frame-setup CFI_INSTRUCTION def_cfa_offset 16
+ frame-setup CFI_INSTRUCTION offset $rbp, -16
+ $rbp = frame-setup MOV64rr $rsp
+ frame-setup CFI_INSTRUCTION def_cfa_register $rbp
+ frame-setup PUSH64r killed $r15, implicit-def $rsp, implicit $rsp
+ frame-setup PUSH64r killed $r14, implicit-def $rsp, implicit $rsp
+ frame-setup PUSH64r killed $r13, implicit-def $rsp, implicit $rsp
+ frame-setup PUSH64r killed $r12, implicit-def $rsp, implicit $rsp
+ frame-setup PUSH64r killed $rbx, implicit-def $rsp, implicit $rsp
+ frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
+ CFI_INSTRUCTION offset $rbx, -56
+ CFI_INSTRUCTION offset $r12, -48
+ CFI_INSTRUCTION offset $r13, -40
+ CFI_INSTRUCTION offset $r14, -32
+ CFI_INSTRUCTION offset $r15, -24
+ $rbx = MOV64rr $rdx
+ $r14d = MOV32rr $esi
+ $r15 = MOV64rr $rdi
+ renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
+ renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
+ JMP_1 %bb.2
+
+ bb.1:
+ successors: %bb.2(0x783e0f0f), %bb.6(0x07c1f0f1)
+ liveins: $rbx, $r12, $r15, $r13d, $r14d
+
+ renamable $rax = MOV64rm $rbp, 1, $noreg, -48, $noreg :: (load (s64) from %stack.0)
+ FAKE_USE renamable $eax, implicit killed $rax
+ renamable $eax = MOV32ri 1, implicit-def $rax
+ TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
+ JCC_1 %bb.6, 9, implicit $eflags
+
+ bb.2:
+ successors: %bb.3(0x80000000)
+ liveins: $rax, $rbx, $r12, $r15, $r14d
+
+ MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)
+ renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
+
+ bb.3:
+ successors: %bb.4(0x04000000), %bb.3(0x7c000000)
+ liveins: $eax, $rbx, $r12, $r15, $r14d
+
+ $r13d = MOV32rr killed $eax
+ $rdi = MOV64rr $r15
+ CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
+ dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg :: (volatile load (s32) from %ir.ref.tmp5)
+ renamable $eax = MOV32ri 1
+ TEST8ri renamable $r14b, 1, implicit-def $eflags
+ JCC_1 %bb.3, 4, implicit $eflags
+
+ bb.4:
+ successors: %bb.5(0x40000000), %bb.1(0x40000000)
+ liveins: $eflags, $rbx, $r12, $r15, $r13d, $r14d
+
+ JCC_1 %bb.1, 4, implicit $eflags
+
+ bb.5:
+ successors: %bb.1(0x80000000)
+ liveins: $rbx, $r12, $r15, $r13d, $r14d
+
+ renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
+ MOV64mi32 $rbp, 1, $noreg, -48, $noreg, 0 :: (store (s64) into %stack.0)
+ CALL64r killed renamable $rax, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def dead $eax
+ JMP_1 %bb.1
+
+ bb.6:
+ $rsp = frame-destroy ADD64ri32 $rsp, 8, implicit-def dead $eflags
+ $rbx = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+ $r12 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+ $r13 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+ $r14 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+ $r15 = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+ $rbp = frame-destroy POP64r implicit-def $rsp, implicit $rsp
+ frame-destroy CFI_INSTRUCTION def_cfa $rsp, 8
+ RET64
+
+...
|
const MachineOperand &FakeUseOp = MI.getOperand(0); | ||
// Track the Fake Uses that use this register so that we can delete | ||
// them if we delete the corresponding load. | ||
if (FakeUseOp.isReg()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it even valid for this to be a non-register operand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - although there's no real meaning to non-register fake uses, at some points (that I don't recall off the top of my head) the operands can get replaced; it'd be equally valid to also just delete fake uses that have non-reg operands (and at any other point where we're examining fake uses).
assert(FakeUseOp.getReg().isPhysical() && | ||
"VReg seen in function with NoVRegs set?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't bother with the assertion, the verifier should catch it based on the properties. Probably should relax this anyway, for wasm
// choose to ignore it so that this pass has no side effects unrelated | ||
// to fake uses. | ||
bool HasFakeUse = false; | ||
for (MCPhysReg SubReg : TRI->subregs_inclusive(Reg)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to consider sub registers. Do everything in terms of register units, and liveness queries on the instruction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the patch to use regunits - it requires adding an entry for each register unit used by each FAKE_USE in the map, and an extra SmallDenseSet, but I imagine it's more efficient this way than iterating over all subregisters.
// There should typically be an exact match between the loaded register | ||
// and the FAKE_USE, but sometimes regalloc will choose to load a larger | ||
// value than is needed. Therefore, as long as the load isn't used by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like it's just an optimization bug. Do you have an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can post a full file if you want, but to summarize with snippets:
Pre-regalloc we have:
bb.5.for.end16:
%2:gr32 = COPY killed %19:gr32
FAKE_USE killed %2:gr32
...
After some merging, including %2 merging with %19, we get:
544B bb.5.for.end16:
576B FAKE_USE %20.sub_32bit:gr64_with_sub_8bit
Then, after the following update:
selectOrSplit GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi weight:5.118239e-02 w=5.118239e-02
RS_Spill Cascade 0
Inline spilling GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi weight:5.118239e-02
From original %14
cannot remat for 576e FAKE_USE %20.sub_32bit:gr64_with_sub_8bit
Merged spilled regs: SS#0 [120r,432B:0)[472r,576r:0) 0@x weight:0.000000e+00
spillAroundUses %20
merged orig valno 1: SS#0 [112B,432B:0)[472r,576r:0) 0@x weight:0.000000e+00
Checking redundant spills for 1@112B in %21 [96r,112B:0)[112B,120r:1)[592r,672B:2) 0@96r 1@112B-phi 2@592r weight:1.348007e-01
Merged to stack int: SS#0 [112B,432B:0)[472r,576r:0) 0@x weight:0.000000e+00
Checking redundant spills for 0@120r in %20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi weight:5.118239e-02
hoisted: 112B MOV64mr %stack.0, 1, $noreg, 0, $noreg, %21:gr64_with_sub_8bit :: (store (s64) into %stack.0)
folded: 472r MOV64mi32 %stack.0, 1, $noreg, 0, $noreg, 0 :: (store (s64) into %stack.0)
reload: 552r %23:gr64_with_sub_8bit = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
rewrite: 576r FAKE_USE killed %23.sub_32bit:gr64_with_sub_8bit
We end up with:
544B bb.5.for.end16:
552B %23:gr64 = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
576B FAKE_USE %23.sub_32bit:gr64
Which becomes:
bb.5.for.end16:
renamable $rax = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
FAKE_USE renamable $eax, implicit killed $rax
Where $rax
is entirely unused after the MOV other than the FAKE_USE
of $eax
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)
SmallDenseSet<MachineInstr *> FakeUsesToDelete; | ||
for (MCRegUnit Unit : TRI->regunits(Reg)) { | ||
if (!RegFakeUses.contains(Unit)) | ||
continue; | ||
for (MachineInstr *FakeUse : RegFakeUses[Unit]) | ||
FakeUsesToDelete.insert(FakeUse); | ||
RegFakeUses.erase(Unit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of manual liveness work. You're already walking backwards over the function and tracking liveness. You're erasing the instruction. Why not just erase the instruction, and then check at the reload point whether the def is dead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to erase any FAKE_USE here unless we know we're deleting the load which feeds into it; we only delete the FAKE_USEs here since leaving it around after the load is deleted would make it a fake use of the wrong value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be in some kind of liveness query function on the instruction, not inline here. Preferably using one of the MachineInstr queries
for (MCRegUnit Unit : TRI->regunits(FakeUseOp.getReg())) | ||
RegFakeUses[Unit].push_back(&MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would probably be cheaper to just maintain a vector of the fake use instructions and check the registers as needed, rather than a DenseMap entry per regunit. The regunits should be applied in the actual liveness check s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as it's fairly cheap to check whether two registers have any RegUnit overlap that should be fine, since we'll need to check each load seen against each FAKE_USE seen so far in the MBB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's cheaper than maintaining many map entries, that's kind of the point
Spills for these still feels like a situation we should have avoided in the first place. Is it possible to teach foldMemoryOperand to absorb the spill reference directly into the instruction, or otherwise avoid inserting the restore in the first place? |
I'm making a first pass attempt at this; if it turns out to be simple without affecting value liveness, then that solution could replace this pass, which would simplify matters! I've not worked much in regalloc though, so if there are complications I'd prefer to land this patch as a stopgap-measure to ensure that fake uses aren't handled incorrectly (potentially causing crashes) in the meanwhile. |
// There should typically be an exact match between the loaded register | ||
// and the FAKE_USE, but sometimes regalloc will choose to load a larger | ||
// value than is needed. Therefore, as long as the load isn't used by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)
--- | | ||
define void @_ZN1g1jEv(ptr %this, i1 %cmp36, ptr %ref.tmp5) { | ||
entry: | ||
ret void | ||
} | ||
|
||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need IR section
} | ||
|
||
... | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify this down to one block?
frame-destroy CFI_INSTRUCTION def_cfa $rsp, 8 | ||
RET64 | ||
|
||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative test with an intervening use of other sub registers
// should probably be deleted, but that's outside of this pass' scope. | ||
if (RegFakeUses.contains(Reg)) { | ||
// There should typically be an exact match between the loaded register | ||
// and the FAKE_USE, but sometimes regalloc will choose to load a larger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use MachineInstr::readsRegister and modifiesRegister queries?
SmallDenseSet<MachineInstr *> FakeUsesToDelete; | ||
SmallVector<MachineInstr *> RemainingFakeUses; | ||
for (MachineInstr *&FakeUse : reverse(RegFakeUses)) { | ||
if (TRI->regsOverlap(Reg, FakeUse->getOperand(0).getReg())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FakeUse->readsRegister(Reg, TRI)
// because any such FAKE_USE encountered prior is no longer relevant | ||
// for later encountered loads. | ||
for (MachineInstr *&FakeUse : reverse(RegFakeUses)) | ||
if (!TRI->regsOverlap(Reg, FakeUse->getOperand(0).getReg())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FakeUse->readsRegister
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used LiveRegUnits
, to confirm my guesswork, it's maintaining the set of live-registers as we step backwards through the code and the available
method tells us whether it's in use?
Looking good, but see inline for a potential unsupported arrangement of debug-info and fake uses.
// Since this load only exists to restore a spilled register and we | ||
// haven't, run LiveDebugValues yet, there shouldn't be any DBG_VALUEs | ||
// for this load; otherwise, deleting this would be incorrect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the nasty feeling that this is only true with instruction-referencing LiveDebugValues, where we're able to track the location of values through stack spills and restores. IIRC, in the location-based model LiveDebugVariables will issue DBG_VALUEs when values get re-loaded? (98% sure, we have to insert or not-insert DW_OP_deref in that model when such transitions occur). The failure mode in the location-based model would be setting a variable location to a register that is now no longer defined by a restore, i.e. pointing it at some garbage value.
Is there anything we can easily do about this? Only supporting fake-uses with instruction-referencing debug-info is an easy workaround with technical justification, but we should put reasonable effort into making this more general. There are non-x86 folks out there who're using fake-uses downstream who might be inconvenienced -- they're not immune from this failure-mode though, and would have the workaround of disabling this pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the easiest way would be to disable this specific pass for non-instruction-referencing builds - it's a relatively small optimization, and not necessary for correctness. The alternative would be to track uses by DBG_VALUES to check that the loaded value isn't used by one; any preferences?
# CHECK: DELETING: renamable $rax = MOV64rm $rbp | ||
# CHECK: DELETING: FAKE_USE renamable $eax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a solid test; but there have been requests in the past that tests do not use LLVM_DEBUG outputs to check the compilers behaviour, as the (rare) situations where the debug-text-output is wrong allows regressions to slip through. I think this is testable with a more-precise checking that the relevant instructions no longer appear in the relevant block; thus we should take that approach.
(IIRC debug-output testing is acceptable when it's hard to identify/check-for properties of the program that can easily be printed during passes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Positively checking the debug output is fine, but much less preferably to concrete result checks. Negative checks of debug output is dangerous since it silently stops working if the printing ever changes
Yes |
Ping |
const MachineOperand &FakeUseOp = MI.getOperand(0); | ||
// Track the Fake Uses that use these register units so that we can | ||
// delete them if we delete the corresponding load. | ||
if (FakeUseOp.isReg()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is checking isReg twice. But you can just remove the operand checks, take all the instructions, and use readRegister and completely avoid the artificial restriction that this only have one operand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There won't be any fake uses with multiple operands - we could change that in future, but we'd need changes beyond the scope of this particular patch to enable them. We could preempt their existence, but doing so would complicate the logic further down, because currently we only need to track a list of all "live" fake uses; if we assumed that fake uses could refer to multiple registers then I think we'd need an extra data structure to keep track of which specific registers are being used by each fake use (e.g. a map, or a vector of tuples) so that we'd know which fake uses to delete. It's not a huge complication, but not worth paying for if we aren't going to use it imo - WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This soft-LGTM, with the caveat about the correct method for testing for instr-ref and (alas) needing to test for it. (Probably just wants another RUN line?).
// Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE | ||
// instructions of the restored values that would become invalid. | ||
if (debuginfoShouldUseDebugInstrRef(MF.getTarget().getTargetTriple())) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe instead this should be using the "UseDebugInstrRef" field of MachineFunction -- observe the MachineFunction::shouldUseDebugInstrRef
function, there are compositions of attributes and flags that mess up instruction referencing, thus it's enabled on a function-by-function basis.
Sad to say, but that'll mean we need a test too.
I suppose with non-x86 architectures not supporting instruction referencing, this means they'll be slower than the previous implementation uploaded. Un-necessary load elimination happens here and is disabled, instead of happening really late in code emission like the previous implementation. But this is fine -- when everyone supports instruction referencing everything will be faster! |
In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands should just be "killed" regs), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than was required pre-regalloc. This patch fixes both of these cases.
66bb42e
to
0d4a75b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but see the inline comment about the different "is this instruction referencing" calls.
@@ -79,7 +79,7 @@ INITIALIZE_PASS_END(RemoveLoadsIntoFakeUses, DEBUG_TYPE, | |||
bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) { | |||
// Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE | |||
// instructions of the restored values that would become invalid. | |||
if (debuginfoShouldUseDebugInstrRef(MF.getTarget().getTargetTriple())) | |||
if (!MF.shouldUseDebugInstrRef()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should call useDebugInstrRef
, which queries the bit of state that MachineFunction carries around with it. IIRC, the "should" method is testing which mode should be used given the optimisation configuration, while the bit records the actual truth of whether instr-ref is being used or not. This allows for (semi-legitimate) situations where some MIR that uses instr-ref is loaded into a configuration where we would have chosen to not use instr-ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And uh, that awkwardly has an effect on the test you're writing too.
# Ensure that loads into FAKE_USEs are correctly removed by the | ||
# remove-loads-into-fake-uses pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"... and that if we do not use instruction referencing, no modifications are made"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Also looks fine, cheers for making it clearer |
In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.