-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[CodeGen] Simplify updateLiveIn in MachineSink #79831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CodeGen] Simplify updateLiveIn in MachineSink #79831
Conversation
When a whole register is added a basic block's liveins, use LaneBitmask::getAll for the live lanes instead of trying to calculate an accurate mask of the lanes that comprise the register. This simplifies the code and matches other places where a whole register is marked as livein. This also avoids problems when regunits that are synthesized by TableGen to represent ad hoc aliasing have a lane mask of 0. Fixes llvm#78942
@llvm/pr-subscribers-backend-amdgpu Author: Jay Foad (jayfoad) ChangesWhen a whole register is added a basic block's liveins, use This simplifies the code and matches other places where a whole register This also avoids problems when regunits that are synthesized by TableGen Fixes #78942 Full diff: https://github.com/llvm/llvm-project/pull/79831.diff 4 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index e7e8f602683480..c3a1d3759882d8 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -1949,13 +1949,8 @@ static void updateLiveIn(MachineInstr *MI, MachineBasicBlock *SuccBB,
for (unsigned DefReg : DefedRegsInCopy)
for (MCPhysReg S : TRI->subregs_inclusive(DefReg))
SuccBB->removeLiveIn(S);
- for (auto U : UsedOpsInCopy) {
- Register SrcReg = MI->getOperand(U).getReg();
- LaneBitmask Mask;
- for (MCRegUnitMaskIterator S(SrcReg, TRI); S.isValid(); ++S)
- Mask |= (*S).second;
- SuccBB->addLiveIn(SrcReg, Mask);
- }
+ for (auto U : UsedOpsInCopy)
+ SuccBB->addLiveIn(MI->getOperand(U).getReg());
SuccBB->sortUniqueLiveIns();
}
diff --git a/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir b/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
index 65648bacd55679..c1da29ecc2c2f5 100644
--- a/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
+++ b/llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
@@ -47,7 +47,7 @@ body: |
; GCN-NEXT: {{ $}}
; GCN-NEXT: bb.1:
; GCN-NEXT: successors: %bb.2(0x80000000)
- ; GCN-NEXT: liveins: $exec:0x000000000000000F, $sgpr30, $sgpr31, $vgpr0:0x0000000000000003, $vgpr1:0x0000000000000003, $vgpr2:0x0000000000000003, $vgpr3:0x0000000000000003, $vgpr4:0x0000000000000003, $vgpr5:0x0000000000000003, $vgpr6:0x0000000000000003, $vgpr7:0x0000000000000003, $vgpr8:0x0000000000000003, $vgpr9:0x0000000000000003, $vgpr40, $sgpr30_sgpr31, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr41_vgpr42:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F, $vgpr45_vgpr46:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F
+ ; GCN-NEXT: liveins: $exec, $sgpr30, $sgpr31, $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr40, $sgpr30_sgpr31, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr41_vgpr42:0x000000000000000F, $vgpr43_vgpr44:0x000000000000000F, $vgpr45_vgpr46:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F
; GCN-NEXT: {{ $}}
; GCN-NEXT: renamable $vgpr57 = COPY $vgpr9, implicit $exec
; GCN-NEXT: renamable $vgpr56 = COPY $vgpr8, implicit $exec
diff --git a/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll b/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
index 5a128c7541d1ec..24ef8ce1beb2db 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll
@@ -808,7 +808,7 @@ define amdgpu_kernel void @f1(ptr addrspace(1) %arg, ptr addrspace(1) %arg1, i64
; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: bb.58:
; GFX90A-NEXT: successors: %bb.7(0x80000000)
- ; GFX90A-NEXT: liveins: $exec:0x000000000000000F, $sgpr12, $sgpr13, $sgpr14, $sgpr15:0x0000000000000003, $sgpr23:0x0000000000000003, $vgpr30, $vgpr31, $sgpr4_sgpr5, $sgpr6_sgpr7:0x000000000000000F, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr24_sgpr25, $sgpr26_sgpr27, $sgpr28_sgpr29, $sgpr42_sgpr43, $sgpr16_sgpr17_sgpr18_sgpr19:0x00000000000000F0, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $vgpr2_vgpr3:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr18_vgpr19:0x000000000000000F, $vgpr20_vgpr21:0x000000000000000F, $vgpr22_vgpr23:0x000000000000000F, $vgpr24_vgpr25:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
+ ; GFX90A-NEXT: liveins: $exec, $sgpr12, $sgpr13, $sgpr14, $sgpr15, $sgpr23, $vgpr30, $vgpr31, $sgpr4_sgpr5, $sgpr6_sgpr7:0x000000000000000F, $sgpr8_sgpr9, $sgpr10_sgpr11, $sgpr24_sgpr25, $sgpr26_sgpr27, $sgpr28_sgpr29, $sgpr42_sgpr43, $sgpr16_sgpr17_sgpr18_sgpr19:0x00000000000000F0, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $vgpr2_vgpr3:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr18_vgpr19:0x000000000000000F, $vgpr20_vgpr21:0x000000000000000F, $vgpr22_vgpr23:0x000000000000000F, $vgpr24_vgpr25:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: renamable $vgpr15 = COPY killed renamable $sgpr23, implicit $exec
; GFX90A-NEXT: renamable $vgpr17 = COPY killed renamable $sgpr15, implicit $exec
diff --git a/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir b/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir
index ff7a40bfde9580..86863c31753642 100644
--- a/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir
+++ b/llvm/test/CodeGen/AMDGPU/postra-machine-sink.mir
@@ -5,7 +5,7 @@
# CHECK-LABEL: bb.0:
# CHECK: renamable $sgpr1 = COPY renamable $sgpr2
# CHECK-LABEL: bb.1:
-# CHECK: liveins: $sgpr0_sgpr1:0x000000000000000F
+# CHECK: liveins: $sgpr0_sgpr1
# CHECK: renamable $vgpr1_vgpr2 = COPY renamable $sgpr0_sgpr1
---
|
for (auto U : UsedOpsInCopy) { | ||
Register SrcReg = MI->getOperand(U).getReg(); | ||
LaneBitmask Mask; | ||
for (MCRegUnitMaskIterator S(SrcReg, TRI); S.isValid(); ++S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added in https://reviews.llvm.org/D71132 by @alex-t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this add the test case from VE?
It would be nice to have a VE test case but the only one I've seen so far is main...kaz7:llvm-project:computeregunitlanemasks-test which is specific to the TableGen fix (#79462) - it is no use for this fix. @kaz7 |
This patch avoids the problem I mentioned. However, it won't solve the problem, in my opinion. We don't want to have 0x00000000 lanemask value for general registers, don't we? #79835 (#79462) correct such problems. Please consider to apply #79835 too, please. I think this patch itself is a good enhancement. So, I appreciate if you review both, this and #79835. I'll try to create a regression test for this patch too. |
We're talking about special regunits that are only created to represent ad hoc aliasing between two registers. What should the lanemask be for such a regunit - 0 or -1 or something else? I don't think there is a single correct answer to that question. Maybe the lanemask should be poison and we should make sure that no-one ever depends on its value? |
What you are saying is other places don't bother getting this extra detail. What does this impact?
I have mixed feeling about that. |
Yes.
I don't know for sure. I just know that it does not have much effect on the existing lit tests.
You mean the whole tablegen By the way I have yet another idea for fixing the reported problem. Currently if a leaf register has an ad hoc alias, tablegen will give it a regunit to represent the alias but it will not give it its own "leaf" regunit. My idea is that we could change that, so that leaf registers always get a unique leaf regunit, in addition to any regunits for ad hoc aliasing. This is something that I need anyway for another project: converting MRI / TRI to track reserved regunits instead of reserved registers. |
Sounds good let’s go with this patch and see if anybody complains about the removal of the lanes in the live ins. I expect that people that care about that would use the LiveIIntervals anyway.
I mean the part where they don’t actually share a regunit.
Sounds like a plan, although I’m a little worried about the impact on things like pressure sets. |
I was not aware of any uses of |
When a whole register is added a basic block's liveins, use
LaneBitmask::getAll for the live lanes instead of trying to calculate an
accurate mask of the lanes that comprise the register.
This simplifies the code and matches other places where a whole register
is marked as livein.
This also avoids problems when regunits that are synthesized by TableGen
to represent ad hoc aliasing have a lane mask of 0.
Fixes #78942