[MachineLICM] Work-around Incomplete RegUnits #95926
Conversation
@llvm/pr-subscribers-backend-aarch64

Author: Pierre van Houtryve (Pierre-vh)

Changes

Reverts the behavior introduced by 770393b while keeping the refactored code. Fixes a test failure on AArch64, at the cost of a small regression for AMDGPU which I will investigate. In the meantime, correctness prevails.

Full diff: https://github.com/llvm/llvm-project/pull/95926.diff

3 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index d81fe54fe844c..e76e7d9cd3eed 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -426,28 +426,26 @@ static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {
static void applyBitsNotInRegMaskToRegUnitsMask(const TargetRegisterInfo &TRI,
BitVector &RUs,
const uint32_t *Mask) {
- BitVector ClobberedRUs(TRI.getNumRegUnits(), true);
+ // FIXME: This is overly conservative when applying regmasks from, e.g. calls.
+ // See `test/CodeGen/AMDGPU/indirect-call.ll` regression.
+ BitVector RUsFromRegsNotInMask(TRI.getNumRegUnits());
const unsigned NumRegs = TRI.getNumRegs();
const unsigned MaskWords = (NumRegs + 31) / 32;
for (unsigned K = 0; K < MaskWords; ++K) {
const uint32_t Word = Mask[K];
- if (!Word)
- continue;
-
for (unsigned Bit = 0; Bit < 32; ++Bit) {
const unsigned PhysReg = (K * 32) + Bit;
if (PhysReg == NumRegs)
break;
- // Check if we have a valid PhysReg that is set in the mask.
- if ((Word >> Bit) & 1) {
+ if (PhysReg && !((Word >> Bit) & 1)) {
for (MCRegUnitIterator RUI(PhysReg, &TRI); RUI.isValid(); ++RUI)
- ClobberedRUs.reset(*RUI);
+ RUsFromRegsNotInMask.set(*RUI);
}
}
}
- RUs |= ClobberedRUs;
+ RUs |= RUsFromRegsNotInMask;
}
/// Examine the instruction for potentai LICM candidate. Also
diff --git a/llvm/test/CodeGen/AArch64/mlicm-csr-mask.mir b/llvm/test/CodeGen/AArch64/mlicm-csr-mask.mir
new file mode 100644
index 0000000000000..f6a0abfdc410b
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/mlicm-csr-mask.mir
@@ -0,0 +1,49 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64-unknown-linux-gnu -run-pass=greedy,machinelicm -verify-machineinstrs -debug -o - %s | FileCheck %s
+
+# FIXME: Running RA is needed otherwise it runs pre-RA LICM.
+---
+name: test
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: test
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x80000000)
+ ; CHECK-NEXT: liveins: $x0, $w1, $x2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: B %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: liveins: $x0, $w1, $x2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: renamable $q11 = MOVIv4i32 2, 8
+ ; CHECK-NEXT: BL &memset, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0
+ ; CHECK-NEXT: renamable $q10 = MVNIv4i32 4, 0
+ ; CHECK-NEXT: $xzr = SUBSXri $x0, 1, 0, implicit-def $nzcv
+ ; CHECK-NEXT: Bcc 11, %bb.1, implicit $nzcv
+ ; CHECK-NEXT: B %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: liveins: $q10, $q11
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $q0 = COPY $q10
+ ; CHECK-NEXT: $q1 = COPY $q11
+ bb.0:
+ liveins: $x0, $w1, $x2
+ B %bb.1
+
+ bb.1:
+ liveins: $x0, $w1, $x2
+ renamable $q11 = MOVIv4i32 2, 8
+ BL &memset, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0
+ renamable $q10 = MVNIv4i32 4, 0
+ $xzr = SUBSXri $x0, 1, 0, implicit-def $nzcv
+ Bcc 11, %bb.1, implicit $nzcv
+ B %bb.2
+
+ bb.2:
+ liveins: $q10, $q11
+ $q0 = COPY $q10
+ $q1 = COPY $q11
+...
diff --git a/llvm/test/CodeGen/AMDGPU/indirect-call.ll b/llvm/test/CodeGen/AMDGPU/indirect-call.ll
index 7799b9509ceb0..da8aa54469835 100644
--- a/llvm/test/CodeGen/AMDGPU/indirect-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/indirect-call.ll
@@ -886,12 +886,12 @@ define void @test_indirect_call_vgpr_ptr_inreg_arg(ptr %fptr) {
; GCN-NEXT: v_writelane_b32 v40, s62, 30
; GCN-NEXT: v_writelane_b32 v40, s63, 31
; GCN-NEXT: s_mov_b64 s[6:7], exec
-; GCN-NEXT: s_movk_i32 s4, 0x7b
; GCN-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: v_readfirstlane_b32 s8, v0
; GCN-NEXT: v_readfirstlane_b32 s9, v1
; GCN-NEXT: v_cmp_eq_u64_e32 vcc, s[8:9], v[0:1]
; GCN-NEXT: s_and_saveexec_b64 s[10:11], vcc
+; GCN-NEXT: s_movk_i32 s4, 0x7b
; GCN-NEXT: s_swappc_b64 s[30:31], s[8:9]
; GCN-NEXT: ; implicit-def: $vgpr0_vgpr1
; GCN-NEXT: s_xor_b64 exec, exec, s[10:11]
@@ -980,12 +980,12 @@ define void @test_indirect_call_vgpr_ptr_inreg_arg(ptr %fptr) {
; GISEL-NEXT: v_writelane_b32 v40, s62, 30
; GISEL-NEXT: v_writelane_b32 v40, s63, 31
; GISEL-NEXT: s_mov_b64 s[6:7], exec
-; GISEL-NEXT: s_movk_i32 s4, 0x7b
; GISEL-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1
; GISEL-NEXT: v_readfirstlane_b32 s8, v0
; GISEL-NEXT: v_readfirstlane_b32 s9, v1
; GISEL-NEXT: v_cmp_eq_u64_e32 vcc, s[8:9], v[0:1]
; GISEL-NEXT: s_and_saveexec_b64 s[10:11], vcc
+; GISEL-NEXT: s_movk_i32 s4, 0x7b
; GISEL-NEXT: s_swappc_b64 s[30:31], s[8:9]
; GISEL-NEXT: ; implicit-def: $vgpr0
; GISEL-NEXT: s_xor_b64 exec, exec, s[10:11]
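The regmask handling changed by the hunk in MachineLICM.cpp above can be modeled in standalone form. In an LLVM regmask, bit R is set when physical register R is preserved by the instruction, so the patched loop collects the units of every register whose bit is clear (skipping register 0, the invalid register). The sketch below is a plain-C++ model, not LLVM code: a flat list of clobbered register numbers stands in for the regunit BitVector, and the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone model of the loop in applyBitsNotInRegMaskToRegUnitsMask.
// In an LLVM regmask, bit R is SET when physical register R is
// preserved; a CLEAR bit means R may be clobbered. Register 0 is the
// invalid register and is skipped.
std::vector<unsigned> regsNotInMask(const std::vector<uint32_t> &Mask,
                                    unsigned NumRegs) {
  std::vector<unsigned> Clobbered;
  const unsigned MaskWords = (NumRegs + 31) / 32;
  assert(Mask.size() >= MaskWords && "regmask too short for NumRegs");
  for (unsigned K = 0; K < MaskWords; ++K) {
    const uint32_t Word = Mask[K];
    // Note: no `if (!Word) continue;` shortcut here. With the logic
    // inverted (collecting clobbered registers instead of resetting
    // preserved ones), an all-zero word still contributes: every
    // register covered by it is clobbered.
    for (unsigned Bit = 0; Bit < 32; ++Bit) {
      const unsigned PhysReg = (K * 32) + Bit;
      if (PhysReg == NumRegs)
        break; // past the last valid register
      if (PhysReg && !((Word >> Bit) & 1))
        Clobbered.push_back(PhysReg); // the real code sets its regunits
    }
  }
  return Clobbered;
}
```

For example, with NumRegs = 8 and a single mask word 0b110 (registers 1 and 2 preserved), the model reports registers 3 through 7 as clobbered.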
@llvm/pr-subscribers-backend-amdgpu
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# RUN: llc -mtriple=aarch64-unknown-linux-gnu -run-pass=greedy,machinelicm -verify-machineinstrs -debug -o - %s | FileCheck %s
I don't see why this is using -debug, but that requires asserts.
# FIXME: Running RA is needed otherwise it runs pre-RA LICM.
but machinelicm should be the post-RA one? early-machinelicm is pre-RA
For some reason it only triggers if I run RA first; otherwise it seems MRI->isSSA() returns true, and that's what it uses to determine whether it's pre or post RA.
I think you are working around the MIR parser bug where it assumes SSA until it finds something obviously not SSA. The pass needs to clear the properties and I think we need an explicit SSA property
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: renamable $q11 = MOVIv4i32 2, 8
; CHECK-NEXT: BL &memset, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0
; CHECK-NEXT: renamable $q10 = MVNIv4i32 4, 0
What does this test actually demonstrate? Without this patch, I see that the def of q10 gets hoisted past this memset call:

BL &memset, <regmask $fp $lr $wzr $xzr $b8 $b9 $b10 $b11 $b12 $b13 $b14 $b15 $d8 $d9 $d10 $d11 $d12 $d13 $d14 $d15 $h8 $h9 $h10 $h11 $h12 $h13 $h14 $h15 $s8 $s9 $s10 $s11 $s12 $s13 $s14 $s15 $w19 $w20 $w21 $w22 $w23 $w24 $w25 $w26 $w27 $w28 $w29 $w30 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $x28 $d8_d9 $d9_d10 $d10_d11 $d11_d12 $d12_d13 $d13_d14 $d14_d15 $d8_d9_d10_d11 $d9_d10_d11_d12 $d10_d11_d12_d13 $d11_d12_d13_d14 $d12_d13_d14_d15 $d8_d9_d10 $d9_d10_d11 $d10_d11_d12 $d11_d12_d13 $d12_d13_d14 $d13_d14_d15 $x22_x23_x24_x25_x26_x27_x28_fp $x20_x21_x22_x23_x24_x25_x26_x27 $w20_w21 $w22_w23 $w24_w25 $w26_w27 $w28_w29 $x28_fp $x20_x21 $x22_x23 $x24_x25 $x26_x27>, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0

I don't understand why this happened. q10 has subregs d20 and d21, but I don't see any of those listed in this regmask.
The D20 D21 being subregs of Q10 is an ARM thing that wasn't carried over into AArch64.
D'oh - I thought this test case was ARM! Let me take another look...
I was trying to get my head around what was going on. There is a rule in the AArch64 PCS which talks about the top bits of a Q register needing to be preserved across a call:

Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be preserved; it is the responsibility of the caller to preserve larger values.
OK so AArch64 registers work like this: b10 aliases the low 8 bits of h10, which aliases the low 16 bits of s10, which aliases the low 32 bits of d10, which aliases the low 64 bits of q10, which aliases the low 128 bits of z10.

My suspicion is that TableGen only creates one regunit for all of these *10 registers. If that's true then there's no way for regunit-based clobbering info to express that d10 is preserved but q10 is not (because its high bits are clobbered). So I think we would have to go back and revert #94608.
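This limitation can be demonstrated with a toy model. Assuming, as suspected above (and later confirmed for unit 35), that TableGen emits one shared regunit for the whole b10/h10/s10/d10/q10/z10 alias chain, a regunit-level clobber set cannot express "d10 preserved, q10 clobbered". The sketch below is plain C++; the regunit table is hypothetical, not generated from LLVM's descriptions.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical regunit table: one shared unit (35) for the entire
// b10/h10/s10/d10/q10/z10 alias chain.
static const std::map<std::string, std::set<int>> RegUnits = {
    {"b10", {35}}, {"h10", {35}}, {"s10", {35}},
    {"d10", {35}}, {"q10", {35}}, {"z10", {35}},
};

// A register counts as clobbered if any of its regunits is clobbered.
bool isClobbered(const std::string &Reg, const std::set<int> &ClobberedUnits) {
  for (int U : RegUnits.at(Reg))
    if (ClobberedUnits.count(U))
      return true;
  return false;
}
```

Under the AAPCS rule quoted earlier, a call preserves d10 but clobbers the high half of q10. Marking unit 35 clobbered is overly conservative for d10 (the AMDGPU-style regression); leaving it unmarked wrongly reports q10 as safe (the AArch64 miscompile). Neither encoding is correct, which is why the patch falls back to the conservative choice.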
> Add a flag to registers to tell TableGen that subregs cover the whole register (EDIT: we already have one)

What is that flag that we already have?
Anyway I think this kind of approach sounds reasonable, but should probably get some wider discussion. Possible alternative ideas:
- Declare the width of every physical register, so that tablegen can work out by itself whether or not it is completely covered by subregs.
- Change the AArch64 register definitions to declare some kind of dummy subreg for the high parts, which is only used to force tablegen to generate the extra regunits. The only advantage of this is that we might not have to modify tablegen at all.
Opened #96146 to continue the discussion
> Opened #96146 to continue the discussion

Thanks. In the meantime I guess it's OK to commit the current patch to fix the AArch64 breakage, with a comment explaining that it's overly conservative (hence the AMDGPU test regression).
Hello. Sorry, all the subtleties of RegUnits are not something I know a lot about (and I may be a bit unresponsive today, sorry if I am). Perhaps @efriedma-quic or @TNorthover know more about how it should all work. We would usually revert back to a known-good state or fix quickly, and leaving it broken for any length of time isn't the llvm way of doing things. So I'm all in favour of this fix, thanks.
We have scalable registers in AArch64 aliasing the same registers too, to throw another potential spanner in the works. I don't believe those registers actually get marked as scalable at the moment though. If we go with option 3 then we should try and make sure that works OK too.
Sorry all, I'm just seeing this thread now.
Apologies if I repeat things that have already been said, I haven't read through the whole history.
I agree that would be nice but that is not what TableGen currently does. I've just checked and all of b10 h10 s10 d10 q10 z10 have a single regunit number 35.
@jayfoad is right this is how the regunits work and the issue with register mask is well known (well at least @MatzeB and I know :P). We shouldn't use regunits for regmask operands because they are indeed not precise enough until we go with option 2.
(And by the way changing regunits shouldn't change regalloc. I'm curious to see why you @Pierre-vh saw this.)
Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. llvm#96146 opened to investigate the issue.
Force-pushed 4792ee5 to f8da46e.
I updated the patch to include a thorough description of the problem. I would like to move forward with this fix so we don't have a miscompile in trunk while we investigate the issue, without having to revert a patch that solves a major performance issue in MachineLICM in AMDGPU.

Thanks; with this fix merged now, my original case (as explained in #95746 (comment)) works correctly again.
Reverts the behavior introduced by 770393b while keeping the refactored code. Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. llvm#96146 opened to investigate the issue