[CodeGen] Use 128bits for LaneBitmask. #111157

Closed

Conversation

sdesmalen-arm
Collaborator

This follows on from the conversation on #109797.

FWIW, I've considered several alternatives to this patch:

(1) Using APInt as storage type rather than 'uint64_t Mask[2]'.

(1) makes the code a bit simpler to read, but APInt by default only allocates inline space for a 64-bit value and otherwise dynamically allocates a larger buffer to represent the wider value. Because we know the value is always 128 bits, this extra dynamic allocation is undesirable. We also rarely need the full power of APInt, since most of the uses are bitwise operations, so I chose to represent the value as a uint64_t array instead, converting to/from APInt only when necessary.
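As a minimal illustration of the trade-off (a sketch; the function name is just for illustration, the 128-bit invariant is the one this patch assumes):

  #include "llvm/ADT/APInt.h"
  #include <cstdint>

  using namespace llvm;

  void storageTradeoff() {
    // <= 64 bits: APInt stores the value inline, no heap allocation.
    APInt Narrow(64, 1);
    // > 64 bits: APInt heap-allocates a multi-word buffer, even though
    // LaneBitmask's width is statically known to be 128.
    APInt Wide(128, 1);
    // A fixed-size array avoids the dynamic allocation entirely.
    uint64_t Mask[2] = {1, 0};
    (void)Narrow; (void)Wide; (void)Mask;
  }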

Because it is inconvenient that increasing the BitWidth applies to all targets, I tried to see if there is a way to make the bitwidth dynamic, by:

(2) Making the BitWidth dynamic per target by passing it to
constructors.
(3) Modification of (2) that describes 'none', 'all' and 'lane' with
an enum which doesn't require a BitWidth, until doing
arithmetic with an explicit bit mask (which does have a BitWidth);
see the sketch below.
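A rough sketch of what (3) might have looked like (entirely hypothetical; LaneBitmaskV3 and its members are illustrative, not a proposed API):

  #include "llvm/ADT/APInt.h"
  #include <utility>

  // Hypothetical: 'none', 'all' and 'lane' carry no BitWidth; only an
  // explicit mask does.
  struct LaneBitmaskV3 {
    enum class Kind { None, All, Lane, Mask };
    Kind K;
    unsigned Lane = 0;  // valid only when K == Kind::Lane
    llvm::APInt Mask;   // valid only when K == Kind::Mask; carries a BitWidth

    static LaneBitmaskV3 none() { return {Kind::None}; }
    static LaneBitmaskV3 all() { return {Kind::All}; }
    static LaneBitmaskV3 lane(unsigned L) { return {Kind::Lane, L}; }
    // Only here does a concrete BitWidth enter the picture.
    static LaneBitmaskV3 mask(llvm::APInt M) {
      return {Kind::Mask, 0, std::move(M)};
    }
    // operator== is where this design runs into trouble, as discussed below.
  };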

Unfortunately, neither of these approaches seems feasible. (2) would require passing the TargetRegisterInfo/MCRegisterInfo to many places where this info is not available but where a LaneBitmask value needs to be instantiated.

Approach (3) leads to other issues, such as the question 'what is the meaning of operator== when one value is a mask and the other is the 'all' enum?'. If we let operator== discard the bitwidth, such that a 64-bit all-true bitmask == LaneBitmask::all() (using the 'all' enum), then we could end up in a situation where:

X == LaneBitmask::all() && Y == LaneBitmask::all()

but X != Y.
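
Concretely, a sketch of that failure mode, reusing the hypothetical LaneBitmaskV3 above and assuming a width-discarding equality:

  #include "llvm/ADT/APInt.h"
  #include <cassert>

  // Hypothetical width-discarding rule: any all-ones mask equals 'all'.
  bool equalsIgnoringWidth(const LaneBitmaskV3 &A, const LaneBitmaskV3 &B) {
    if (A.K == LaneBitmaskV3::Kind::All && B.K == LaneBitmaskV3::Kind::Mask)
      return B.Mask.isAllOnes();
    // ... symmetric and remaining cases elided ...
    return false;
  }

  void transitivityIsBroken() {
    LaneBitmaskV3 X = LaneBitmaskV3::mask(llvm::APInt::getAllOnes(64));
    LaneBitmaskV3 Y = LaneBitmaskV3::mask(llvm::APInt::getAllOnes(128));
    // Both compare equal to all() under the width-discarding rule...
    assert(equalsIgnoringWidth(LaneBitmaskV3::all(), X));
    assert(equalsIgnoringWidth(LaneBitmaskV3::all(), Y));
    // ...yet X and Y are not equal to each other (different widths),
    // so equality is no longer transitive.
    assert(X.Mask.getBitWidth() != Y.Mask.getBitWidth());
  }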

I considered replacing the equality operators with methods that take a RegisterInfo pointer, but the LaneBitmask struct is used in STL containers, which require a plain 'operator==' or 'operator<'. We could work around that by providing custom comparators (that call the method with the TargetInfo pointer), but this gets increasingly hacky, as the sketch below shows.
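A sketch of that workaround; compareLessThan is a hypothetical method, not an existing API:

  #include "llvm/CodeGen/TargetRegisterInfo.h"
  #include "llvm/MC/LaneBitmask.h"
  #include <set>

  // Hypothetical: bool LaneBitmask::compareLessThan(LaneBitmask RHS,
  //                                 const TargetRegisterInfo *TRI) const;
  // Every container of LaneBitmask now needs a stateful comparator:
  struct LaneBitmaskLess {
    const llvm::TargetRegisterInfo *TRI;
    bool operator()(llvm::LaneBitmask A, llvm::LaneBitmask B) const {
      return A.compareLessThan(B, TRI); // hypothetical method
    }
  };

  // ...and the comparator must be threaded through every declaration:
  //   std::set<llvm::LaneBitmask, LaneBitmaskLess> Masks(LaneBitmaskLess{TRI});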

Perhaps just using more bits isn't actually that bad in practice.

@llvmbot
Member

llvmbot commented Oct 4, 2024

@llvm/pr-subscribers-tablegen
@llvm/pr-subscribers-mc
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)


Patch is 20.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111157.diff

12 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/RDFLiveness.h (+2-2)
  • (modified) llvm/include/llvm/CodeGen/RDFRegisters.h (+2-2)
  • (modified) llvm/include/llvm/MC/LaneBitmask.h (+97-49)
  • (modified) llvm/lib/CodeGen/MIRParser/MIParser.cpp (+34-11)
  • (modified) llvm/lib/CodeGen/MIRPrinter.cpp (+8-2)
  • (modified) llvm/lib/CodeGen/RDFRegisters.cpp (+5-5)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.h (+4-1)
  • (added) llvm/test/CodeGen/AArch64/lanebitmask.mir (+18)
  • (modified) llvm/unittests/CodeGen/MFCommon.inc (+1-1)
  • (modified) llvm/unittests/MC/CMakeLists.txt (+1-1)
  • (added) llvm/unittests/MC/LaneBitmaskTest.cpp (+68)
  • (modified) llvm/utils/TableGen/RegisterInfoEmitter.cpp (+13-10)
diff --git a/llvm/include/llvm/CodeGen/RDFLiveness.h b/llvm/include/llvm/CodeGen/RDFLiveness.h
index fe1034f9b6f8ef..46ec82e77ba492 100644
--- a/llvm/include/llvm/CodeGen/RDFLiveness.h
+++ b/llvm/include/llvm/CodeGen/RDFLiveness.h
@@ -43,8 +43,8 @@ namespace std {
 
 template <> struct hash<llvm::rdf::detail::NodeRef> {
   std::size_t operator()(llvm::rdf::detail::NodeRef R) const {
-    return std::hash<llvm::rdf::NodeId>{}(R.first) ^
-           std::hash<llvm::LaneBitmask::Type>{}(R.second.getAsInteger());
+    return llvm::hash_value<llvm::rdf::NodeId>(R.first) ^
+           llvm::hash_value(R.second.getAsPair());
   }
 };
 
diff --git a/llvm/include/llvm/CodeGen/RDFRegisters.h b/llvm/include/llvm/CodeGen/RDFRegisters.h
index 7eed0b4e1e7b8f..87c38a215e006e 100644
--- a/llvm/include/llvm/CodeGen/RDFRegisters.h
+++ b/llvm/include/llvm/CodeGen/RDFRegisters.h
@@ -106,8 +106,8 @@ struct RegisterRef {
   }
 
   size_t hash() const {
-    return std::hash<RegisterId>{}(Reg) ^
-           std::hash<LaneBitmask::Type>{}(Mask.getAsInteger());
+    return llvm::hash_value<RegisterId>(Reg) ^
+           llvm::hash_value(Mask.getAsPair());
   }
 
   static constexpr bool isRegId(unsigned Id) {
diff --git a/llvm/include/llvm/MC/LaneBitmask.h b/llvm/include/llvm/MC/LaneBitmask.h
index c06ca7dd5b8fcd..ba22257c650a70 100644
--- a/llvm/include/llvm/MC/LaneBitmask.h
+++ b/llvm/include/llvm/MC/LaneBitmask.h
@@ -29,72 +29,120 @@
 #ifndef LLVM_MC_LANEBITMASK_H
 #define LLVM_MC_LANEBITMASK_H
 
+#include "llvm/ADT/APInt.h"
+#include "llvm/ADT/SmallString.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Format.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/Printable.h"
 #include "llvm/Support/raw_ostream.h"
+#include <utility>
 
 namespace llvm {
 
-  struct LaneBitmask {
-    // When changing the underlying type, change the format string as well.
-    using Type = uint64_t;
-    enum : unsigned { BitWidth = 8*sizeof(Type) };
-    constexpr static const char *const FormatStr = "%016llX";
+struct LaneBitmask {
+  static constexpr unsigned int BitWidth = 128;
 
-    constexpr LaneBitmask() = default;
-    explicit constexpr LaneBitmask(Type V) : Mask(V) {}
-
-    constexpr bool operator== (LaneBitmask M) const { return Mask == M.Mask; }
-    constexpr bool operator!= (LaneBitmask M) const { return Mask != M.Mask; }
-    constexpr bool operator< (LaneBitmask M)  const { return Mask < M.Mask; }
-    constexpr bool none() const { return Mask == 0; }
-    constexpr bool any()  const { return Mask != 0; }
-    constexpr bool all()  const { return ~Mask == 0; }
-
-    constexpr LaneBitmask operator~() const {
-      return LaneBitmask(~Mask);
-    }
-    constexpr LaneBitmask operator|(LaneBitmask M) const {
-      return LaneBitmask(Mask | M.Mask);
-    }
-    constexpr LaneBitmask operator&(LaneBitmask M) const {
-      return LaneBitmask(Mask & M.Mask);
-    }
-    LaneBitmask &operator|=(LaneBitmask M) {
-      Mask |= M.Mask;
-      return *this;
-    }
-    LaneBitmask &operator&=(LaneBitmask M) {
-      Mask &= M.Mask;
-      return *this;
+  explicit LaneBitmask(APInt V) {
+    switch (V.getBitWidth()) {
+    case BitWidth:
+      Mask[0] = V.getRawData()[0];
+      Mask[1] = V.getRawData()[1];
+      break;
+    default:
+      llvm_unreachable("Unsupported bitwidth");
     }
+  }
+  constexpr explicit LaneBitmask(uint64_t Lo = 0, uint64_t Hi = 0) : Mask{Lo, Hi} {}
 
-    constexpr Type getAsInteger() const { return Mask; }
+  constexpr bool operator==(LaneBitmask M) const {
+    return Mask[0] == M.Mask[0] && Mask[1] == M.Mask[1];
+  }
+  constexpr bool operator!=(LaneBitmask M) const {
+    return Mask[0] != M.Mask[0] || Mask[1] != M.Mask[1];
+  }
+  constexpr bool operator<(LaneBitmask M) const {
+    return Mask[1] < M.Mask[1] || Mask[0] < M.Mask[0];
+  }
+  constexpr bool none() const { return Mask[0] == 0 && Mask[1] == 0; }
+  constexpr bool any() const { return Mask[0] != 0 || Mask[1] != 0; }
+  constexpr bool all() const { return ~Mask[0] == 0 && ~Mask[1] == 0; }
 
-    unsigned getNumLanes() const { return llvm::popcount(Mask); }
-    unsigned getHighestLane() const {
-      return Log2_64(Mask);
-    }
+  constexpr LaneBitmask operator~() const { return LaneBitmask(~Mask[0], ~Mask[1]); }
+  constexpr LaneBitmask operator|(LaneBitmask M) const {
+    return LaneBitmask(Mask[0] | M.Mask[0], Mask[1] | M.Mask[1]);
+  }
+  constexpr LaneBitmask operator&(LaneBitmask M) const {
+    return LaneBitmask(Mask[0] & M.Mask[0], Mask[1] & M.Mask[1]);
+  }
+  LaneBitmask &operator|=(LaneBitmask M) {
+    Mask[0] |= M.Mask[0];
+    Mask[1] |= M.Mask[1];
+    return *this;
+  }
+  LaneBitmask &operator&=(LaneBitmask M) {
+    Mask[0] &= M.Mask[0];
+    Mask[1] &= M.Mask[1];
+    return *this;
+  }
 
-    static constexpr LaneBitmask getNone() { return LaneBitmask(0); }
-    static constexpr LaneBitmask getAll() { return ~LaneBitmask(0); }
-    static constexpr LaneBitmask getLane(unsigned Lane) {
-      return LaneBitmask(Type(1) << Lane);
-    }
+  APInt getAsAPInt() const { return APInt(BitWidth, {Mask[0], Mask[1]}); }
+  constexpr std::pair<uint64_t, uint64_t> getAsPair() const { return {Mask[0], Mask[1]}; }
 
-  private:
-    Type Mask = 0;
-  };
+  unsigned getNumLanes() const {
+    return Mask[1] ? llvm::popcount(Mask[1]) + llvm::popcount(Mask[0])
+                   : llvm::popcount(Mask[0]);
+  }
+  unsigned getHighestLane() const {
+    return Mask[1] ? Log2_64(Mask[1]) + 64 : Log2_64(Mask[0]);
+  }
 
-  /// Create Printable object to print LaneBitmasks on a \ref raw_ostream.
-  inline Printable PrintLaneMask(LaneBitmask LaneMask) {
-    return Printable([LaneMask](raw_ostream &OS) {
-      OS << format(LaneBitmask::FormatStr, LaneMask.getAsInteger());
-    });
+  static constexpr LaneBitmask getNone() { return LaneBitmask(0, 0); }
+  static constexpr LaneBitmask getAll() { return ~LaneBitmask(0, 0); }
+  static constexpr LaneBitmask getLane(unsigned Lane) {
+    return Lane >= 64 ? LaneBitmask(0, 1ULL << (Lane % 64))
+                      : LaneBitmask(1ULL << Lane, 0);
   }
 
+private:
+  uint64_t Mask[2];
+};
+
+/// Create Printable object to print LaneBitmasks on a \ref raw_ostream.
+/// If \p FormatAsCLiterals is true, it will print the bitmask as
+/// a hexadecimal C literal with zero padding, or a list of such C literals if
+/// the value cannot be represented in 64 bits.
+/// For example (FormatAsCliterals == true)
+///   bitmask '1'       => "0x0000000000000001"
+///   bitmask '1 << 64' => "0x0000000000000000,0x0000000000000001"
+/// (FormatAsCLiterals == false)
+///   bitmask '1'       => "00000000000000000000000000000001"
+///   bitmask '1 << 64' => "00000000000000010000000000000000"
+inline Printable PrintLaneMask(LaneBitmask LaneMask,
+                               bool FormatAsCLiterals = false) {
+  return Printable([LaneMask, FormatAsCLiterals](raw_ostream &OS) {
+    SmallString<64> Buffer;
+    APInt V = LaneMask.getAsAPInt();
+    while (true) {
+      unsigned Bitwidth = FormatAsCLiterals ? 64 : LaneBitmask::BitWidth;
+      APInt VToPrint = V.trunc(Bitwidth);
+
+      Buffer.clear();
+      VToPrint.toString(Buffer, 16, /*Signed=*/false,
+                        /*formatAsCLiteral=*/false);
+      unsigned NumZeroesToPad =
+          (VToPrint.countLeadingZeros() / 4) - VToPrint.isZero();
+      OS << (FormatAsCLiterals ? "0x" : "") << std::string(NumZeroesToPad, '0')
+         << Buffer.str();
+      V = V.lshr(Bitwidth);
+      if (V.getActiveBits())
+        OS << ",";
+      else
+        break;
+    }
+  });
+}
+
 } // end namespace llvm
 
 #endif // LLVM_MC_LANEBITMASK_H
diff --git a/llvm/lib/CodeGen/MIRParser/MIParser.cpp b/llvm/lib/CodeGen/MIRParser/MIParser.cpp
index 27f0a9331a3e3e..6b6b5be910fdc4 100644
--- a/llvm/lib/CodeGen/MIRParser/MIParser.cpp
+++ b/llvm/lib/CodeGen/MIRParser/MIParser.cpp
@@ -870,17 +870,40 @@ bool MIParser::parseBasicBlockLiveins(MachineBasicBlock &MBB) {
     lex();
     LaneBitmask Mask = LaneBitmask::getAll();
     if (consumeIfPresent(MIToken::colon)) {
-      // Parse lane mask.
-      if (Token.isNot(MIToken::IntegerLiteral) &&
-          Token.isNot(MIToken::HexLiteral))
-        return error("expected a lane mask");
-      static_assert(sizeof(LaneBitmask::Type) == sizeof(uint64_t),
-                    "Use correct get-function for lane mask");
-      LaneBitmask::Type V;
-      if (getUint64(V))
-        return error("invalid lane mask value");
-      Mask = LaneBitmask(V);
-      lex();
+      if (consumeIfPresent(MIToken::lparen)) {
+        // We need to parse a list of literals
+        SmallVector<uint64_t, 2> Literals;
+        while (true) {
+          if (Token.isNot(MIToken::HexLiteral))
+            return error("expected a lane mask");
+          APInt V;
+          getHexUint(V);
+          Literals.push_back(V.getZExtValue());
+          // Lex past literal
+          lex();
+          if (Token.is(MIToken::rparen))
+            break;
+          else if (Token.isNot(MIToken::comma))
+            return error("expected a comma");
+          // Lex past comma
+          lex();
+        }
+        // Lex past rparen
+        lex();
+        Mask = LaneBitmask(APInt(LaneBitmask::BitWidth, Literals));
+      } else {
+        // Parse lane mask.
+        APInt V;
+        if (Token.is(MIToken::IntegerLiteral)) {
+          uint64_t UV;
+          if (getUint64(UV))
+            return error("invalid lane mask value");
+          V = APInt(LaneBitmask::BitWidth, UV);
+        } else if (getHexUint(V))
+          return error("expected a lane mask");
+        Mask = LaneBitmask(APInt(LaneBitmask::BitWidth, V.getZExtValue()));
+        lex();
+      }
     }
     MBB.addLiveIn(Reg, Mask);
   } while (consumeIfPresent(MIToken::comma));
diff --git a/llvm/lib/CodeGen/MIRPrinter.cpp b/llvm/lib/CodeGen/MIRPrinter.cpp
index cf6122bce22364..349e3aaa87f72a 100644
--- a/llvm/lib/CodeGen/MIRPrinter.cpp
+++ b/llvm/lib/CodeGen/MIRPrinter.cpp
@@ -732,8 +732,14 @@ void MIPrinter::print(const MachineBasicBlock &MBB) {
         OS << ", ";
       First = false;
       OS << printReg(LI.PhysReg, &TRI);
-      if (!LI.LaneMask.all())
-        OS << ":0x" << PrintLaneMask(LI.LaneMask);
+      if (!LI.LaneMask.all()) {
+        OS << ":";
+        if (LI.LaneMask.getAsAPInt().getActiveBits() <= 64)
+          OS << PrintLaneMask(LI.LaneMask, /*FormatAsCLiterals=*/true);
+        else
+          OS << '(' << PrintLaneMask(LI.LaneMask, /*FormatAsCLiterals=*/true)
+             << ')';
+      }
     }
     OS << "\n";
     HasLineAttributes = true;
diff --git a/llvm/lib/CodeGen/RDFRegisters.cpp b/llvm/lib/CodeGen/RDFRegisters.cpp
index 7ce00a66b3ae6c..3229647379207f 100644
--- a/llvm/lib/CodeGen/RDFRegisters.cpp
+++ b/llvm/lib/CodeGen/RDFRegisters.cpp
@@ -412,11 +412,11 @@ raw_ostream &operator<<(raw_ostream &OS, const PrintLaneMaskShort &P) {
   if (P.Mask.none())
     return OS << ":*none*";
 
-  LaneBitmask::Type Val = P.Mask.getAsInteger();
-  if ((Val & 0xffff) == Val)
-    return OS << ':' << format("%04llX", Val);
-  if ((Val & 0xffffffff) == Val)
-    return OS << ':' << format("%08llX", Val);
+  APInt Val = P.Mask.getAsAPInt();
+  if (Val.getActiveBits() <= 16)
+    return OS << ':' << format("%04llX", Val.getZExtValue());
+  if (Val.getActiveBits() <= 32)
+    return OS << ':' << format("%08llX", Val.getZExtValue());
   return OS << ':' << PrintLaneMask(P.Mask);
 }
 
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 88d5686720985e..a7796906779594 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -377,7 +377,10 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
   static unsigned getNumCoveredRegs(LaneBitmask LM) {
     // The assumption is that every lo16 subreg is an even bit and every hi16
     // is an adjacent odd bit or vice versa.
-    uint64_t Mask = LM.getAsInteger();
+    APInt MaskV = LM.getAsAPInt();
+    assert(MaskV.getActiveBits() <= 64 &&
+           "uint64_t is insufficient to represent lane bitmask operation");
+    uint64_t Mask = MaskV.getZExtValue();
     uint64_t Even = Mask & 0xAAAAAAAAAAAAAAAAULL;
     Mask = (Even >> 1) | Mask;
     uint64_t Odd = Mask & 0x5555555555555555ULL;
diff --git a/llvm/test/CodeGen/AArch64/lanebitmask.mir b/llvm/test/CodeGen/AArch64/lanebitmask.mir
new file mode 100644
index 00000000000000..13178ea5cff4e3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/lanebitmask.mir
@@ -0,0 +1,18 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -o - %s -mtriple=aarch64 -stop-before=greedy | FileCheck %s
+---
+name:            test_parse_lanebitmask
+tracksRegLiveness: true
+liveins:
+  - { reg: '$h0' }
+  - { reg: '$s1' }
+body:             |
+  bb.0:
+    liveins: $h0:0x0000000000000001, $s1:(0x0000000000000001,0x0000000000000000)
+    ; CHECK-LABEL: name: test_parse_lanebitmask
+    ; CHECK: liveins: $h0:0x0000000000000001, $s1:0x0000000000000001, $h0, $s1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: RET_ReallyLR
+    RET_ReallyLR
+...
+
diff --git a/llvm/unittests/CodeGen/MFCommon.inc b/llvm/unittests/CodeGen/MFCommon.inc
index 5d5720c3162da9..720c5e43586d7e 100644
--- a/llvm/unittests/CodeGen/MFCommon.inc
+++ b/llvm/unittests/CodeGen/MFCommon.inc
@@ -23,7 +23,7 @@ class BogusRegisterInfo : public TargetRegisterInfo {
 public:
   BogusRegisterInfo()
       : TargetRegisterInfo(nullptr, BogusRegisterClasses, BogusRegisterClasses,
-                           nullptr, nullptr, nullptr, LaneBitmask(~0u), nullptr,
+                           nullptr, nullptr, nullptr, LaneBitmask::getAll(), nullptr,
                            nullptr) {
     InitMCRegisterInfo(nullptr, 0, 0, 0, nullptr, 0, nullptr, 0, nullptr,
                        nullptr, nullptr, nullptr, nullptr, 0, nullptr);
diff --git a/llvm/unittests/MC/CMakeLists.txt b/llvm/unittests/MC/CMakeLists.txt
index da8e219113f465..fabf0f8512786a 100644
--- a/llvm/unittests/MC/CMakeLists.txt
+++ b/llvm/unittests/MC/CMakeLists.txt
@@ -17,9 +17,9 @@ add_llvm_unittest(MCTests
   Disassembler.cpp
   DwarfLineTables.cpp
   DwarfLineTableHeaders.cpp
+  LaneBitmaskTest.cpp
   MCInstPrinter.cpp
   StringTableBuilderTest.cpp
   TargetRegistry.cpp
   MCDisassemblerTest.cpp
   )
-
diff --git a/llvm/unittests/MC/LaneBitmaskTest.cpp b/llvm/unittests/MC/LaneBitmaskTest.cpp
new file mode 100644
index 00000000000000..d7762122e03220
--- /dev/null
+++ b/llvm/unittests/MC/LaneBitmaskTest.cpp
@@ -0,0 +1,68 @@
+//===------------------ LaneBitmaskTest.cpp -------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "gtest/gtest.h"
+#include "llvm/MC/LaneBitmask.h"
+#include "llvm/Support/raw_ostream.h"
+#include <string>
+
+using namespace llvm;
+
+TEST(LaneBitmaskTest, Basic) {
+  EXPECT_EQ(LaneBitmask::getAll(), ~LaneBitmask::getNone());
+  EXPECT_EQ(LaneBitmask::getNone(), ~LaneBitmask::getAll());
+  EXPECT_EQ(LaneBitmask::getLane(0) | LaneBitmask::getLane(1), LaneBitmask(3));
+  EXPECT_EQ(LaneBitmask(3) & LaneBitmask::getLane(1), LaneBitmask::getLane(1));
+
+  EXPECT_EQ(LaneBitmask(APInt(128, 42)).getAsAPInt(), APInt(128, 42));
+  EXPECT_EQ(LaneBitmask(3).getNumLanes(), 2);
+  EXPECT_EQ(LaneBitmask::getLane(0).getHighestLane(), 0);
+  EXPECT_EQ(LaneBitmask::getLane(64).getHighestLane(), 64);
+  EXPECT_EQ(LaneBitmask::getLane(127).getHighestLane(), 127);
+
+  EXPECT_LT(LaneBitmask::getLane(64), LaneBitmask::getLane(65));
+  EXPECT_LT(LaneBitmask::getLane(63), LaneBitmask::getLane(64));
+  EXPECT_LT(LaneBitmask::getLane(62), LaneBitmask::getLane(63));
+
+  LaneBitmask X(1);
+  X |= LaneBitmask(2);
+  EXPECT_EQ(X, LaneBitmask(3));
+
+  LaneBitmask Y(3);
+  Y &= LaneBitmask(1);
+  EXPECT_EQ(Y, LaneBitmask(1));
+}
+
+TEST(LaneBitmaskTest, Print) {
+  std::string buffer;
+  raw_string_ostream OS(buffer);
+
+  buffer = "";
+  OS << PrintLaneMask(LaneBitmask::getAll(), /*FormatAsCLiterals=*/true);
+  EXPECT_STREQ(OS.str().data(), "0xFFFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF");
+
+  buffer = "";
+  OS << PrintLaneMask(LaneBitmask::getAll(), /*FormatAsCLiterals=*/false);
+  EXPECT_STREQ(OS.str().data(), "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");
+
+  buffer = "";
+  OS << PrintLaneMask(LaneBitmask::getLane(0), /*FormatAsCLiterals=*/true);
+  EXPECT_STREQ(OS.str().data(), "0x0000000000000001");
+
+  buffer = "";
+  OS << PrintLaneMask(LaneBitmask::getLane(63), /*FormatAsCLiterals=*/true);
+  EXPECT_STREQ(OS.str().data(), "0x8000000000000000");
+
+  buffer = "";
+  OS << PrintLaneMask(LaneBitmask::getNone(), /*FormatAsCLiterals=*/true);
+  EXPECT_STREQ(OS.str().data(), "0x0000000000000000");
+
+  buffer = "";
+  OS << PrintLaneMask(LaneBitmask::getLane(64), /*FormatAsCLiterals=*/true);
+  EXPECT_STREQ(OS.str().data(), "0x0000000000000000,0x0000000000000001");
+}
diff --git a/llvm/utils/TableGen/RegisterInfoEmitter.cpp b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
index 7d81a83ef2b0a6..bf8aea096c1cf1 100644
--- a/llvm/utils/TableGen/RegisterInfoEmitter.cpp
+++ b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
@@ -653,7 +653,8 @@ static DiffVec &diffEncode(DiffVec &V, unsigned InitVal, Iter Begin, Iter End) {
 static void printDiff16(raw_ostream &OS, int16_t Val) { OS << Val; }
 
 static void printMask(raw_ostream &OS, LaneBitmask Val) {
-  OS << "LaneBitmask(0x" << PrintLaneMask(Val) << ')';
+  OS << "LaneBitmask("
+     << PrintLaneMask(Val, /*FormatAsCLiteral=*/true) << ")";
 }
 
 // Try to combine Idx's compose map into Vec if it is compatible.
@@ -818,11 +819,11 @@ void RegisterInfoEmitter::emitComposeSubRegIndexLaneMask(raw_ostream &OS,
         "  for (const MaskRolOp *Ops =\n"
         "       &LaneMaskComposeSequences[CompositeSequences[IdxA]];\n"
         "       Ops->Mask.any(); ++Ops) {\n"
-        "    LaneBitmask::Type M = LaneMask.getAsInteger() & "
-        "Ops->Mask.getAsInteger();\n"
+        "    APInt M = LaneMask.getAsAPInt() & "
+        "Ops->Mask.getAsAPInt();\n"
         "    if (unsigned S = Ops->RotateLeft)\n"
-        "      Result |= LaneBitmask((M << S) | (M >> (LaneBitmask::BitWidth - "
-        "S)));\n"
+        "      Result |= LaneBitmask(M.shl(S) | M.lshr(LaneBitmask::BitWidth - "
+        "S));\n"
         "    else\n"
         "      Result |= LaneBitmask(M);\n"
         "  }\n"
@@ -840,10 +841,10 @@ void RegisterInfoEmitter::emitComposeSubRegIndexLaneMask(raw_ostream &OS,
         "  for (const MaskRolOp *Ops =\n"
         "       &LaneMaskComposeSequences[CompositeSequences[IdxA]];\n"
         "       Ops->Mask.any(); ++Ops) {\n"
-        "    LaneBitmask::Type M = LaneMask.getAsInteger();\n"
+        "    APInt M = LaneMask.getAsAPInt();\n"
         "    if (unsigned S = Ops->RotateLeft)\n"
-        "      Result |= LaneBitmask((M >> S) | (M << (LaneBitmask::BitWidth - "
-        "S)));\n"
+        "      Result |= LaneBitmask(M.lshr(S) | M.shl(LaneBitmask::BitWidth - "
+        "S));\n"
         "    else\n"
         "      Result |= LaneBitmask(M);\n"
         "  }\n"
@@ -1836,7 +1837,8 @@ void RegisterInfoEmitter::debugDump(raw_ostream &OS) {
     for (unsigned M = 0; M != NumModes; ++M)
       OS << ' ' << getModeName(M) << ':' << RC.RSI.get(M).SpillAlignment;
     OS << " }\n\tNumRegs: " << RC.getMembers().size() << '\n';
-    OS << "\tLaneMask: " << PrintLaneMask(RC.LaneMask) << '\n';
+    OS << "\tLaneMask: {"
+       << PrintLaneMask(RC.LaneMask, /*FormatAsCLiteral=*/true) << "}\n";
     OS << "\tHasDisjunctSubRegs: " << RC.HasDisjunctSubRegs << '\n';
     OS << "\tCoveredBySubRegs: " << RC.CoveredBySubRegs << '\n';
     OS << "\tAllocatable: " << RC.Allocatable << '\n';
@@ -1864,7 +1866,8 @@ void RegisterInfoEmitter::debugDump(raw_ostream &OS) {
 
   for (const CodeGenSubRegIndex &SRI : RegBank.getSubRegIndices()) {
     OS << "SubRegIndex " << SRI.getName() << ":\n";
-    OS << "\tLaneMask: " << PrintLaneMask(SRI.LaneMask) << '\n';
+    OS << "\tLaneMask: {"
+       << PrintLaneMask(SRI.LaneMask, /*FormatAsCLiteral=*/true) << "}\n";
     OS << "\tAllSuperRegsCovered: " << SRI.AllSuperRegsCovered << '\n';
     OS << "\tOffset: {";
     for (unsigned M = 0; M != NumMod...
[truncated]


github-actions bot commented Oct 4, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 206408732bca2ef464732a39c8319d47c8a1dbea 136d33b74a1e18fe1b3675135907c6ceb4b80e15 --extensions h,inc,cpp -- llvm/unittests/MC/LaneBitmaskTest.cpp llvm/include/llvm/CodeGen/RDFLiveness.h llvm/include/llvm/CodeGen/RDFRegisters.h llvm/include/llvm/MC/LaneBitmask.h llvm/lib/CodeGen/MIRParser/MIParser.cpp llvm/lib/CodeGen/MIRPrinter.cpp llvm/lib/CodeGen/RDFRegisters.cpp llvm/lib/Target/AMDGPU/SIRegisterInfo.h llvm/unittests/CodeGen/MFCommon.inc llvm/utils/TableGen/RegisterInfoEmitter.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/MC/LaneBitmask.h b/llvm/include/llvm/MC/LaneBitmask.h
index 746f362f0d..89c0d9fa59 100644
--- a/llvm/include/llvm/MC/LaneBitmask.h
+++ b/llvm/include/llvm/MC/LaneBitmask.h
@@ -53,7 +53,8 @@ struct LaneBitmask {
       llvm_unreachable("Unsupported bitwidth");
     }
   }
-  constexpr explicit LaneBitmask(uint64_t Lo = 0, uint64_t Hi = 0) : Mask{Lo, Hi} {}
+  constexpr explicit LaneBitmask(uint64_t Lo = 0, uint64_t Hi = 0)
+      : Mask{Lo, Hi} {}
 
   constexpr bool operator==(LaneBitmask M) const {
     return Mask[0] == M.Mask[0] && Mask[1] == M.Mask[1];
@@ -68,7 +69,9 @@ struct LaneBitmask {
   constexpr bool any() const { return Mask[0] != 0 || Mask[1] != 0; }
   constexpr bool all() const { return ~Mask[0] == 0 && ~Mask[1] == 0; }
 
-  constexpr LaneBitmask operator~() const { return LaneBitmask(~Mask[0], ~Mask[1]); }
+  constexpr LaneBitmask operator~() const {
+    return LaneBitmask(~Mask[0], ~Mask[1]);
+  }
   constexpr LaneBitmask operator|(LaneBitmask M) const {
     return LaneBitmask(Mask[0] | M.Mask[0], Mask[1] | M.Mask[1]);
   }
@@ -87,7 +90,9 @@ struct LaneBitmask {
   }
 
   APInt getAsAPInt() const { return APInt(BitWidth, {Mask[0], Mask[1]}); }
-  constexpr std::pair<uint64_t, uint64_t> getAsPair() const { return {Mask[0], Mask[1]}; }
+  constexpr std::pair<uint64_t, uint64_t> getAsPair() const {
+    return {Mask[0], Mask[1]};
+  }
 
   unsigned getNumLanes() const {
     return Mask[1] ? llvm::popcount(Mask[1]) + llvm::popcount(Mask[0])
diff --git a/llvm/unittests/CodeGen/MFCommon.inc b/llvm/unittests/CodeGen/MFCommon.inc
index 720c5e4358..583c0281b7 100644
--- a/llvm/unittests/CodeGen/MFCommon.inc
+++ b/llvm/unittests/CodeGen/MFCommon.inc
@@ -23,8 +23,8 @@ class BogusRegisterInfo : public TargetRegisterInfo {
 public:
   BogusRegisterInfo()
       : TargetRegisterInfo(nullptr, BogusRegisterClasses, BogusRegisterClasses,
-                           nullptr, nullptr, nullptr, LaneBitmask::getAll(), nullptr,
-                           nullptr) {
+                           nullptr, nullptr, nullptr, LaneBitmask::getAll(),
+                           nullptr, nullptr) {
     InitMCRegisterInfo(nullptr, 0, 0, 0, nullptr, 0, nullptr, 0, nullptr,
                        nullptr, nullptr, nullptr, nullptr, 0, nullptr);
   }
diff --git a/llvm/unittests/MC/LaneBitmaskTest.cpp b/llvm/unittests/MC/LaneBitmaskTest.cpp
index 4db9ed5a74..1c7c01943d 100644
--- a/llvm/unittests/MC/LaneBitmaskTest.cpp
+++ b/llvm/unittests/MC/LaneBitmaskTest.cpp
@@ -6,9 +6,9 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "gtest/gtest.h"
 #include "llvm/MC/LaneBitmask.h"
 #include "llvm/Support/raw_ostream.h"
+#include "gtest/gtest.h"
 #include <string>
 
 using namespace llvm;
@@ -28,7 +28,8 @@ TEST(LaneBitmaskTest, Basic) {
   EXPECT_LT(LaneBitmask::getLane(64), LaneBitmask::getLane(65));
   EXPECT_LT(LaneBitmask::getLane(63), LaneBitmask::getLane(64));
   EXPECT_LT(LaneBitmask::getLane(62), LaneBitmask::getLane(63));
-  EXPECT_LT(LaneBitmask::getLane(64), LaneBitmask::getLane(64) | LaneBitmask::getLane(0));
+  EXPECT_LT(LaneBitmask::getLane(64),
+            LaneBitmask::getLane(64) | LaneBitmask::getLane(0));
 
   LaneBitmask X(1);
   X |= LaneBitmask(2);
diff --git a/llvm/utils/TableGen/RegisterInfoEmitter.cpp b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
index bf8aea096c..4a5b05b381 100644
--- a/llvm/utils/TableGen/RegisterInfoEmitter.cpp
+++ b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
@@ -653,8 +653,7 @@ static DiffVec &diffEncode(DiffVec &V, unsigned InitVal, Iter Begin, Iter End) {
 static void printDiff16(raw_ostream &OS, int16_t Val) { OS << Val; }
 
 static void printMask(raw_ostream &OS, LaneBitmask Val) {
-  OS << "LaneBitmask("
-     << PrintLaneMask(Val, /*FormatAsCLiteral=*/true) << ")";
+  OS << "LaneBitmask(" << PrintLaneMask(Val, /*FormatAsCLiteral=*/true) << ")";
 }
 
 // Try to combine Idx's compose map into Vec if it is compatible.

@jayfoad
Contributor

jayfoad commented Oct 4, 2024

Can you test this with compile-time-tracker? Cc @nikic

@arsenm
Contributor

arsenm commented Oct 4, 2024

Can you test this with compile-time-tracker? Cc @nikic

I bet it will give worse results on amdgpu, or other targets that actually enable subreg liveness. We really need an amdgpu compile time tracker

@arsenm
Contributor

arsenm commented Oct 4, 2024

I still don't understand why AArch64 needs so many bits. Having sub-registers that alias does not mean you need additional register units. You should only need one for each physically distinct set of bits, despite differences in access.

return Mask[0] != M.Mask[0] || Mask[1] != M.Mask[1];
}
constexpr bool operator<(LaneBitmask M) const {
return Mask[1] < M.Mask[1] || Mask[0] < M.Mask[0];
Contributor

This isn't right!

Collaborator Author

I had changed this locally but forgot to squash the change!

Type Mask = 0;
};
unsigned getNumLanes() const {
return Mask[1] ? llvm::popcount(Mask[1]) + llvm::popcount(Mask[0])
Contributor

I would hope you could rely on modern host machines having a fast popcount instruction so you don't need to special-case Mask[1] here, but I'm not sure.
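For reference, a branch-free version of what's being suggested here (a sketch with the same signature as the patch's getNumLanes; whether it is faster depends on the host having a hardware popcount):

  unsigned getNumLanes() const {
    // Unconditionally count both words; avoids a data-dependent branch
    // when the host lowers llvm::popcount to a single instruction.
    return llvm::popcount(Mask[0]) + llvm::popcount(Mask[1]);
  }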

};

/// Create Printable object to print LaneBitmasks on a \ref raw_ostream.
/// If \p FormatAsCLiterals is true, it will print the bitmask as
Contributor

Why do we need the option? Please can we just print it as a single hex literal with lots of digits?

Collaborator Author

This is mostly to avoid having to update many tests that rely on a pretty printed format. Changing it would require changing various tests (which I'm happy to do if that's preferred).

@jayfoad
Contributor

jayfoad commented Oct 4, 2024

FYI I once added a similar 128-bit-int-like class here:

Maybe one day it could migrate into ADT, if there is enough commonality.

@nikic
Contributor

nikic commented Oct 4, 2024

Would using std::bitset<128> be an option here?
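
For context, a rough sketch of what that might look like. std::bitset provides the bitwise operators and popcount used below, but no ordering (needed for operator<) and no conversion to/from APInt:

  #include <bitset>
  #include <cstddef>

  using LaneMaskBits = std::bitset<128>;

  void bitsetSketch() {
    LaneMaskBits None;         // all zeros by default
    LaneMaskBits All = ~None;  // all ones
    LaneMaskBits Lane64;
    Lane64.set(64);            // the equivalent of getLane(64)

    LaneMaskBits M = (All & Lane64) | None;  // bitwise ops come for free
    std::size_t NumLanes = M.count();        // popcount
    (void)NumLanes;
  }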

@sdesmalen-arm
Collaborator Author

I still don't understand why AArch64 needs so many bits. Having sub registers that alias does not mean you need additional register units. You should only need one for each physically distinct bits, despite differences in access

I think this is mostly because defining register tuples (2x, 3x and 4x) replicates the regunits. When I define the top bits and do some post-processing of the table in AArch64GenRegisterInfo.inc, I get the following lane masks:

0x00000000000000000001 // bsub
0x00000000000000000002 // bsub_hi
0x00000000000000000004 // dsub_hi
0x00000000000000000008 // hsub_hi
0x00000000000000000010 // psub
0x00000000000000000020 // qsub_hi
0x00000000000000000040 // ssub_hi
0x00000000000000000080 // sub_32, sube64, x8sub_0
0x00000000000000000100 // sube32
0x00000000000000000200 // subo32
0x00000000000000000400 // zasubq0
0x00000000000000000800 // zasubq1
0x00000000000000001000 // zasubd1_then_zasubq0
0x00000000000000002000 // zasubd1_then_zasubq1
0x00000000000000004000 // zasubs1_then_zasubq0
0x00000000000000008000 // zasubs1_then_zasubq1
0x00000000000000010000 // zasubs1_then_zasubd1_then_zasubq0
0x00000000000000020000 // zasubs1_then_zasubd1_then_zasubq1
0x00000000000000040000 // zasubh1_then_zasubq0
0x00000000000000080000 // zasubh1_then_zasubq1
0x00000000000000100000 // zasubh1_then_zasubd1_then_zasubq0
0x00000000000000200000 // zasubh1_then_zasubd1_then_zasubq1
0x00000000000000400000 // zasubh1_then_zasubs1_then_zasubq0
0x00000000000000800000 // zasubh1_then_zasubs1_then_zasubq1
0x00000000000001000000 // zasubh1_then_zasubs1_then_zasubd1_then_zasubq0
0x00000000000002000000 // zasubh1_then_zasubs1_then_zasubd1_then_zasubq1
0x00000000000004000000 // dsub1_then_bsub
0x00000000000008000000 // dsub1_then_bsub_hi
0x00000000000010000000 // dsub1_then_hsub_hi
0x00000000000020000000 // dsub1_then_ssub_hi
0x00000000000040000000 // dsub3_then_bsub
0x00000000000080000000 // dsub3_then_bsub_hi
0x00000000000100000000 // dsub3_then_hsub_hi
0x00000000000200000000 // dsub3_then_ssub_hi
0x00000000000400000000 // dsub2_then_bsub
0x00000000000800000000 // dsub2_then_bsub_hi
0x00000000001000000000 // dsub2_then_hsub_hi
0x00000000002000000000 // dsub2_then_ssub_hi
0x00000000004000000000 // psub1, psub1_then_psub
0x00000000008000000000 // qsub1_then_bsub
0x00000000010000000000 // qsub1_then_bsub_hi
0x00000000020000000000 // qsub1_then_dsub_hi
0x00000000040000000000 // qsub1_then_hsub_hi
0x00000000080000000000 // qsub1_then_ssub_hi
0x00000000100000000000 // qsub3_then_bsub
0x00000000200000000000 // qsub3_then_bsub_hi
0x00000000400000000000 // qsub3_then_dsub_hi
0x00000000800000000000 // qsub3_then_hsub_hi
0x00000001000000000000 // qsub3_then_ssub_hi
0x00000002000000000000 // qsub2_then_bsub
0x00000004000000000000 // qsub2_then_bsub_hi
0x00000008000000000000 // qsub2_then_dsub_hi
0x00000010000000000000 // qsub2_then_hsub_hi
0x00000020000000000000 // qsub2_then_ssub_hi
0x00000040000000000000 // x8sub_7, x8sub_7_then_sub_32
0x00000080000000000000 // x8sub_6, x8sub_6_then_sub_32
0x00000100000000000000 // x8sub_5, x8sub_5_then_sub_32
0x00000200000000000000 // x8sub_4, x8sub_4_then_sub_32
0x00000400000000000000 // x8sub_3, x8sub_3_then_sub_32
0x00000800000000000000 // x8sub_2, x8sub_2_then_sub_32
0x00001000000000000000 // x8sub_1, x8sub_1_then_sub_32
0x00002000000000000000 // subo64, subo64_then_sub_32
0x00004000000000000000 // zsub1_then_bsub
0x00008000000000000000 // zsub1_then_bsub_hi
0x00010000000000000000 // zsub1_then_dsub_hi
0x00020000000000000000 // zsub1_then_hsub_hi
0x00040000000000000000 // zsub1_then_qsub_hi
0x00080000000000000000 // zsub1_then_ssub_hi
0x00100000000000000000 // zsub3_then_bsub
0x00200000000000000000 // zsub3_then_bsub_hi
0x00400000000000000000 // zsub3_then_dsub_hi
0x00800000000000000000 // zsub3_then_hsub_hi
0x01000000000000000000 // zsub3_then_qsub_hi
0x02000000000000000000 // zsub3_then_ssub_hi
0x04000000000000000000 // zsub2_then_bsub
0x08000000000000000000 // zsub2_then_bsub_hi
0x10000000000000000000 // zsub2_then_dsub_hi
0x20000000000000000000 // zsub2_then_hsub_hi
0x40000000000000000000 // zsub2_then_qsub_hi
0x80000000000000000000 // zsub2_then_ssub_hi

Where:

  • zasub => Matrix registers
  • bsub/hsub/ssub/dsub/qsub => FP/vector registers
  • zsub => SVE (scalable data vectors)
  • psub => SVE (scalable predicate vectors)
  • anything else => GPR registers

@arsenm
Contributor

arsenm commented Oct 4, 2024

I think this is mostly because defining register tuples (2x, 3x and 4x) replicates the regunits. When I define the top bits and do some post-processing of the table in AArch64GenRegisterInfo.inc, I get the following lane masks:

This doesn't sound right. AMDGPU nearly exclusively uses register tuples, and we get one regunit per lane (well, one for each 16-bit half of each lane).

@sdesmalen-arm
Collaborator Author

Would using std::bitset<128> be an option here?

Yes, I agree that's better than reinventing. I guess llvm::bitset would be preferred over std::bitset though?

@sdesmalen-arm
Collaborator Author

I think this is mostly because defining register tuples (2x, 3x and 4x) replicates the regunits. When I define the top bits and do some post-processing of the table in AArch64GenRegisterInfo.inc, I get the following lane masks:

This doesn't sound right. AMDGPU nearly exclusively uses register tuples, and we get one regunit per lane (well, one for each 16-bit half of each lane).

I'm not really sure what AMDGPU does that is different or how it encodes the information more efficiently. Are there any lane masks in the table I shared above that you believe use unnecessary regunits?

@arsenm
Contributor

arsenm commented Oct 4, 2024

I'm not really sure what AMDGPU does that is different or how it encodes the information more efficiently. Are there any lane masks in the table I shared above that you believe use unnecessary regunits?

This table seems to have one bit for every subregister index. I would expect overlapping tuples to use multiple bits of mask. e.g. AMDGPU has this:

hi16 :   L0000000000000001 EMPTY
lo16 :   L0000000000000002 EMPTY
sub0 :   L0000000000000003 EMPTY
sub0_sub1 :   L000000000000000F EMPTY
sub0_sub1_sub2 :   L000000000000003F EMPTY
...

explicit LaneBitmask(APInt V) {
switch (V.getBitWidth()) {
case BitWidth:
Mask[0] = V.getRawData()[0];
Collaborator

Probably should use APInt::extractBitsAsZExtValue() instead of exposing APInt's internals.
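
A sketch of the suggested rewrite, keeping the patch's 128-bit assumption (a drop-in variant of the constructor shown in the hunk above):

  explicit LaneBitmask(const APInt &V) {
    assert(V.getBitWidth() == BitWidth && "Unsupported bitwidth");
    // Pull out each 64-bit word without touching APInt's raw storage.
    Mask[0] = V.extractBitsAsZExtValue(64, 0);
    Mask[1] = V.extractBitsAsZExtValue(64, 64);
  }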

@sdesmalen-arm
Collaborator Author

I'm not really sure what AMDGPU does that is different or how it encodes the information more efficiently. Are there any lane masks in the table I shared above that you believe use unnecessary regunits?

This table seems to have one bit for every subregister index. I would expect overlapping tuples to use multiple bits of mask. e.g. AMDGPU has this:

hi16 :   L0000000000000001 EMPTY
lo16 :   L0000000000000002 EMPTY
sub0 :   L0000000000000003 EMPTY
sub0_sub1 :   L000000000000000F EMPTY
sub0_sub1_sub2 :   L000000000000003F EMPTY
...

For the following acronyms:

  • bl = bsub low 8 bits
  • bh = bsub high 8 bits
  • hh = hsub high 16 bits
  • sh = ssub high 32 bits
  • dh = dsub high 64 bits
  • qh = qsub high 128+ bits

Such that:

  • 16-bit subregister 'hsub' <=> bl | bh
  • 32-bit subregister 'ssub' <=> bl | bh | hh
  • ..
  • 128-bit subregister 'qsub' <=> bl | bh | hh | sh | dh
  • 128+ bit subregister 'zsub' <=> bl | bh | hh | sh | dh | qh (z registers are scalable vector registers of 128-bits or more)

I would expect to need at least 6 regunits (qh, dh, sh, hh, bh, bl) to represent all addressable sub-registers in a single 128+ bit reg. At the moment TableGen adds new regunits for tuples, such that for 64-bit D register tuples (DD, DDD, DDDD) it allocates the following regunits:

             sh,hh,bh,bl           sh,hh,bh,bl
  sh,hh,bh,bl           sh,hh,bh,bl
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  16                                         0

For 128-bit Q register tuples (QQ, QQQ, QQQQ) it would then create the following (additional) regunits:

                dh,sh,hh,bh,bl              dh,sh,hh,bh,bl
  dh,sh,hh,bh,bl              dh,sh,hh,bh,bl
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  20                                                     0

And then it would do a similar thing for Z register tuples (ZZ, ZZZ, ZZZZ). In total, this takes up (4 * 4) + (4 * 5) + (4 * 6) = 60 bits, which I agree is unnecessary.

The way I think we want to represent this is as follows:

                          z2,q2,d2,qh,dh,sh,hh,bh,bl                          z0,q0,d0,qh,dh,sh,hh,bh,bl
z3,q3,d3,qh,dh,sh,hh,bh,bl                          z1,q1,d1,qh,dh,sh,hh,bh,bl

Which would only require 4 x 9 = 36 bits in total and would allow representing all tuples.

That said, I'm at my wits' end as to how to represent this in TableGen, because at every turn I run into some TableGen assertion failure. I've tried using ComposedSubRegIndex to no avail. I'm not sure whether I'm hitting TableGen bugs, or whether I'm just not describing this the right way using the existing constructs.

Could you give me some suggestions on what the right way is to represent it?

I do wonder if extending the number of bits for LaneBitmask is such a big problem in practice. I suspect at some point we'll need a wider bitmask anyway, given that all registers for a target share the same encoding space in LaneBitmask. As was pointed out, AMDGPU is already at the limit.

@arsenm
Contributor

arsenm commented Oct 7, 2024

That said, I'm at wits end of how to represent this in TableGen, because with every try and turn I run into some TableGen assertion failure. I've tried using ComposedSubRegIndex to no avail. Not sure if there's some TableGen bugs I'm running into, or whether I'm just not describing this the right way using the existing constructs.

Yes, tablegen isn't the best software

Could you give me some suggestions on what the right way is to represent it?

I think you described it correctly above; it's just a matter of making it work

I do wonder if extending the number of bits for LaneBitmask is such a big problem in practice. I suspect at some point we'll need to have a wider bitmask anyway, given that all registers for a target use the same encoding space in lanebitmask. As was pointed out for AMDGPU it's already at the limit.

This is used in the inner loops of everything involving register liveness, which is already the slowest part in the slowest compiles. I think increasing this should only be done as a last resort.

@sdesmalen-arm
Collaborator Author

Could you give me some suggestions on what the right way is to represent it?

I think above you described it correctly, it's a matter of just making it work

Diving deeper into TableGen it is then!

@sdesmalen-arm
Collaborator Author

I've found the right way to model this now, avoiding the need for more than 64 lanes, so I'll close this PR. Thanks for the feedback!
