[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

sushgokh · 2024-11-29T06:37:11Z

…vision

In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases.

For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

Tested the patch on Neoverse-v2 with SPEC17(C/C++). No regressions observed.

llvmbot · 2024-11-29T06:37:46Z

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-backend-powerpc

Author: Sushant Gokhale (sushgokh)

Changes

…vision

In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases.

For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

Patch is 93.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118055.diff

29 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+9-5)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+7-5)
(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+2-1)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+3-5)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34-22)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+3-2)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/BPF/BPFTargetTransformInfo.h (+3-1)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+3-3)
(modified) llvm/test/Analysis/CostModel/AArch64/div.ll (+141-141)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/div.ll (+7-29)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 985ca1532e0149..50f1210e82f1e2 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1297,13 +1297,16 @@ class TargetTransformInfo {
   /// provide even more information.
   /// \p TLibInfo is used to search for platform specific vector library
   /// functions for instructions that might be converted to calls (e.g. frem).
+  /// \p Scalars refers to individual scalars/instructions being used for
+  /// vectorization.
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
       ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
-      const TargetLibraryInfo *TLibInfo = nullptr) const;
+      const TargetLibraryInfo *TLibInfo = nullptr,
+      ArrayRef<Value *> Scalars = {}) const;
 
   /// Returns the cost estimation for alternating opcode pattern that can be
   /// lowered to a single instruction on the target. In X86 this is for the
@@ -2099,7 +2102,8 @@ class TargetTransformInfo::Concept {
   virtual InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr) = 0;
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) = 0;
   virtual InstructionCost getAltInstrCost(
       VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
       const SmallBitVector &OpcodeMask,
@@ -2780,10 +2784,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) override {
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) override {
     return Impl.getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,
-                                       Args, CxtI);
+                                       Args, CxtI, Scalars);
   }
   InstructionCost getAltInstrCost(VectorType *VecTy, unsigned Opcode0,
                                   unsigned Opcode1,
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 38aba183f6a173..4aa255a909edbf 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -581,11 +581,13 @@ class TargetTransformInfoImplBase {
 
   unsigned getMaxInterleaveFactor(ElementCount VF) const { return 1; }
 
-  InstructionCost getArithmeticInstrCost(
-      unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
-      TTI::OperandValueInfo Opd1Info, TTI::OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) const {
+  InstructionCost getArithmeticInstrCost(unsigned Opcode, Type *Ty,
+                                         TTI::TargetCostKind CostKind,
+                                         TTI::OperandValueInfo Opd1Info,
+                                         TTI::OperandValueInfo Opd2Info,
+                                         ArrayRef<const Value *> Args,
+                                         const Instruction *CxtI = nullptr,
+                                         ArrayRef<Value *> Scalars = {}) const {
     // Widenable conditions will eventually lower into constants, so some
     // operations with them will be trivially optimized away.
     auto IsWidenableCondition = [](const Value *V) {
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 98cbb4886642bf..4a27470763db83 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -924,7 +924,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     // Check if any of the operands are vector operands.
     const TargetLoweringBase *TLI = getTLI();
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 1fb2b9836de0cc..3f8745b1e06ecc 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -927,7 +927,7 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     OperandValueInfo Op1Info, OperandValueInfo Op2Info,
     ArrayRef<const Value *> Args, const Instruction *CxtI,
-    const TargetLibraryInfo *TLibInfo) const {
+    const TargetLibraryInfo *TLibInfo, ArrayRef<Value *> Scalars) const {
 
   // Use call cost for frem intructions that have platform specific vector math
   // functions, as those will be replaced with calls later by SelectionDAG or
@@ -942,10 +942,8 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
       return getCallInstrCost(nullptr, VecTy, {VecTy, VecTy}, CostKind);
   }
 
-  InstructionCost Cost =
-      TTIImpl->getArithmeticInstrCost(Opcode, Ty, CostKind,
-                                      Op1Info, Op2Info,
-                                      Args, CxtI);
+  InstructionCost Cost = TTIImpl->getArithmeticInstrCost(
+      Opcode, Ty, CostKind, Op1Info, Op2Info, Args, CxtI, Scalars);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d1536a276a9040..f9434b1691d746 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3376,8 +3376,8 @@ InstructionCost AArch64TTIImpl::getScalarizationOverhead(
 InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // The code-generator is currently not able to handle scalable vectors
   // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
@@ -3442,8 +3442,8 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     if (!VT.isVector() && VT.getSizeInBits() > 64)
       return getCallInstrCost(/*Function*/ nullptr, Ty, {Ty, Ty}, CostKind);
 
-    InstructionCost Cost = BaseT::getArithmeticInstrCost(
-        Opcode, Ty, CostKind, Op1Info, Op2Info);
+    InstructionCost Cost =
+        BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info);
     if (Ty->isVectorTy()) {
       if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
         // SDIV/UDIV operations are lowered using SVE, then we can have less
@@ -3472,29 +3472,41 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
-        // If one of the operands is a uniform constant then the cost for each
-        // element is Cost for insertion, extraction and division.
-        // Insertion cost = 2, Extraction Cost = 2, Division = cost for the
-        // operation with scalar type
-        if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
-            (Op2Info.isConstant() && Op2Info.isUniform())) {
-          if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+          if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
+              (Op2Info.isConstant() && Op2Info.isUniform())) {
             InstructionCost DivCost = BaseT::getArithmeticInstrCost(
                 Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info);
-            return (4 + DivCost) * VTy->getNumElements();
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost + DivCost) * VTy->getNumElements();
+          } else if (!Scalars.empty()) {
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+            for (auto *V : Scalars) {
+              auto *I = cast<Instruction>(V);
+              Cost +=
+                  getArithmeticInstrCost(I->getOpcode(), I->getType(), CostKind,
+                                         TTI::getOperandInfo(I->getOperand(0)),
+                                         TTI::getOperandInfo(I->getOperand(1)));
+            }
+          } else {
+            // FIXME: The initial cost calculated should have considered extract
+            // cost twice. For now, we just add additional cost to avoid
+            // underestimating the total cost.
+            Cost += Cost;
           }
+        } else {
+          // We can't predict the cost of div/extract/insert without knowing the
+          // vector width.
+          Cost.setInvalid();
         }
-        // On AArch64, without SVE, vector divisions are expanded
-        // into scalar divisions of each pair of elements.
-        Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
-                                       CostKind, Op1Info, Op2Info);
-        Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
-                                       Op1Info, Op2Info);
       }
-
-      // TODO: if one of the arguments is scalar, then it's not necessary to
-      // double the cost of handling the vector elements.
-      Cost += Cost;
     }
     return Cost;
   }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 201bc831b816b3..8845efd241f8f0 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -218,7 +218,8 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
                                             const SCEV *Ptr);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 5160851f8c4424..e1d718aa8a3509 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -546,8 +546,8 @@ bool GCNTTIImpl::getTgtMemIntrinsic(IntrinsicInst *Inst,
 InstructionCost GCNTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index 10956861650ab3..414e4b8f7284a7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -157,7 +157,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
                                  const Instruction *I = nullptr);
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 0e29648a7a284f..9bf37d8457b02f 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -10,6 +10,7 @@
 #include "ARMSubtarget.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/ADT/APInt.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/CodeGen/CostTable.h"
@@ -1349,8 +1350,8 @@ InstructionCost ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 InstructionCost ARMTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
   if (ST->isThumb() && CostKind == TTI::TCK_CodeSize && Ty->isIntegerTy(1)) {
     // Make operations on i1 relatively expensive as this often involves
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 3a4f940088b2e3..bef278e211a418 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -258,7 +258,8 @@ class ARMTTIImpl : public BasicTTIImplBase<ARMTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost
   getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
diff --git a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
index bf0bef3a2b2f98..ca52bd375021c2 100644
--- a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
+++ b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
@@ -16,6 +16,7 @@
 #define LLVM_LIB_TARGET_BPF_BPFTARGETTRANSFORMINFO_H
 
 #include "BPFTargetMachine.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
@@ -61,7 +62,8 @@ class BPFTTIImpl : public BasicTTIImplBase<BPFTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
     if (ISD == ISD::ADD && CostKind == TTI::TCK_RecipThroughput)
       return SCEVCheapExpansionBudget.getValue() + 1;
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
index bbb9d065b62435..53572321ecec62 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
@@ -273,8 +273,8 @@ InstructionCost HexagonTTIImpl::getCmpSelInstrCost(
 InstructionCost HexagonTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // TODO: Handle more cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput)
     return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
index 826644d08d1ac0..abaa25136574ae 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
@@ -142,7 +142,8 @@ class HexagonTTIImpl : public BasicTTIImplBase<HexagonTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
   InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                    TTI::CastContextHint CCH,
                                    TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
index 5fe63e4a2e0312..5000a93a818b0a 100644
--- a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
+++ b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
@@ -94,7 +94,8 @@ class LanaiTTIImpl : public BasicTTIImplBase<LanaiTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
 
     switch (ISD) {
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
index 4ec2ec100ab08d..2c194d9fff1ad4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
@@ -485,8 +485,8 @@ NVPTXTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
 InstructionCost NVPTXTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
index 0f4fb280b2d996..3d34f8d97ddc47 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
@@ -98,7 +98,8 @@ class NVPTXTTIImpl : public BasicTTIImplBase<NVPTXTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKi...
[truncated]

llvmbot · 2024-11-29T06:37:47Z

@llvm/pr-subscribers-backend-aarch64

Author: Sushant Gokhale (sushgokh)

Changes

…vision

In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases.

For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

Patch is 93.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118055.diff

29 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+9-5)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+7-5)
(modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+2-1)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+3-5)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34-22)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+3-2)
(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/BPF/BPFTargetTransformInfo.h (+3-1)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+2-1)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+2-1)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+2-2)
(modified) llvm/lib/Target/X86/X86TargetTransformInfo.h (+2-1)
(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+3-3)
(modified) llvm/test/Analysis/CostModel/AArch64/div.ll (+141-141)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/div.ll (+7-29)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 985ca1532e0149..50f1210e82f1e2 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1297,13 +1297,16 @@ class TargetTransformInfo {
   /// provide even more information.
   /// \p TLibInfo is used to search for platform specific vector library
   /// functions for instructions that might be converted to calls (e.g. frem).
+  /// \p Scalars refers to individual scalars/instructions being used for
+  /// vectorization.
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
       ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
-      const TargetLibraryInfo *TLibInfo = nullptr) const;
+      const TargetLibraryInfo *TLibInfo = nullptr,
+      ArrayRef<Value *> Scalars = {}) const;
 
   /// Returns the cost estimation for alternating opcode pattern that can be
   /// lowered to a single instruction on the target. In X86 this is for the
@@ -2099,7 +2102,8 @@ class TargetTransformInfo::Concept {
   virtual InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr) = 0;
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) = 0;
   virtual InstructionCost getAltInstrCost(
       VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
       const SmallBitVector &OpcodeMask,
@@ -2780,10 +2784,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   InstructionCost getArithmeticInstrCost(
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       OperandValueInfo Opd1Info, OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) override {
+      ArrayRef<const Value *> Args, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) override {
     return Impl.getArithmeticInstrCost(Opcode, Ty, CostKind, Opd1Info, Opd2Info,
-                                       Args, CxtI);
+                                       Args, CxtI, Scalars);
   }
   InstructionCost getAltInstrCost(VectorType *VecTy, unsigned Opcode0,
                                   unsigned Opcode1,
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 38aba183f6a173..4aa255a909edbf 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -581,11 +581,13 @@ class TargetTransformInfoImplBase {
 
   unsigned getMaxInterleaveFactor(ElementCount VF) const { return 1; }
 
-  InstructionCost getArithmeticInstrCost(
-      unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
-      TTI::OperandValueInfo Opd1Info, TTI::OperandValueInfo Opd2Info,
-      ArrayRef<const Value *> Args,
-      const Instruction *CxtI = nullptr) const {
+  InstructionCost getArithmeticInstrCost(unsigned Opcode, Type *Ty,
+                                         TTI::TargetCostKind CostKind,
+                                         TTI::OperandValueInfo Opd1Info,
+                                         TTI::OperandValueInfo Opd2Info,
+                                         ArrayRef<const Value *> Args,
+                                         const Instruction *CxtI = nullptr,
+                                         ArrayRef<Value *> Scalars = {}) const {
     // Widenable conditions will eventually lower into constants, so some
     // operations with them will be trivially optimized away.
     auto IsWidenableCondition = [](const Value *V) {
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 98cbb4886642bf..4a27470763db83 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -924,7 +924,8 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Opd1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Opd2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     // Check if any of the operands are vector operands.
     const TargetLoweringBase *TLI = getTLI();
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 1fb2b9836de0cc..3f8745b1e06ecc 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -927,7 +927,7 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     OperandValueInfo Op1Info, OperandValueInfo Op2Info,
     ArrayRef<const Value *> Args, const Instruction *CxtI,
-    const TargetLibraryInfo *TLibInfo) const {
+    const TargetLibraryInfo *TLibInfo, ArrayRef<Value *> Scalars) const {
 
   // Use call cost for frem intructions that have platform specific vector math
   // functions, as those will be replaced with calls later by SelectionDAG or
@@ -942,10 +942,8 @@ InstructionCost TargetTransformInfo::getArithmeticInstrCost(
       return getCallInstrCost(nullptr, VecTy, {VecTy, VecTy}, CostKind);
   }
 
-  InstructionCost Cost =
-      TTIImpl->getArithmeticInstrCost(Opcode, Ty, CostKind,
-                                      Op1Info, Op2Info,
-                                      Args, CxtI);
+  InstructionCost Cost = TTIImpl->getArithmeticInstrCost(
+      Opcode, Ty, CostKind, Op1Info, Op2Info, Args, CxtI, Scalars);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d1536a276a9040..f9434b1691d746 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -3376,8 +3376,8 @@ InstructionCost AArch64TTIImpl::getScalarizationOverhead(
 InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // The code-generator is currently not able to handle scalable vectors
   // of <vscale x 1 x eltty> yet, so return an invalid cost to avoid selecting
@@ -3442,8 +3442,8 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     if (!VT.isVector() && VT.getSizeInBits() > 64)
       return getCallInstrCost(/*Function*/ nullptr, Ty, {Ty, Ty}, CostKind);
 
-    InstructionCost Cost = BaseT::getArithmeticInstrCost(
-        Opcode, Ty, CostKind, Op1Info, Op2Info);
+    InstructionCost Cost =
+        BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info);
     if (Ty->isVectorTy()) {
       if (TLI->isOperationLegalOrCustom(ISD, LT.second) && ST->hasSVE()) {
         // SDIV/UDIV operations are lowered using SVE, then we can have less
@@ -3472,29 +3472,41 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
           Cost *= 4;
         return Cost;
       } else {
-        // If one of the operands is a uniform constant then the cost for each
-        // element is Cost for insertion, extraction and division.
-        // Insertion cost = 2, Extraction Cost = 2, Division = cost for the
-        // operation with scalar type
-        if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
-            (Op2Info.isConstant() && Op2Info.isUniform())) {
-          if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty)) {
+          if ((Op1Info.isConstant() && Op1Info.isUniform()) ||
+              (Op2Info.isConstant() && Op2Info.isUniform())) {
             InstructionCost DivCost = BaseT::getArithmeticInstrCost(
                 Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info);
-            return (4 + DivCost) * VTy->getNumElements();
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost + DivCost) * VTy->getNumElements();
+          } else if (!Scalars.empty()) {
+            // If #vector_elements = n then we need
+            // n inserts + 2n extracts + n divisions.
+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+            for (auto *V : Scalars) {
+              auto *I = cast<Instruction>(V);
+              Cost +=
+                  getArithmeticInstrCost(I->getOpcode(), I->getType(), CostKind,
+                                         TTI::getOperandInfo(I->getOperand(0)),
+                                         TTI::getOperandInfo(I->getOperand(1)));
+            }
+          } else {
+            // FIXME: The initial cost calculated should have considered extract
+            // cost twice. For now, we just add additional cost to avoid
+            // underestimating the total cost.
+            Cost += Cost;
           }
+        } else {
+          // We can't predict the cost of div/extract/insert without knowing the
+          // vector width.
+          Cost.setInvalid();
         }
-        // On AArch64, without SVE, vector divisions are expanded
-        // into scalar divisions of each pair of elements.
-        Cost += getArithmeticInstrCost(Instruction::ExtractElement, Ty,
-                                       CostKind, Op1Info, Op2Info);
-        Cost += getArithmeticInstrCost(Instruction::InsertElement, Ty, CostKind,
-                                       Op1Info, Op2Info);
       }
-
-      // TODO: if one of the arguments is scalar, then it's not necessary to
-      // double the cost of handling the vector elements.
-      Cost += Cost;
     }
     return Cost;
   }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 201bc831b816b3..8845efd241f8f0 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -218,7 +218,8 @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
                                             const SCEV *Ptr);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 5160851f8c4424..e1d718aa8a3509 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -546,8 +546,8 @@ bool GCNTTIImpl::getTgtMemIntrinsic(IntrinsicInst *Inst,
 InstructionCost GCNTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
 
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index 10956861650ab3..414e4b8f7284a7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -157,7 +157,8 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
                                  const Instruction *I = nullptr);
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index 0e29648a7a284f..9bf37d8457b02f 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -10,6 +10,7 @@
 #include "ARMSubtarget.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/ADT/APInt.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/CodeGen/CostTable.h"
@@ -1349,8 +1350,8 @@ InstructionCost ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
 InstructionCost ARMTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
   if (ST->isThumb() && CostKind == TTI::TCK_CodeSize && Ty->isIntegerTy(1)) {
     // Make operations on i1 relatively expensive as this often involves
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
index 3a4f940088b2e3..bef278e211a418 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.h
@@ -258,7 +258,8 @@ class ARMTTIImpl : public BasicTTIImplBase<ARMTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
 
   InstructionCost
   getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
diff --git a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
index bf0bef3a2b2f98..ca52bd375021c2 100644
--- a/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
+++ b/llvm/lib/Target/BPF/BPFTargetTransformInfo.h
@@ -16,6 +16,7 @@
 #define LLVM_LIB_TARGET_BPF_BPFTARGETTRANSFORMINFO_H
 
 #include "BPFTargetMachine.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
@@ -61,7 +62,8 @@ class BPFTTIImpl : public BasicTTIImplBase<BPFTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
     if (ISD == ISD::ADD && CostKind == TTI::TCK_RecipThroughput)
       return SCEVCheapExpansionBudget.getValue() + 1;
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
index bbb9d065b62435..53572321ecec62 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp
@@ -273,8 +273,8 @@ InstructionCost HexagonTTIImpl::getCmpSelInstrCost(
 InstructionCost HexagonTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // TODO: Handle more cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput)
     return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
diff --git a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
index 826644d08d1ac0..abaa25136574ae 100644
--- a/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
+++ b/llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h
@@ -142,7 +142,8 @@ class HexagonTTIImpl : public BasicTTIImplBase<HexagonTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr);
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {});
   InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
                                    TTI::CastContextHint CCH,
                                    TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
index 5fe63e4a2e0312..5000a93a818b0a 100644
--- a/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
+++ b/llvm/lib/Target/Lanai/LanaiTargetTransformInfo.h
@@ -94,7 +94,8 @@ class LanaiTTIImpl : public BasicTTIImplBase<LanaiTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
       TTI::OperandValueInfo Op1Info = {TTI::OK_AnyValue, TTI::OP_None},
       TTI::OperandValueInfo Op2Info = {TTI::OK_AnyValue, TTI::OP_None},
-      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr) {
+      ArrayRef<const Value *> Args = {}, const Instruction *CxtI = nullptr,
+      ArrayRef<Value *> Scalars = {}) {
     int ISD = TLI->InstructionOpcodeToISD(Opcode);
 
     switch (ISD) {
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
index 4ec2ec100ab08d..2c194d9fff1ad4 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
@@ -485,8 +485,8 @@ NVPTXTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
 InstructionCost NVPTXTTIImpl::getArithmeticInstrCost(
     unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
     TTI::OperandValueInfo Op1Info, TTI::OperandValueInfo Op2Info,
-    ArrayRef<const Value *> Args,
-    const Instruction *CxtI) {
+    ArrayRef<const Value *> Args, const Instruction *CxtI,
+    ArrayRef<Value *> Scalars) {
   // Legalize the type.
   std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
 
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
index 0f4fb280b2d996..3d34f8d97ddc47 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h
@@ -98,7 +98,8 @@ class NVPTXTTIImpl : public BasicTTIImplBase<NVPTXTTIImpl> {
       unsigned Opcode, Type *Ty, TTI::TargetCostKi...
[truncated]

david-arm · 2024-11-29T13:49:33Z

llvm/test/Analysis/CostModel/AArch64/div.ll

@@ -7,21 +7,21 @@ define i32 @sdiv() {
 ; CHECK-LABEL: 'sdiv'
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %I128 = sdiv i128 undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I64 = sdiv i64 undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 28 for instruction: %V2i64 = sdiv <2 x i64> undef, undef


It feels like this patch is doing two things at once:

Changing the costs of divides when not passing in the new Scalars parameter,

Changing the costs of divides when passing in a non-null Scalars.

It's probably worth splitting the changes up into two different PRs so we can see review them independently. It also allows us to monitor the performance impact of each change on it's own.

changes in llvm/test/Analysis/CostModel/AArch64/div.ll are not completely orthogonal to my patch (or (1) is not completely orthogonal to (2) ).

This piece of code needs change for following reasons:

Now that I have extra information about the scalars and information about the scalar type can be passed i.e. instead of passing Op1Info and Op2Info, I need to pass information about the scalar. I forgot to update this. Will update the patch.

The extract cost, which contributes to the entire cost, is considered once instead of twice here which is not correct I think.

In short, updating this small piece of code would cause changes in llvm/test/Analysis/CostModel/AArch64/div.ll

The perf results, as mentioned in the description, are neutral.

@david-arm segregated two things as you mentioned.

alexey-bataev · 2024-11-29T13:52:10Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+            InstructionCost InsertExtractCost =
+                ST->getVectorInsertExtractBaseCost();
+            Cost = (3 * InsertExtractCost + DivCost) * VTy->getNumElements();
+          } else if (!Scalars.empty()) {


Why Args is not enough for this analysis?

I initially tried passing scalars through Args but this is crashing one of the tests. Didnt debug this.

Need to investigate this rather than adding some extra args

sushgokh · 2024-12-12T05:05:54Z

ping

alexey-bataev · 2024-12-12T11:54:15Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

-             CommonCost;
+      return CommonCost +
+             TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind, Op1Info,
+                                         Op2Info, E->Scalars, nullptr, TLI);


This is incorrect, it should pass operands, not the original scalars

But, how does it matter if we pass scalars instead of its operands here?

It breaks the the API contract

alexey-bataev · 2024-12-12T18:11:26Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

+        return I->getType()->getPrimitiveSizeInBits() >= 32;
+      };
+      if (all_of(E->Scalars, [ShuffleOrOp, IsAllowedScalarTy](Value *V) {
+            return !IsaPred<UndefValue, PoisonValue>(V) &&


return !isa<UndefValue, PoisonValue>(V) &&

alexey-bataev · 2024-12-12T18:12:13Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

+      auto IsAllowedScalarTy = [](const Value *I) {
+        // Just to avoid regression with RISCV unittest.
+        return I->getType()->getPrimitiveSizeInBits() >= 32;
+      };


Hacks are the bad way.

These Args are being used by the RISCV target for calculating the cast cost. To avoid miscalculation with the patch, had to use this hack.
If we dont want this, we need to introduce other parameter. There's no way out here.

Or adjust RISCVTargetTransformInfo.cpp to handle it correctly, if there is something wrong, or update RISCV tests, if everything is fine

adjusting RISCVTargetTransformInfo.cpp or updating the test & justifying the results is beyond my scope . w/o patch, casting is done only for scalar. w/ patch, its being done for all operands of the all the scalars.

adjusting RISCVTargetTransformInfo.cpp or updating the test & justifying the results is beyond my scope . w/o patch, casting is done only for scalar. w/ patch, its being done for all operands of the all the scalars.

In upstream you should care about all targets, you cannot introduce hacks just for your target. You can do this in your downstream version.

Thats the reason I wanted to introduce one more API parameter

I have tried to debug the RISCV backend for the failing test SLPVectorizer/RISCV/strided-loads-vectorized.ll and looking at the comment
// f16 with zvfhmin and bf16 will be promoted to f32., calculating the cast cost for the args seems a valid choice which isnt happening currently.

But if this is incorrect, could you please help me to avoid this regression?

alexey-bataev · 2024-12-16T19:27:41Z

Do not use force push

alexey-bataev · 2024-12-16T19:35:15Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+        // If the information about individual scalars being vectorized is
+        // available, this yeilds better cost estimation.
+        if (auto *VTy = dyn_cast<FixedVectorType>(Ty); VTy && !Args.empty()) {
+          InstructionCost InsertExtractCost =
+              ST->getVectorInsertExtractBaseCost();
+          Cost = (3 * InsertExtractCost) * VTy->getNumElements();
+          for (int i = 0, Sz = Args.size(); i < Sz; i += 2) {
+            Cost += getArithmeticInstrCost(
+                Opcode, VTy->getScalarType(), CostKind,
+                TTI::getOperandInfo(Args[i]), TTI::getOperandInfo(Args[i + 1]));
+          }
+          return Cost;
+        }


I don't understand what are you trying to do here. Why are you adding the cost of the scalar ops here? They are already counted in the cost model

I don't understand what are you trying to do here. Why are you adding the cost of the scalar ops here? They are already counted in the cost model

AArch64 Neon does not support vector division natively. It scalarizes the vector division as can be seen here.

Now coming to the cost modelling part: This code section is calculating the cost of scalarizing the vector division. Thus,

vector cost = for each lane(cost of extract for both div operands) + for each lane(cost of scalar division) + for each lane(cost of insert into the result vector)

This vector cost was being over-calculated thus preferring scalar code. This patch just tries to rectify this.

Then, you should not build the vector node at all. I would suggest instead that you check how the getScalarsVectorizationState function in SLP works, add a check for NEON division (maybe add a new entry in TTI to check if the vector operation is legal and won't be scalarized), and return that for NEON, it should build the TreeEntry::NeedToGather node.

Then, you should not build the vector node at all. I would suggest instead that you check how the getScalarsVectorizationState function in SLP works, add a check for NEON division (maybe add a new entry in TTI to check if the vector operation is legal and won't be scalarized), and return that for NEON, it should build the TreeEntry::NeedToGather node.

Two issues which I can think of with this approach:

The operands of the div operation wont get vectorized.

Lot of checks are in place that alter the div cost based on information about its operands(e.g. if operands are uniform/constant/pow-of-2 etc.). Hence, factoring out code just to check if the div operation would be scalarized is difficult and would result in duplication of code.

That's fine. If you think they should be vectorized, a new vectorization strategy should be implemented instead.

No, it is just a simple legality check. If the instruction should be scalarized, it should be built into a build vector node; it should not be vectorized.

That's fine. If you think they should be vectorized, a new vectorization strategy should be implemented instead.

Don't you think lowering the division vector in some late IR pass is easy than stopping vectorization at div ? One more advantage of lowering later is lowering is transparent to the user and done at single place rather than at multiple places.
This type of vector lowering is already being done at other places e.g. masked_load

2. No, it is just a simple legality check. If the instruction should be scalarized, it should be built into a build vector node; it should not be vectorized.

Just need a clarification: my understanding of TreeEntry::NeedToGather is you build a vector out of scalars. Is that correct?
Now, coming to the legality checks. Yes, in the end, its legality check but I need to check all the contraints, which are currently in getArithmeticInstrCost once again. And I am trying to avoid that route just for reasons mentioned in my comment on (1).

That's fine. If you think they should be vectorized, a new vectorization strategy should be implemented instead.

Don't you think lowering the division vector in some late IR pass is easy than stopping vectorization at div ? One more advantage of lowering later is lowering is transparent to the user and done at single place rather than at multiple places. This type of vector lowering is already being done at other places e.g. masked_load

It just breaks the vectorizer design. If the node should be scalarized, the vectorization model should know about it and scalarize it. Otherwise, it complicates the whole vectorization process and makes it miss some possible vectorization opportunities. I assume, you missed the cost of extraction of the operands in your cost model changes?

No, it is just a simple legality check. If the instruction should be scalarized, it should be built into a build vector node; it should not be vectorized.

Just need a clarification: my understanding of TreeEntry::NeedToGather is you build a vector out of scalars. Is that correct? Now, coming to the legality checks. Yes, in the end, its legality check but I need to check all the contraints, which are currently in getArithmeticInstrCost once again. And I am trying to avoid that route just for reasons mentioned in my comment on (1).

You can check these constraints before and command vectorizer to build a NeedToGather node. You should not use getArithmeticInstrCost in this case, some other function should check if it is legal to vectorize these scalars here

Otherwise, it complicates the whole vectorization process and makes it miss some possible vectorization opportunities. I assume, you missed the cost of extraction of the operands in your cost model changes?

No, in fact the patch is accurately calculating the cost of:

for every lane: 2 extracts and 1 div and 1 insert See Cost = (3 * InsertExtractCost) * VTy->getNumElements(); // Insert/Extract cost is same

So, no way it is missing out on opportunities and by no means its complicating the process.

sushgokh · 2025-01-03T12:07:33Z

ping

davemgreen · 2025-01-08T22:55:40Z

I happened to be looking at urem costs recently - not exactly related to this but led to a similar area. I was only going to update the urem by constant values but the sdiv costs are similar.

I'm not sure this needs to alter the SLP vectorizer for the example tests. So long as all the second operands are constant or power-of-two it should be expanding to a not use a div (and not scalarize). I think we can update this bit of code:

llvm-project/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Line 3495 in 2b26ee6

if (Op2Info.isConstant() && Op2Info.isUniform() && Op2Info.isPowerOf2()) {

It probably should not matter that the constant is uniform.
It should handle negative-powers of 2 too for sdiv.
The cost should use Add + ICmp + Sel + AShr. (The sel goes through the wrong method).
We can maybe add SVE asrd but that can happen in a later patch.
I believe vector types have the same expansion and costs as scalars.
srem has a similar cost to sdiv for these values, but that should probably be added later too, as there is more than just power-2 values.

davemgreen · 2025-01-09T09:18:41Z

The urem patch I mentioned is now in #122236.

sushgokh · 2025-01-10T13:07:17Z

The urem patch I mentioned is now in #122236.

Thanks for the patch. This serves as direction for sdiv/srem and will take this up next.

Although it resolves the motivating case, it does not resolve the generic case where Op2 is unknown. Consider below example(motivating example modified with sdiv->udiv and divisor changed to unknowns)

define <2 x i32> @vectorize_udiv_v2i32(<2 x i32> %a, <2 x i32> %x, <2 x i32> %y, <2 x i32> %z)
{
  %a0 = extractelement <2 x i32> %a, i64 0
  %a1 = extractelement <2 x i32> %a, i64 1
  %1 = udiv i32 %a0, %a1
  %2 = udiv i32 %a1, %a0
  %x0 = extractelement <2 x i32> %x, i64 0
  %x1 = extractelement <2 x i32> %x, i64 1
  %3 = add i32 %1, %x0
  %4 = add i32 %2, %x1
  %y0 = extractelement <2 x i32> %y, i64 0
  %y1 = extractelement <2 x i32> %y, i64 1
  %5 = sub i32 %3, %y0
  %6 = sub i32 %4, %y1
  %z0 = extractelement <2 x i32> %z, i64 0
  %z1 = extractelement <2 x i32> %z, i64 1
  %7 = mul i32 %5, %z0
  %8 = mul i32 %6, %z1
  %res0 = insertelement <2 x i32> poison, i32 %7, i32 0
  %res1 = insertelement <2 x i32> %res0, i32 %8, i32 1
  ret <2 x i32> %res1
}

urem patch fails to vectorize this while this patch vectorizes this.

In summary, this patch is necessary as it refines the scalar cost.

Last thing: I will update the motivating case as above.

This patch updates the motivating test for the above PR so that it does not conflict with urem PR #122236

…vision In the backend, last resort of finding the vector division cost is to use its scalar cost. However, without knowledge about the division operands, the cost can be off in certain cases. For SLP, this patch tries to pass scalars for better scalar cost estimation in the backend.

sushgokh · 2025-01-23T15:14:30Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

@@ -18,6 +18,7 @@
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/CodeGen/CostTable.h"
 #include "llvm/CodeGen/TargetLowering.h"
+#include "llvm/IR/Constants.h"


will remove this in next version. Got this accidently

sushgokh · 2025-01-23T15:15:24Z

ping

alexey-bataev · 2025-01-23T15:17:21Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+          if (Args.size() == 2 &&
+              !(isa<ConstantVector>(Args[1]) ||
+                isa<ConstantDataVector>(Args[1]) ||
+                isa<ConstantExpr>(Args[1])) &&


I assume you do not need to pass args for this, instead you can rely on Op1Info and Op2Info

for the test where Op1 only or Op2 only is constant(see the added tests), simply relying on Op1Info and Op2Info does not work. In such cases, we need to pass args

the cost of the division when the divisor is constant is less than the case where the divisor is unknown.

When we are considering scalarizing cost, we are considering div cost of each lane and additional insert/extract cost. This is where this patch yeilds less cost with extra knowledge about the values that go into that lanes

Op2Info provides info about constantness of the divisor

Also, I told this before already, if the node is scalarized, it must be represented as a buildvector node.

Please check the case @sdiv_v2i32_Op1_unknown_Op2_const where only one of the divisors is constant. In that case, Op2Info would be something generic and doesnt give sufficient information. I hope this makes sense

Also, I told this before already, if the node is scalarized, it must be represented as a buildvector node.

I have tried to justify this why this wont be a better option already

alexey-bataev

Scalarized nodes must be represented as buildvectors. If the operands should be vectorized, need to add special processing for such operands. Vectorizing must be scalarized nodes comes against design of the vectorizer.

sushgokh requested review from madhur13490, davemgreen, sjoerdmeijer and david-arm November 29, 2024 06:37

llvmbot added backend:ARM backend:AArch64 backend:AMDGPU backend:Hexagon backend:RISC-V backend:PowerPC backend:SystemZ backend:WebAssembly backend:X86 vectorizers backend:NVPTX llvm:analysis llvm:transforms labels Nov 29, 2024

david-arm reviewed Nov 29, 2024

View reviewed changes

alexey-bataev reviewed Nov 29, 2024

View reviewed changes

sushgokh force-pushed the alter-div-cost branch from 37a5288 to de74fc6 Compare December 4, 2024 13:20

alexey-bataev reviewed Dec 12, 2024

View reviewed changes

sushgokh force-pushed the alter-div-cost branch from de74fc6 to 6596bc9 Compare December 12, 2024 17:19

alexey-bataev reviewed Dec 12, 2024

View reviewed changes

sushgokh force-pushed the alter-div-cost branch from 6596bc9 to 4f6d01d Compare December 13, 2024 05:50

alexey-bataev requested changes Dec 16, 2024

View reviewed changes

sushgokh added a commit to sushgokh/llvm-project that referenced this pull request Jan 13, 2025

[NFC] Update test for PR llvm#118055

894d877

sushgokh added a commit to sushgokh/llvm-project that referenced this pull request Jan 13, 2025

[NFC] Update test for PR llvm#118055

af5a289

sushgokh added a commit to sushgokh/llvm-project that referenced this pull request Jan 13, 2025

[NFC] Update test for PR llvm#118055

eb43bdf

sushgokh added a commit to sushgokh/llvm-project that referenced this pull request Jan 15, 2025

[NFC] Update test for PR llvm#118055

9effba7

sushgokh added a commit to sushgokh/llvm-project that referenced this pull request Jan 22, 2025

[NFC] Update test for PR llvm#118055

653c51c

sushgokh added a commit that referenced this pull request Jan 22, 2025

[SLP][NFC] Update test for PR #118055 (#122696)

c6c6475

This patch updates the motivating test for the above PR so that it does not conflict with urem PR #122236

sushgokh added 2 commits January 23, 2025 17:13

Rebase to update the motivating case, add checks to prevent test failure

1bdc03a

sushgokh force-pushed the alter-div-cost branch from 4f6d01d to 1bdc03a Compare January 23, 2025 15:12

sushgokh commented Jan 23, 2025

View reviewed changes

sushgokh removed backend:ARM backend:AMDGPU backend:Hexagon backend:RISC-V backend:PowerPC backend:SystemZ backend:WebAssembly backend:X86 backend:NVPTX llvm:analysis labels Jan 23, 2025

alexey-bataev reviewed Jan 23, 2025

View reviewed changes

alexey-bataev requested changes Jan 23, 2025

View reviewed changes

[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

Are you sure you want to change the base?

[AArch64][CostModel] Improve cost estimate of scalarizing a vector di… #118055

Uh oh!

Conversation

sushgokh commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 29, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sushgokh commented Dec 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sushgokh Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sushgokh Dec 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexey-bataev commented Dec 16, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sushgokh commented Jan 3, 2025

Uh oh!

davemgreen commented Jan 8, 2025

Uh oh!

davemgreen commented Jan 9, 2025

sushgokh commented Nov 29, 2024 •

edited

Loading

llvmbot commented Nov 29, 2024 •

edited

Loading

sushgokh Dec 12, 2024 •

edited

Loading

sushgokh Dec 13, 2024 •

edited

Loading