[LoongArch] Impl TTI hooks for LoongArch to support LoopDataPrefetch pass #118437

zhaoqi5 · 2024-12-03T07:13:18Z

Inspired by https://reviews.llvm.org/D146600, this commit adds
some TTI hooks for LoongArch so that the LoopDataPrefetch pass
actually takes effect. They are:

getCacheLineSize(): returns 64 for loongarch64.
getPrefetchDistance(): returns 200. In SPEC CPU 2017 testing, the
gains from prefetching were most noticeable with PrefetchDistance set
to 200 (results shown below), although the best value differs from
benchmark to benchmark.
enableWritePrefetching(): LoongArch supports store prefetch, so
WritePrefetching is set to true by default.
getMinPrefetchStride() and getMaxPrefetchIterationsAhead() keep
their default values (1 and UINT_MAX), so they are not overridden.
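
The override-versus-inherit split described above can be pictured with a small standalone model. This is only a sketch: `PrefetchHooksBase` and `LoongArchPrefetchHooks` are hypothetical names, not LLVM classes, and the base defaults of 0, 0 and false for the three overridden hooks are an assumption. Only the 1 and UINT_MAX defaults are stated in this description.

```cpp
#include <climits>

// Hypothetical stand-in for the target-independent defaults. It is NOT
// LLVM's real class hierarchy, only an illustration of the hook pattern.
struct PrefetchHooksBase {
  virtual ~PrefetchHooksBase() = default;
  // Assumed defaults: 0 / 0 / false, i.e. "no information / disabled".
  virtual unsigned getCacheLineSize() const { return 0; }
  virtual unsigned getPrefetchDistance() const { return 0; }
  virtual bool enableWritePrefetching() const { return false; }
  // Defaults named in the PR description: 1 and UINT_MAX.
  virtual unsigned getMinPrefetchStride() const { return 1; }
  virtual unsigned getMaxPrefetchIterationsAhead() const { return UINT_MAX; }
};

// Hypothetical LoongArch model: overrides only the three hooks this PR adds.
struct LoongArchPrefetchHooks : PrefetchHooksBase {
  unsigned getCacheLineSize() const override { return 64; }
  unsigned getPrefetchDistance() const override { return 200; }
  bool enableWritePrefetching() const override { return true; }
};
```

Querying the model through a `const PrefetchHooksBase &` returns 64, 200 and true for the overridden hooks, while the stride and iterations-ahead hooks fall through to the inherited 1 and UINT_MAX.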

After this commit, the pass generates llvm.prefetch intrinsic calls
correctly for the test added by https://reviews.llvm.org/D146600.
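
As a rough source-level picture of what the pass does to the loop in that test (a[i] = b[i] + 1.0 over 1600 doubles), the sketch below hand-writes the prefetches in C++. It is an illustration, not compiler output: `addOne`, `prefetchAhead` and `kAheadBytes` are hypothetical names, the 200-byte offset and the read/write and locality arguments are copied from the CHECK lines in the test diff, and `__builtin_prefetch` is the GCC/Clang builtin (the sketch degrades to a plain loop on other compilers).

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset added to each access, mirroring the "+ 200" in the CHECK lines.
constexpr std::uintptr_t kAheadBytes = 200;

// Issue a prefetch hint kAheadBytes past p. The address is formed with
// integer arithmetic so no out-of-bounds pointer arithmetic is performed;
// a prefetch is only a hint and is never dereferenced by this code.
inline void prefetchAhead(const void *p, bool forWrite) {
#if defined(__GNUC__) || defined(__clang__)
  const void *target = reinterpret_cast<const void *>(
      reinterpret_cast<std::uintptr_t>(p) + kAheadBytes);
  // rw and locality must be compile-time constants, hence the branch.
  if (forWrite)
    __builtin_prefetch(target, 1, 3); // write, high temporal locality
  else
    __builtin_prefetch(target, 0, 3); // read, high temporal locality
#else
  (void)p;
  (void)forWrite;
#endif
}

// Same order as the transformed IR: prefetch b, load, add, prefetch a, store.
void addOne(double *a, const double *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    prefetchAhead(&b[i], /*forWrite=*/false);
    double sum = b[i] + 1.0;
    prefetchAhead(&a[i], /*forWrite=*/true);
    a[i] = sum;
  }
}
```

The write prefetch on `a` corresponds to the `i32 1` argument in the second `llvm.prefetch` call, which only appears because enableWritePrefetching() returns true.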

Results of spec2017rate benchmarks (testing data: ref, copies: 1):

For all C/C++ benchmarks, compared to O3+novec/lsx/lasx, prefetch
changes performance by about -1.58%/0.31%/0.07% for int benchmarks
and improves it by 3.26%/3.73%/3.78% for floating point benchmarks.
(Only O3+novec+prefetch regresses when testing intrate.)
However, prefetch reduces performance for almost every Fortran
benchmark compiled by flang. Taking all C/C++/Fortran benchmarks
together, performance with prefetch decreases by about 1% to 5%.

FIXME: Keep the loongarch-enable-loop-data-prefetch option defaulting
to false for now because of the negative effect on Fortran.

llvmbot · 2024-12-03T07:13:50Z

@llvm/pr-subscribers-backend-loongarch

@llvm/pr-subscribers-llvm-transforms

Author: ZhaoQi (zhaoqi5)

Changes

Inspired by https://reviews.llvm.org/D146600, this commit adds some TTI hooks for LoongArch so that the LoopDataPrefetch pass actually takes effect. They are:

getCacheLineSize(): returns 64 for loongarch64.
getPrefetchDistance(): returns 200. In SPEC CPU 2017 testing, the gains from prefetching were most noticeable with PrefetchDistance set to 200 (results shown below), although the best value differs from benchmark to benchmark.
enableWritePrefetching(): LoongArch supports store prefetch, so WritePrefetching is set to true by default.
getMinPrefetchStride() and getMaxPrefetchIterationsAhead() keep their default values (1 and UINT_MAX), so they are not overridden.

After this commit, the pass generates llvm.prefetch intrinsic calls correctly for the test added by https://reviews.llvm.org/D146600.

TODO: SPEC CPU 2017 is being retested; the results will be added here.

TODO: Set the loongarch-enable-loop-data-prefetch option to true by default.

Full diff: https://github.com/llvm/llvm-project/pull/118437.diff

3 Files Affected:

(modified) llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp (+6)
(modified) llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h (+4)
(modified) llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll (+27-6)

diff --git a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
index 5fbc7c734168d1..cbc9c3f3beca00 100644
--- a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
@@ -89,4 +89,10 @@ LoongArchTTIImpl::getPopcntSupport(unsigned TyWidth) {
   return ST->hasExtLSX() ? TTI::PSK_FastHardware : TTI::PSK_Software;
 }
 
+unsigned LoongArchTTIImpl::getCacheLineSize() const { return 64; }
+
+unsigned LoongArchTTIImpl::getPrefetchDistance() const { return 200; }
+
+bool LoongArchTTIImpl::enableWritePrefetching() const { return true; }
+
 // TODO: Implement more hooks to provide TTI machinery for LoongArch.
diff --git a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h
index f7ce75173be203..b3edf131c584c4 100644
--- a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h
+++ b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h
@@ -47,6 +47,10 @@ class LoongArchTTIImpl : public BasicTTIImplBase<LoongArchTTIImpl> {
   const char *getRegisterClassName(unsigned ClassID) const;
   TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth);
 
+  unsigned getCacheLineSize() const override;
+  unsigned getPrefetchDistance() const override;
+  bool enableWritePrefetching() const override;
+
   // TODO: Implement more hooks to provide TTI machinery for LoongArch.
 };
 
diff --git a/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll b/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll
index 8553171ac68ac9..0313bbd8832876 100644
--- a/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll
+++ b/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll
@@ -1,16 +1,38 @@
-;; Tag this 'XFAIL' because we need a few more TTIs and ISels.
-; XFAIL: *
-; RUN: opt --mtriple=loongarch64 -mattr=+d --passes=loop-data-prefetch -loongarch-enable-loop-data-prefetch -S < %s | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt --mtriple=loongarch64 -mattr=+d --passes=loop-data-prefetch -S < %s | FileCheck %s
 
 define void @foo(ptr %a, ptr %b) {
+; CHECK-LABEL: define void @foo(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK:       [[FOR_BODY]]:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[TMP0:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
+; CHECK-NEXT:    [[TMP1:%.*]] = add i64 [[TMP0]], 200
+; CHECK-NEXT:    [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
+; CHECK-NEXT:    [[TMP3:%.*]] = add i64 [[TMP2]], 200
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[B]], i64 [[TMP3]]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[INDVARS_IV]]
+; CHECK-NEXT:    call void @llvm.prefetch.p0(ptr [[SCEVGEP]], i32 0, i32 3, i32 1)
+; CHECK-NEXT:    [[TMP4:%.*]] = load double, ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[ADD:%.*]] = fadd double [[TMP4]], 1.000000e+00
+; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds double, ptr [[A]], i64 [[INDVARS_IV]]
+; CHECK-NEXT:    call void @llvm.prefetch.p0(ptr [[SCEVGEP1]], i32 1, i32 3, i32 1)
+; CHECK-NEXT:    store double [[ADD]], ptr [[ARRAYIDX2]], align 8
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1600
+; CHECK-NEXT:    br i1 [[EXITCOND]], label %[[FOR_END:.*]], label %[[FOR_BODY]]
+; CHECK:       [[FOR_END]]:
+; CHECK-NEXT:    ret void
+;
 entry:
   br label %for.body
 
-; CHECK: for.body:
 for.body:                                         ; preds = %for.body, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds double, ptr %b, i64 %indvars.iv
-; CHECK: call void @llvm.prefetch
   %0 = load double, ptr %arrayidx, align 8
   %add = fadd double %0, 1.000000e+00
   %arrayidx2 = getelementptr inbounds double, ptr %a, i64 %indvars.iv
@@ -19,7 +41,6 @@ for.body:                                         ; preds = %for.body, %entry
   %exitcond = icmp eq i64 %indvars.iv.next, 1600
   br i1 %exitcond, label %for.end, label %for.body
 
-; CHECK: for.end:
 for.end:                                          ; preds = %for.body
   ret void
 }
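
The `add i64 ..., 200` in the CHECK lines is a byte offset, while getPrefetchDistance() is expressed in instructions; the two agreeing here looks like a coincidence of this loop's size. The arithmetic below is a sketch of how the offset can be derived, assuming the pass divides the distance by the loop's instruction count (never going below one iteration) and multiplies by the access stride. `itersAhead` and `byteOffset` are hypothetical helper names, and the loop size of 8 used in the note after the block is an assumption chosen to match the test output, not a value taken from the PR.

```cpp
// Iterations to run ahead: prefetch distance (in instructions) divided by
// the loop size (in instructions), but at least 1.
// Precondition: loopSize must be non-zero.
constexpr unsigned itersAhead(unsigned prefetchDistance, unsigned loopSize) {
  return prefetchDistance / loopSize == 0 ? 1u : prefetchDistance / loopSize;
}

// Byte offset of the prefetched address relative to the current access.
constexpr unsigned long long byteOffset(unsigned iters, unsigned strideBytes) {
  return static_cast<unsigned long long>(iters) * strideBytes;
}
```

Under that assumption, `itersAhead(200, 8)` gives 25 iterations, and with the 8-byte stride of the `double` accesses `byteOffset(25, 8)` gives the 200 bytes seen in the test. A loop body larger than the distance, for example `itersAhead(200, 500)`, is clamped to one iteration ahead.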


llvmbot added backend:loongarch llvm:transforms labels Dec 3, 2024

zhaoqi5 marked this pull request as draft December 3, 2024 07:14

zhaoqi5 force-pushed the enable-loopdataprefetch-pass branch from 54a8ca4 to b13547b Compare December 11, 2024 08:47

Merge branch 'main' into enable-loopdataprefetch-pass

dd93755

zhaoqi5 marked this pull request as ready for review December 11, 2024 08:55

SixWeining approved these changes Jan 17, 2025


zhaoqi5 merged commit ca4886b into llvm:main Jan 20, 2025
10 checks passed

zhaoqi5 deleted the enable-loopdataprefetch-pass branch January 20, 2025 08:20
