-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LoongArch] Impl TTI hooks for LoongArch to support LoopDataPrefetch pass #118437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-loongarch @llvm/pr-subscribers-llvm-transforms Author: ZhaoQi (zhaoqi5) ChangesInspired by https://reviews.llvm.org/D146600, this commit adds some TTI hooks for LoongArch to make LoopDataPrefetch pass really work. Including:
After this commit, the test added by https://reviews.llvm.org/D146600 can generate llvm.prefetch intrinsic IR correctly. TODO: SPEC CPU 2017 is retesting, results waiting for add here. TODO: Set Full diff: https://github.com/llvm/llvm-project/pull/118437.diff 3 Files Affected:
diff --git a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
index 5fbc7c734168d1..cbc9c3f3beca00 100644
--- a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.cpp
@@ -89,4 +89,10 @@ LoongArchTTIImpl::getPopcntSupport(unsigned TyWidth) {
return ST->hasExtLSX() ? TTI::PSK_FastHardware : TTI::PSK_Software;
}
+unsigned LoongArchTTIImpl::getCacheLineSize() const { return 64; }
+
+unsigned LoongArchTTIImpl::getPrefetchDistance() const { return 200; }
+
+bool LoongArchTTIImpl::enableWritePrefetching() const { return true; }
+
// TODO: Implement more hooks to provide TTI machinery for LoongArch.
diff --git a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h
index f7ce75173be203..b3edf131c584c4 100644
--- a/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h
+++ b/llvm/lib/Target/LoongArch/LoongArchTargetTransformInfo.h
@@ -47,6 +47,10 @@ class LoongArchTTIImpl : public BasicTTIImplBase<LoongArchTTIImpl> {
const char *getRegisterClassName(unsigned ClassID) const;
TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth);
+ unsigned getCacheLineSize() const override;
+ unsigned getPrefetchDistance() const override;
+ bool enableWritePrefetching() const override;
+
// TODO: Implement more hooks to provide TTI machinery for LoongArch.
};
diff --git a/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll b/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll
index 8553171ac68ac9..0313bbd8832876 100644
--- a/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll
+++ b/llvm/test/Transforms/LoopDataPrefetch/LoongArch/basic.ll
@@ -1,16 +1,38 @@
-;; Tag this 'XFAIL' because we need a few more TTIs and ISels.
-; XFAIL: *
-; RUN: opt --mtriple=loongarch64 -mattr=+d --passes=loop-data-prefetch -loongarch-enable-loop-data-prefetch -S < %s | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt --mtriple=loongarch64 -mattr=+d --passes=loop-data-prefetch -S < %s | FileCheck %s
define void @foo(ptr %a, ptr %b) {
+; CHECK-LABEL: define void @foo(
+; CHECK-SAME: ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: br label %[[FOR_BODY:.*]]
+; CHECK: [[FOR_BODY]]:
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
+; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], 200
+; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
+; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
+; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], 200
+; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[B]], i64 [[TMP3]]
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[INDVARS_IV]]
+; CHECK-NEXT: call void @llvm.prefetch.p0(ptr [[SCEVGEP]], i32 0, i32 3, i32 1)
+; CHECK-NEXT: [[TMP4:%.*]] = load double, ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT: [[ADD:%.*]] = fadd double [[TMP4]], 1.000000e+00
+; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, ptr [[A]], i64 [[INDVARS_IV]]
+; CHECK-NEXT: call void @llvm.prefetch.p0(ptr [[SCEVGEP1]], i32 1, i32 3, i32 1)
+; CHECK-NEXT: store double [[ADD]], ptr [[ARRAYIDX2]], align 8
+; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1600
+; CHECK-NEXT: br i1 [[EXITCOND]], label %[[FOR_END:.*]], label %[[FOR_BODY]]
+; CHECK: [[FOR_END]]:
+; CHECK-NEXT: ret void
+;
entry:
br label %for.body
-; CHECK: for.body:
for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds double, ptr %b, i64 %indvars.iv
-; CHECK: call void @llvm.prefetch
%0 = load double, ptr %arrayidx, align 8
%add = fadd double %0, 1.000000e+00
%arrayidx2 = getelementptr inbounds double, ptr %a, i64 %indvars.iv
@@ -19,7 +41,6 @@ for.body: ; preds = %for.body, %entry
%exitcond = icmp eq i64 %indvars.iv.next, 1600
br i1 %exitcond, label %for.end, label %for.body
-; CHECK: for.end:
for.end: ; preds = %for.body
ret void
}
|
…pass Inspired by https://reviews.llvm.org/D146600, this commit adds some TTI hooks for LoongArch to make LoopDataPrefetch pass really work. Including: - `getCacheLineSize()`: 64 for loongarch64. - `getPrefetchDistance()`: After testing SPEC CPU 2017, improvements taken by prefetching are more obvious when set PrefetchDistance to 200(results shown blow), although different benchmarks fit for different best choice. - `enableWritePrefetching()`: store prefetch is supported by LoongArch, so set WritePrefetching to true in default. - `getMinPrefetchStride()` and `getMaxPrefetchIterationsAhead()` still use default values: 1 and UINT_MAX, so not override them. After this commit, the test added by https://reviews.llvm.org/D146600 can generate llvm.prefetch intrinsic IR correctly. Results of spec2017rate benchmarks (testing date: ref, copies: 1): - For all C/C++ benchmarks, compared to O3+novec/lsx/lasx, prefetch can bring about -1.58%/0.31%/0.07% performance improvement for int benchmarks and 3.26%/3.73%/3.78% improvement for floating point benchmarks. (Only O3+novec+prefetch decreases when testing intrate.) - But prefetch results in performance reduction almost for every Fortran benchmark compiled by flang. While considering all C/C++/Fortran benchmarks, prefetch performance will decrease about 1% ~ 5%. FIXME: Keep `loongarch-enable-loop-data-prefetch` option default to false for now due to the bad effect for Fortran.
54a8ca4
to
b13547b
Compare
Inspired by https://reviews.llvm.org/D146600, this commit adds
some TTI hooks for LoongArch to make LoopDataPrefetch pass
really work. Including:
getCacheLineSize()
: 64 for loongarch64.getPrefetchDistance()
: After testing SPEC CPU 2017, improvementstaken by prefetching are more obvious when set PrefetchDistance to
200(results shown blow), although different benchmarks fit for different
best choice.
enableWritePrefetching()
: store prefetch is supported by LoongArch,so set WritePrefetching to true in default.
getMinPrefetchStride()
andgetMaxPrefetchIterationsAhead()
stilluse default values: 1 and UINT_MAX, so not override them.
After this commit, the test added by https://reviews.llvm.org/D146600 can
generate llvm.prefetch intrinsic IR correctly.
Results of spec2017rate benchmarks (testing date: ref, copies: 1):
bring about -1.58%/0.31%/0.07% performance improvement for int benchmarks
and 3.26%/3.73%/3.78% improvement for floating point benchmarks. (Only
O3+novec+prefetch decreases when testing intrate.)
benchmark compiled by flang. While considering all C/C++/Fortran
benchmarks, prefetch performance will decrease about 1% ~ 5%.
FIXME: Keep
loongarch-enable-loop-data-prefetch
option default to falsefor now due to the bad effect for Fortran.