You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[LoongArch] Impl TTI hooks for LoongArch to support LoopDataPrefetch pass
Inspired by https://reviews.llvm.org/D146600, this commit adds
some TTI hooks for LoongArch to make LoopDataPrefetch pass
really work. Including:
- `getCacheLineSize()`: 64 for loongarch64.
- `getPrefetchDistance()`: After testing SPEC CPU 2017, improvements
taken by prefetching are more obvious when set PrefetchDistance to
200(results shown blow), although different benchmarks fit for different
best choice.
- `enableWritePrefetching()`: store prefetch is supported by LoongArch,
so set WritePrefetching to true in default.
- `getMinPrefetchStride()` and `getMaxPrefetchIterationsAhead()` still
use default values: 1 and UINT_MAX, so not override them.
After this commit, the test added by https://reviews.llvm.org/D146600 can
generate llvm.prefetch intrinsic IR correctly.
Results of spec2017rate benchmarks (testing date: ref, copies: 1):
- For all C/C++ benchmarks, compared to O3+novec/lsx/lasx, prefetch can
bring about -1.58%/0.31%/0.07% performance improvement for int benchmarks
and 3.26%/3.73%/3.78% improvement for floating point benchmarks. (Only
O3+novec+prefetch decreases when testing intrate.)
- But prefetch results in performance reduction almost for every Fortran
benchmark compiled by flang. While considering all C/C++/Fortran
benchmarks, prefetch performance will decrease about 1% ~ 5%.
FIXME: Keep `loongarch-enable-loop-data-prefetch` option default to false
for now due to the bad effect for Fortran.
0 commit comments