I encountered a performance regression on trunk for AArch64.
The test case is not trivial, unfortunately, but in general the regression can be described as follows:
Clang replaces branches with select instructions, but in this test case that produces more instructions, and the code becomes 1.5x slower (on my arm64 Mate10 device).
__attribute__((noinline)) int do_select(const int max_iters_count,
                                        const unsigned long in,
                                        const unsigned long out,
                                        const unsigned long ex,
                                        const unsigned long bit_init_val,
                                        const unsigned long mask)
{
    int retval = 0;
    for (int k = 0; k < max_iters_count; k++)
    {
        fd_set_bits *fds = gv_fds;
        unsigned long *rinp = fds->res_in;
        for (int i = 0; i < g_max_i; ++i, ++rinp)
        {
            unsigned long bit = bit_init_val;
            unsigned long res_in = 0;
            //===== INNER LOOP, WHICH HAS A PROBLEM ========
            for (int j = 0; j < BITS_PER_LONG; ++j, bit <<= 1)
            {
                if (in & bit) {
                    res_in |= bit;
                    retval++;
                    fds->proc = NULL;
                }
                if (mask & POLLOUT_SET) {
                    fds->proc = NULL;
                }
            }
            *rinp = res_in;
        }
    }
    return retval;
}
The source code of the test case is attached to this message (select.c).
I have tested this regression with plenty of input data, and there are no cases where clang with the default phi-node-folding-threshold=2
is better than clang with phi-node-folding-threshold=1.
The additional (suboptimal) instructions are generated in the SimplifyCFG pass.
The function mergeConditionalStoreToAddress transforms the blocks, moving the store instructions out of two blocks.
The function FoldTwoEntryPHINode then tries to merge these blocks and inserts two select instructions, which leads to the regression.
The IR before mergeConditionalStoreToAddress, after mergeConditionalStoreToAddress, and after FoldTwoEntryPHINode is listed under Extended Description below (pay attention to blocks if.end, if.then, and if.then15 and how they are transformed).
After FoldTwoEntryPHINode, one IR instruction (or) is split into two machine instructions (csel and orr).
This regression can be avoided in two ways:
1. By passing the option -mllvm -phi-node-folding-threshold=1 (the default threshold is 2) to the compiler and the linker (because I use LTO). However, this also affects other transformations.
2. By changing SimplifyCFG.cpp to add a heuristic that increases the cost of folding these basic blocks (depending on the backend).
In the second case, the change can be applied to the lambda IsWorthwhile in the function mergeConditionalStoreToAddress (SimplifyCFG.cpp).
Something like this (it is only a concept of the patch):
auto IsWorthwhile = [&](BasicBlock *BB) {
  if (!BB)
    return true;
  // Heuristic: if the block can be if-converted/phi-folded and the
  // instructions inside are all cheap (arithmetic/GEPs), it's worthwhile to
  // thread this store.
  unsigned N = 0;
  for (auto &I : BB->instructionsWithoutDebug()) {
    // Cheap instructions viable for folding.
    if (isa<BinaryOperator>(I) || isa<GetElementPtrInst>(I) ||
        isa<StoreInst>(I)) {
      ++N;
      // Check whether this "cheap" instruction has an additional cost
      // for block folding. isExpensiveForFolding is a proposed,
      // target-dependent TTI hook, not an existing one.
      if (TTI.isExpensiveForFolding(I)) {
        ++N;
      }
    }
    // (the rest of the lambda is unchanged)
But maybe this solution is too naive, and the real cause of the regression lies in another optimization
that cannot handle this pattern correctly. I would appreciate any ideas about it.
As far as I can see, this bug is similar to #22990, but I am not sure whether it is a duplicate.
Extended Description
Assembly for the body of the inner loop:
latest clang:
78c: 8a010043 and x3, x2, x1
790: ea01005f tst x2, x1
794: aa030165 orr x5, x11, x3
798: 1a800400 cinc w0, w0, ne
79c: 9a8203e3 csel x3, xzr, x2, eq
7a0: b4000045 cbz x5, 7a8 <do_select+0x60>
7a4: f9001dbf str xzr, [x13,#56]
7a8: aa030210 orr x16, x16, x3
latest clang with -mllvm -phi-node-folding-threshold=1:
78c: ea01025f tst x18, x1
790: 54000080 b.eq 7a0 <do_select+0x58>
794: aa120210 orr x16, x16, x18
798: 11000400 add w0, w0, #0x1
79c: f9001dbf str xzr, [x13,#56]
7a0: b400004b cbz x11, 7a8 <do_select+0x60>
7a4: f9001dbf str xzr, [x13,#56]
===============================================================================================================
IR before mergeConditionalStoreToAddress (pay attention to blocks if.end, if.then, and if.then15)
; Function Attrs: noinline norecurse nounwind
define dso_local i32 @do_select(i32 %max_iters_count, i64 %in, i64 %out, i64 %ex, i64 %bit_init_val, i64 %mask) local_unnamed_addr #0 !dbg !59 {
entry:
call void @llvm.dbg.value(metadata i32 %max_iters_count, metadata !65, metadata !DIExpression()), !dbg !86
call void @llvm.dbg.value(metadata i64 %in, metadata !66, metadata !DIExpression()), !dbg !87
call void @llvm.dbg.value(metadata i64 %out, metadata !67, metadata !DIExpression()), !dbg !88
call void @llvm.dbg.value(metadata i64 %ex, metadata !68, metadata !DIExpression()), !dbg !89
call void @llvm.dbg.value(metadata i64 %bit_init_val, metadata !69, metadata !DIExpression()), !dbg !90
call void @llvm.dbg.value(metadata i64 %mask, metadata !70, metadata !DIExpression()), !dbg !91
call void @llvm.dbg.value(metadata i32 0, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 0, metadata !72, metadata !DIExpression()), !dbg !93
%cmp52 = icmp sgt i32 %max_iters_count, 0, !dbg !94
br i1 %cmp52, label %for.body.lr.ph, label %for.cond.cleanup, !dbg !95
for.body.lr.ph: ; preds = %entry
%and13 = and i64 %mask, 780
%tobool14 = icmp eq i64 %and13, 0
br label %for.body, !dbg !95
for.cond.cleanup: ; preds = %for.cond.cleanup5, %entry
%retval1.0.lcssa = phi i32 [ 0, %entry ], [ %retval1.1.lcssa, %for.cond.cleanup5 ], !dbg !92
call void @llvm.dbg.value(metadata i32 %retval1.0.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
ret i32 %retval1.0.lcssa, !dbg !96
for.body: ; preds = %for.cond.cleanup5, %for.body.lr.ph
%retval1.054 = phi i32 [ 0, %for.body.lr.ph ], [ %retval1.1.lcssa, %for.cond.cleanup5 ]
%k.053 = phi i32 [ 0, %for.body.lr.ph ], [ %inc23, %for.cond.cleanup5 ]
call void @llvm.dbg.value(metadata i32 %retval1.054, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 %k.053, metadata !72, metadata !DIExpression()), !dbg !93
%0 = load volatile %struct.fd_set_bits*, %struct.fd_set_bits** @gv_fds111, align 8, !dbg !97, !tbaa !98
call void @llvm.dbg.value(metadata %struct.fd_set_bits* %0, metadata !74, metadata !DIExpression()), !dbg !102
%res_in = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 3, !dbg !103
%1 = load i64*, i64** %res_in, align 8, !dbg !103, !tbaa !104
call void @llvm.dbg.value(metadata i64* %1, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 0, metadata !78, metadata !DIExpression()), !dbg !108
call void @llvm.dbg.value(metadata i32 %retval1.054, metadata !71, metadata !DIExpression()), !dbg !92
%2 = load volatile i64, i64* @g_max_i, align 8, !dbg !109, !tbaa !110
%cmp348 = icmp eq i64 %2, 0, !dbg !112
br i1 %cmp348, label %for.cond.cleanup5, label %for.cond8.preheader.lr.ph, !dbg !113
for.cond8.preheader.lr.ph: ; preds = %for.body
%proc = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 7
br label %for.cond8.preheader, !dbg !113
for.cond8.preheader: ; preds = %for.cond8.preheader.lr.ph, %for.cond.cleanup11
%indvars.iv = phi i64 [ 0, %for.cond8.preheader.lr.ph ], [ %indvars.iv.next, %for.cond.cleanup11 ]
%rinp.050 = phi i64* [ %1, %for.cond8.preheader.lr.ph ], [ %incdec.ptr, %for.cond.cleanup11 ]
%retval1.149 = phi i32 [ %retval1.054, %for.cond8.preheader.lr.ph ], [ %retval1.3.lcssa, %for.cond.cleanup11 ]
call void @llvm.dbg.value(metadata i64 %indvars.iv, metadata !78, metadata !DIExpression()), !dbg !108
call void @llvm.dbg.value(metadata i64* %rinp.050, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 %retval1.149, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 0, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 0, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %bit_init_val, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.149, metadata !71, metadata !DIExpression()), !dbg !92
br label %for.body12, !dbg !117
for.cond.cleanup5: ; preds = %for.cond.cleanup11, %for.body
%retval1.1.lcssa = phi i32 [ %retval1.054, %for.body ], [ %retval1.3.lcssa, %for.cond.cleanup11 ], !dbg !92
%inc23 = add nuw nsw i32 %k.053, 1, !dbg !118
call void @llvm.dbg.value(metadata i32 %retval1.1.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 %inc23, metadata !72, metadata !DIExpression()), !dbg !93
%exitcond56 = icmp eq i32 %inc23, %max_iters_count, !dbg !94
br i1 %exitcond56, label %for.cond.cleanup, label %for.body, !dbg !95, !llvm.loop !119
for.cond.cleanup11: ; preds = %for.inc
%retval1.3.lcssa = phi i32 [ %retval1.3, %for.inc ], !dbg !121
%res_in7.1.lcssa = phi i64 [ %res_in7.1, %for.inc ], !dbg !122
call void @llvm.dbg.value(metadata i64 %res_in7.1.lcssa, metadata !83, metadata !DIExpression()), !dbg !115
store i64 %res_in7.1.lcssa, i64* %rinp.050, align 8, !dbg !123, !tbaa !110
%indvars.iv.next = add nuw i64 %indvars.iv, 1, !dbg !124
%incdec.ptr = getelementptr inbounds i64, i64* %rinp.050, i64 1, !dbg !125
call void @llvm.dbg.value(metadata i32 undef, metadata !78, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !108
call void @llvm.dbg.value(metadata i64* %incdec.ptr, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 %retval1.3.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
%3 = load volatile i64, i64* @g_max_i, align 8, !dbg !109, !tbaa !110
%cmp3 = icmp ugt i64 %3, %indvars.iv.next, !dbg !112
br i1 %cmp3, label %for.cond8.preheader, label %for.cond.cleanup5, !dbg !113, !llvm.loop !126
for.body12: ; preds = %for.inc, %for.cond8.preheader
%j.047 = phi i32 [ 0, %for.cond8.preheader ], [ %inc18, %for.inc ]
%res_in7.046 = phi i64 [ 0, %for.cond8.preheader ], [ %res_in7.1, %for.inc ]
%bit.044 = phi i64 [ %bit_init_val, %for.cond8.preheader ], [ %shl, %for.inc ]
%retval1.243 = phi i32 [ %retval1.149, %for.cond8.preheader ], [ %retval1.3, %for.inc ]
call void @llvm.dbg.value(metadata i32 %j.047, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 %res_in7.046, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %bit.044, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.243, metadata !71, metadata !DIExpression()), !dbg !92
%and = and i64 %bit.044, %in, !dbg !128
%tobool = icmp eq i64 %and, 0, !dbg !128
br i1 %tobool, label %if.end, label %if.then, !dbg !132
if.then: ; preds = %for.body12
%or = or i64 %res_in7.046, %bit.044, !dbg !133
call void @llvm.dbg.value(metadata i64 %or, metadata !83, metadata !DIExpression()), !dbg !115
%inc = add nsw i32 %retval1.243, 1, !dbg !135
call void @llvm.dbg.value(metadata i32 %inc, metadata !71, metadata !DIExpression()), !dbg !92
store i8* null, i8** %proc, align 8, !dbg !136, !tbaa !137
br label %if.end, !dbg !138
if.end: ; preds = %for.body12, %if.then
%retval1.3 = phi i32 [ %inc, %if.then ], [ %retval1.243, %for.body12 ], !dbg !121
%res_in7.1 = phi i64 [ %or, %if.then ], [ %res_in7.046, %for.body12 ], !dbg !122
br i1 %tobool14, label %for.inc, label %if.then15, !dbg !139
if.then15: ; preds = %if.end
store i8* null, i8** %proc, align 8, !dbg !140, !tbaa !137
br label %for.inc, !dbg !143
for.inc: ; preds = %if.end, %if.then15
%inc18 = add nuw nsw i32 %j.047, 1, !dbg !144
%shl = shl i64 %bit.044, 1, !dbg !145
call void @llvm.dbg.value(metadata i32 %inc18, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 %res_in7.1, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %shl, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.3, metadata !71, metadata !DIExpression()), !dbg !92
%exitcond = icmp eq i32 %inc18, 64, !dbg !146
br i1 %exitcond, label %for.cond.cleanup11, label %for.body12, !dbg !117, !llvm.loop !147
}
===============================================================================================================
IR after mergeConditionalStoreToAddress (pay attention to how blocks if.end, if.then, and if.then15 were transformed)
; Function Attrs: noinline norecurse nounwind
define dso_local i32 @do_select(i32 %max_iters_count, i64 %in, i64 %out, i64 %ex, i64 %bit_init_val, i64 %mask) local_unnamed_addr #0 !dbg !59 {
entry:
call void @llvm.dbg.value(metadata i32 %max_iters_count, metadata !65, metadata !DIExpression()), !dbg !86
call void @llvm.dbg.value(metadata i64 %in, metadata !66, metadata !DIExpression()), !dbg !87
call void @llvm.dbg.value(metadata i64 %out, metadata !67, metadata !DIExpression()), !dbg !88
call void @llvm.dbg.value(metadata i64 %ex, metadata !68, metadata !DIExpression()), !dbg !89
call void @llvm.dbg.value(metadata i64 %bit_init_val, metadata !69, metadata !DIExpression()), !dbg !90
call void @llvm.dbg.value(metadata i64 %mask, metadata !70, metadata !DIExpression()), !dbg !91
call void @llvm.dbg.value(metadata i32 0, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 0, metadata !72, metadata !DIExpression()), !dbg !93
%cmp52 = icmp sgt i32 %max_iters_count, 0, !dbg !94
br i1 %cmp52, label %for.body.lr.ph, label %for.cond.cleanup, !dbg !95
for.body.lr.ph: ; preds = %entry
%and13 = and i64 %mask, 780
%tobool14 = icmp eq i64 %and13, 0
br label %for.body, !dbg !95
for.cond.cleanup: ; preds = %for.cond.cleanup5, %entry
%retval1.0.lcssa = phi i32 [ 0, %entry ], [ %retval1.1.lcssa, %for.cond.cleanup5 ], !dbg !92
call void @llvm.dbg.value(metadata i32 %retval1.0.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
ret i32 %retval1.0.lcssa, !dbg !96
for.body: ; preds = %for.cond.cleanup5, %for.body.lr.ph
%retval1.054 = phi i32 [ 0, %for.body.lr.ph ], [ %retval1.1.lcssa, %for.cond.cleanup5 ]
%k.053 = phi i32 [ 0, %for.body.lr.ph ], [ %inc23, %for.cond.cleanup5 ]
call void @llvm.dbg.value(metadata i32 %retval1.054, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 %k.053, metadata !72, metadata !DIExpression()), !dbg !93
%0 = load volatile %struct.fd_set_bits*, %struct.fd_set_bits** @gv_fds111, align 8, !dbg !97, !tbaa !98
call void @llvm.dbg.value(metadata %struct.fd_set_bits* %0, metadata !74, metadata !DIExpression()), !dbg !102
%res_in = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 3, !dbg !103
%1 = load i64*, i64** %res_in, align 8, !dbg !103, !tbaa !104
call void @llvm.dbg.value(metadata i64* %1, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 0, metadata !78, metadata !DIExpression()), !dbg !108
call void @llvm.dbg.value(metadata i32 %retval1.054, metadata !71, metadata !DIExpression()), !dbg !92
%2 = load volatile i64, i64* @g_max_i, align 8, !dbg !109, !tbaa !110
%cmp348 = icmp eq i64 %2, 0, !dbg !112
br i1 %cmp348, label %for.cond.cleanup5, label %for.cond8.preheader.lr.ph, !dbg !113
for.cond8.preheader.lr.ph: ; preds = %for.body
%proc = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 7
br label %for.cond8.preheader, !dbg !113
for.cond8.preheader: ; preds = %for.cond8.preheader.lr.ph, %for.cond.cleanup11
%indvars.iv = phi i64 [ 0, %for.cond8.preheader.lr.ph ], [ %indvars.iv.next, %for.cond.cleanup11 ]
%rinp.050 = phi i64* [ %1, %for.cond8.preheader.lr.ph ], [ %incdec.ptr, %for.cond.cleanup11 ]
%retval1.149 = phi i32 [ %retval1.054, %for.cond8.preheader.lr.ph ], [ %retval1.3.lcssa, %for.cond.cleanup11 ]
call void @llvm.dbg.value(metadata i64 %indvars.iv, metadata !78, metadata !DIExpression()), !dbg !108
call void @llvm.dbg.value(metadata i64* %rinp.050, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 %retval1.149, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 0, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 0, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %bit_init_val, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.149, metadata !71, metadata !DIExpression()), !dbg !92
br label %for.body12, !dbg !117
for.cond.cleanup5: ; preds = %for.cond.cleanup11, %for.body
%retval1.1.lcssa = phi i32 [ %retval1.054, %for.body ], [ %retval1.3.lcssa, %for.cond.cleanup11 ], !dbg !92
%inc23 = add nuw nsw i32 %k.053, 1, !dbg !118
call void @llvm.dbg.value(metadata i32 %retval1.1.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 %inc23, metadata !72, metadata !DIExpression()), !dbg !93
%exitcond56 = icmp eq i32 %inc23, %max_iters_count, !dbg !94
br i1 %exitcond56, label %for.cond.cleanup, label %for.body, !dbg !95, !llvm.loop !119
for.cond.cleanup11: ; preds = %8
%retval1.3.lcssa = phi i32 [ %retval1.3, %8 ], !dbg !121
%res_in7.1.lcssa = phi i64 [ %res_in7.1, %8 ], !dbg !122
call void @llvm.dbg.value(metadata i64 %res_in7.1.lcssa, metadata !83, metadata !DIExpression()), !dbg !115
store i64 %res_in7.1.lcssa, i64* %rinp.050, align 8, !dbg !123, !tbaa !110
%indvars.iv.next = add nuw i64 %indvars.iv, 1, !dbg !124
%incdec.ptr = getelementptr inbounds i64, i64* %rinp.050, i64 1, !dbg !125
call void @llvm.dbg.value(metadata i32 undef, metadata !78, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !108
call void @llvm.dbg.value(metadata i64* %incdec.ptr, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 %retval1.3.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
%3 = load volatile i64, i64* @g_max_i, align 8, !dbg !109, !tbaa !110
%cmp3 = icmp ugt i64 %3, %indvars.iv.next, !dbg !112
br i1 %cmp3, label %for.cond8.preheader, label %for.cond.cleanup5, !dbg !113, !llvm.loop !126
for.body12: ; preds = %8, %for.cond8.preheader
%j.047 = phi i32 [ 0, %for.cond8.preheader ], [ %inc18, %8 ]
%res_in7.046 = phi i64 [ 0, %for.cond8.preheader ], [ %res_in7.1, %8 ]
%bit.044 = phi i64 [ %bit_init_val, %for.cond8.preheader ], [ %shl, %8 ]
%retval1.243 = phi i32 [ %retval1.149, %for.cond8.preheader ], [ %retval1.3, %8 ]
call void @llvm.dbg.value(metadata i32 %j.047, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 %res_in7.046, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %bit.044, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.243, metadata !71, metadata !DIExpression()), !dbg !92
%and = and i64 %bit.044, %in, !dbg !128
%tobool = icmp eq i64 %and, 0, !dbg !128
br i1 %tobool, label %if.end, label %if.then, !dbg !132
if.then: ; preds = %for.body12
%or = or i64 %res_in7.046, %bit.044, !dbg !133
call void @llvm.dbg.value(metadata i64 %or, metadata !83, metadata !DIExpression()), !dbg !115
%inc = add nsw i32 %retval1.243, 1, !dbg !135
call void @llvm.dbg.value(metadata i32 %inc, metadata !71, metadata !DIExpression()), !dbg !92
br label %if.end, !dbg !136
if.end: ; preds = %for.body12, %if.then
%retval1.3 = phi i32 [ %inc, %if.then ], [ %retval1.243, %for.body12 ], !dbg !121
%res_in7.1 = phi i64 [ %or, %if.then ], [ %res_in7.046, %for.body12 ], !dbg !122
br i1 %tobool14, label %for.inc, label %if.then15, !dbg !137
if.then15: ; preds = %if.end
br label %for.inc, !dbg !138
for.inc: ; preds = %if.end, %if.then15
%simplifycfg.merge = phi i8* [ null, %if.then15 ], [ null, %if.end ]
%4 = xor i1 %tobool, true, !dbg !141
%5 = xor i1 %tobool14, true, !dbg !141
%6 = or i1 %4, %5, !dbg !141
br i1 %6, label %7, label %8, !dbg !141
; <label>:7: ; preds = %for.inc
store i8* %simplifycfg.merge, i8** %proc, align 8, !dbg !141, !tbaa !142
br label %8, !dbg !141
; <label>:8: ; preds = %for.inc, %7
%inc18 = add nuw nsw i32 %j.047, 1, !dbg !141
%shl = shl i64 %bit.044, 1, !dbg !143
call void @llvm.dbg.value(metadata i32 %inc18, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 %res_in7.1, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %shl, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.3, metadata !71, metadata !DIExpression()), !dbg !92
%exitcond = icmp eq i32 %inc18, 64, !dbg !144
br i1 %exitcond, label %for.cond.cleanup11, label %for.body12, !dbg !117, !llvm.loop !145
}
===============================================================================================================
IR after FoldTwoEntryPHINode (note that the instructions of if.then were folded into for.body12, which leads to the regression)
; Function Attrs: noinline norecurse nounwind
define dso_local i32 @do_select(i32 %max_iters_count, i64 %in, i64 %out, i64 %ex, i64 %bit_init_val, i64 %mask) local_unnamed_addr #0 !dbg !59 {
entry:
call void @llvm.dbg.value(metadata i32 %max_iters_count, metadata !65, metadata !DIExpression()), !dbg !86
call void @llvm.dbg.value(metadata i64 %in, metadata !66, metadata !DIExpression()), !dbg !87
call void @llvm.dbg.value(metadata i64 %out, metadata !67, metadata !DIExpression()), !dbg !88
call void @llvm.dbg.value(metadata i64 %ex, metadata !68, metadata !DIExpression()), !dbg !89
call void @llvm.dbg.value(metadata i64 %bit_init_val, metadata !69, metadata !DIExpression()), !dbg !90
call void @llvm.dbg.value(metadata i64 %mask, metadata !70, metadata !DIExpression()), !dbg !91
call void @llvm.dbg.value(metadata i32 0, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 0, metadata !72, metadata !DIExpression()), !dbg !93
%cmp52 = icmp sgt i32 %max_iters_count, 0, !dbg !94
br i1 %cmp52, label %for.body.lr.ph, label %for.cond.cleanup, !dbg !95
for.body.lr.ph: ; preds = %entry
%and13 = and i64 %mask, 780
%tobool14 = icmp eq i64 %and13, 0
br label %for.body, !dbg !95
for.cond.cleanup: ; preds = %for.cond.cleanup5, %entry
%retval1.0.lcssa = phi i32 [ 0, %entry ], [ %retval1.1.lcssa, %for.cond.cleanup5 ], !dbg !92
call void @llvm.dbg.value(metadata i32 %retval1.0.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
ret i32 %retval1.0.lcssa, !dbg !96
for.body: ; preds = %for.cond.cleanup5, %for.body.lr.ph
%retval1.054 = phi i32 [ 0, %for.body.lr.ph ], [ %retval1.1.lcssa, %for.cond.cleanup5 ]
%k.053 = phi i32 [ 0, %for.body.lr.ph ], [ %inc23, %for.cond.cleanup5 ]
call void @llvm.dbg.value(metadata i32 %retval1.054, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 %k.053, metadata !72, metadata !DIExpression()), !dbg !93
%0 = load volatile %struct.fd_set_bits*, %struct.fd_set_bits** @gv_fds111, align 8, !dbg !97, !tbaa !98
call void @llvm.dbg.value(metadata %struct.fd_set_bits* %0, metadata !74, metadata !DIExpression()), !dbg !102
%res_in = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 3, !dbg !103
%1 = load i64*, i64** %res_in, align 8, !dbg !103, !tbaa !104
call void @llvm.dbg.value(metadata i64* %1, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 0, metadata !78, metadata !DIExpression()), !dbg !108
call void @llvm.dbg.value(metadata i32 %retval1.054, metadata !71, metadata !DIExpression()), !dbg !92
%2 = load volatile i64, i64* @g_max_i, align 8, !dbg !109, !tbaa !110
%cmp348 = icmp eq i64 %2, 0, !dbg !112
br i1 %cmp348, label %for.cond.cleanup5, label %for.cond8.preheader.lr.ph, !dbg !113
for.cond8.preheader.lr.ph: ; preds = %for.body
%proc = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 7
br label %for.cond8.preheader, !dbg !113
for.cond8.preheader: ; preds = %for.cond8.preheader.lr.ph, %for.cond.cleanup11
%indvars.iv = phi i64 [ 0, %for.cond8.preheader.lr.ph ], [ %indvars.iv.next, %for.cond.cleanup11 ]
%rinp.050 = phi i64* [ %1, %for.cond8.preheader.lr.ph ], [ %incdec.ptr, %for.cond.cleanup11 ]
%retval1.149 = phi i32 [ %retval1.054, %for.cond8.preheader.lr.ph ], [ %retval1.3.lcssa, %for.cond.cleanup11 ]
call void @llvm.dbg.value(metadata i64 %indvars.iv, metadata !78, metadata !DIExpression()), !dbg !108
call void @llvm.dbg.value(metadata i64* %rinp.050, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 %retval1.149, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 0, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 0, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %bit_init_val, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.149, metadata !71, metadata !DIExpression()), !dbg !92
br label %for.body12, !dbg !117
for.cond.cleanup5: ; preds = %for.cond.cleanup11, %for.body
%retval1.1.lcssa = phi i32 [ %retval1.054, %for.body ], [ %retval1.3.lcssa, %for.cond.cleanup11 ], !dbg !92
%inc23 = add nuw nsw i32 %k.053, 1, !dbg !118
call void @llvm.dbg.value(metadata i32 %retval1.1.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
call void @llvm.dbg.value(metadata i32 %inc23, metadata !72, metadata !DIExpression()), !dbg !93
%exitcond56 = icmp eq i32 %inc23, %max_iters_count, !dbg !94
br i1 %exitcond56, label %for.cond.cleanup, label %for.body, !dbg !95, !llvm.loop !119
for.cond.cleanup11: ; preds = %8
%retval1.3.lcssa = phi i32 [ %retval1.3, %8 ], !dbg !121
%res_in7.1.lcssa = phi i64 [ %res_in7.1, %8 ], !dbg !122
call void @llvm.dbg.value(metadata i64 %res_in7.1.lcssa, metadata !83, metadata !DIExpression()), !dbg !115
store i64 %res_in7.1.lcssa, i64* %rinp.050, align 8, !dbg !123, !tbaa !110
%indvars.iv.next = add nuw i64 %indvars.iv, 1, !dbg !124
%incdec.ptr = getelementptr inbounds i64, i64* %rinp.050, i64 1, !dbg !125
call void @llvm.dbg.value(metadata i32 undef, metadata !78, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !108
call void @llvm.dbg.value(metadata i64* %incdec.ptr, metadata !77, metadata !DIExpression()), !dbg !107
call void @llvm.dbg.value(metadata i32 %retval1.3.lcssa, metadata !71, metadata !DIExpression()), !dbg !92
%3 = load volatile i64, i64* @g_max_i, align 8, !dbg !109, !tbaa !110
%cmp3 = icmp ugt i64 %3, %indvars.iv.next, !dbg !112
br i1 %cmp3, label %for.cond8.preheader, label %for.cond.cleanup5, !dbg !113, !llvm.loop !126
for.body12: ; preds = %8, %for.cond8.preheader
%j.047 = phi i32 [ 0, %for.cond8.preheader ], [ %inc18, %8 ]
%res_in7.046 = phi i64 [ 0, %for.cond8.preheader ], [ %res_in7.1, %8 ]
%bit.044 = phi i64 [ %bit_init_val, %for.cond8.preheader ], [ %shl, %8 ]
%retval1.243 = phi i32 [ %retval1.149, %for.cond8.preheader ], [ %retval1.3, %8 ]
call void @llvm.dbg.value(metadata i32 %j.047, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 %res_in7.046, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %bit.044, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.243, metadata !71, metadata !DIExpression()), !dbg !92
%and = and i64 %bit.044, %in, !dbg !128
%tobool = icmp eq i64 %and, 0, !dbg !128
%or = or i64 %res_in7.046, %bit.044, !dbg !132
%inc = add nsw i32 %retval1.243, 1, !dbg !132
%retval1.3 = select i1 %tobool, i32 %retval1.243, i32 %inc, !dbg !132
%res_in7.1 = select i1 %tobool, i64 %res_in7.046, i64 %or, !dbg !132
br label %if.end, !dbg !132
if.then: ; No predecessors!
br label %if.end, !dbg !132
if.end: ; preds = %for.body12, %if.then
br i1 %tobool14, label %for.inc, label %if.then15, !dbg !133
if.then15: ; preds = %if.end
br label %for.inc, !dbg !134
for.inc: ; preds = %if.end, %if.then15
%simplifycfg.merge = phi i8* [ null, %if.then15 ], [ null, %if.end ]
%4 = xor i1 %tobool, true, !dbg !137
%5 = xor i1 %tobool14, true, !dbg !137
%6 = or i1 %4, %5, !dbg !137
br i1 %6, label %7, label %8, !dbg !137
; <label>:7: ; preds = %for.inc
store i8* %simplifycfg.merge, i8** %proc, align 8, !dbg !137, !tbaa !138
br label %8, !dbg !137
; <label>:8: ; preds = %for.inc, %7
%inc18 = add nuw nsw i32 %j.047, 1, !dbg !137
%shl = shl i64 %bit.044, 1, !dbg !139
call void @llvm.dbg.value(metadata i32 %inc18, metadata !84, metadata !DIExpression()), !dbg !114
call void @llvm.dbg.value(metadata i64 %res_in7.1, metadata !83, metadata !DIExpression()), !dbg !115
call void @llvm.dbg.value(metadata i64 %shl, metadata !80, metadata !DIExpression()), !dbg !116
call void @llvm.dbg.value(metadata i32 %retval1.3, metadata !71, metadata !DIExpression()), !dbg !92
%exitcond = icmp eq i32 %inc18, 64, !dbg !140
br i1 %exitcond, label %for.cond.cleanup11, label %for.body12, !dbg !117, !llvm.loop !141
}