
[SimplifyCFG] Performance regression in mergeConditionalStoreToAddress #42550

llvmbot opened this issue Sep 3, 2019 · 0 comments
llvmbot commented Sep 3, 2019

Bugzilla Link: 43205
Version: trunk
OS: Linux
Attachments: Test case for the regression
Reporter: LLVM Bugzilla Contributor

Extended Description

I encountered a performance regression on trunk for AArch64.

Unfortunately, the test case is not trivial, but in general the regression can be described as follows:
Clang replaces branches with select instructions, but in this test case that produces more instructions, and the code becomes about 1.5x slower (on my arm64 Mate10 device).

__attribute__((noinline)) int do_select(const int max_iters_count,
                                        const unsigned long in,
                                        const unsigned long out,
                                        const unsigned long ex,
                                        const unsigned long bit_init_val,
                                        const unsigned long mask)
{
    int retval = 0;

    for (int k = 0; k < max_iters_count; k++)
    {
        fd_set_bits *fds = gv_fds;

        unsigned long *rinp = fds->res_in;

        for (int i = 0; i < g_max_i; ++i, ++rinp)
        {
            unsigned long bit = bit_init_val;
            unsigned long res_in = 0;

            //===== INNER LOOP, WHICH HAS A PROBLEM ========
            for (int j = 0; j < BITS_PER_LONG; ++j, bit <<= 1)
            {
                if (in & bit) {
                    res_in |= bit;
                    retval++;
                    fds->proc = NULL;
                }

                if (mask & POLLOUT_SET) {
                    fds->proc = NULL;
                }
            }
            *rinp = res_in;
        }
    }
    return retval;
}

Source code with test case is attached to the message (select.c).
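
The snippet above omits the declarations it depends on. A minimal self-contained sketch follows. It assumes a hypothetical two-field layout for `fd_set_bits` (the real struct in the attached select.c has more fields and may differ), and takes `BITS_PER_LONG` = 64 and `POLLOUT_SET` = 780 from the constants visible in the IR in this report (`icmp eq i32 %inc18, 64` and `and i64 %mask, 780`). The loop index `i` is made unsigned here to match the 64-bit comparison in the IR.

```c
#include <stddef.h>

/* Both constants are read off the IR in this report. */
#define BITS_PER_LONG 64
#define POLLOUT_SET   780UL

/* Hypothetical layout: only the two fields the loop touches are modelled. */
typedef struct {
    unsigned long *res_in;
    void *proc;
} fd_set_bits;

/* volatile, because the IR loads both globals with `load volatile`. */
fd_set_bits *volatile gv_fds;
volatile unsigned long g_max_i;

__attribute__((noinline)) int do_select(const int max_iters_count,
                                        const unsigned long in,
                                        const unsigned long out,
                                        const unsigned long ex,
                                        const unsigned long bit_init_val,
                                        const unsigned long mask)
{
    int retval = 0;
    (void)out; /* unused, kept to match the reported signature */
    (void)ex;

    for (int k = 0; k < max_iters_count; k++) {
        fd_set_bits *fds = gv_fds;
        unsigned long *rinp = fds->res_in;

        for (unsigned long i = 0; i < g_max_i; ++i, ++rinp) {
            unsigned long bit = bit_init_val;
            unsigned long res_in = 0;

            /* Inner loop from the report: two conditional stores to proc. */
            for (int j = 0; j < BITS_PER_LONG; ++j, bit <<= 1) {
                if (in & bit) {
                    res_in |= bit;
                    retval++;
                    fds->proc = NULL;
                }
                if (mask & POLLOUT_SET) {
                    fds->proc = NULL;
                }
            }
            *rinp = res_in;
        }
    }
    return retval;
}
```

To exercise it, point `gv_fds` at a struct whose `res_in` buffer holds at least `g_max_i` elements before calling `do_select`.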

Assembly for the body of the inner loop:

latest clang:
78c: 8a010043 and x3, x2, x1
790: ea01005f tst x2, x1
794: aa030165 orr x5, x11, x3
798: 1a800400 cinc w0, w0, ne
79c: 9a8203e3 csel x3, xzr, x2, eq
7a0: b4000045 cbz x5, 7a8 <do_select+0x60>
7a4: f9001dbf str xzr, [x13,#56]
7a8: aa030210 orr x16, x16, x3

latest clang with -mllvm -phi-node-folding-threshold=1:
78c: ea01025f tst x18, x1
790: 54000080 b.eq 7a0 <do_select+0x58>
794: aa120210 orr x16, x16, x18
798: 11000400 add w0, w0, #0x1
79c: f9001dbf str xzr, [x13,#56]
7a0: b400004b cbz x11, 7a8 <do_select+0x60>
7a4: f9001dbf str xzr, [x13,#56]

I have tested this with a wide range of input data and found no case where clang with the default phi-node-folding-threshold=2
does better than clang with phi-node-folding-threshold=1.

The additional (suboptimal) instructions are generated in the SimplifyCFG pass:

  1. Function mergeConditionalStoreToAddress transforms the blocks, moving the store instructions out of two blocks

===============================================================================================================
IR before mergeConditionalStoreToAddress (note blocks if.then, if.end and if.then15)

; Function Attrs: noinline norecurse nounwind
define dso_local i32 @do_select(i32 %max_iters_count, i64 %in, i64 %out, i64 %ex, i64 %bit_init_val, i64 %mask) local_unnamed_addr #0 !dbg !59 {
entry:
call void @​llvm.dbg.value(metadata i32 %max_iters_count, metadata !​65, metadata !DIExpression()), !dbg !​86
call void @​llvm.dbg.value(metadata i64 %in, metadata !​66, metadata !DIExpression()), !dbg !​87
call void @​llvm.dbg.value(metadata i64 %out, metadata !​67, metadata !DIExpression()), !dbg !​88
call void @​llvm.dbg.value(metadata i64 %ex, metadata !​68, metadata !DIExpression()), !dbg !​89
call void @​llvm.dbg.value(metadata i64 %bit_init_val, metadata !​69, metadata !DIExpression()), !dbg !​90
call void @​llvm.dbg.value(metadata i64 %mask, metadata !​70, metadata !DIExpression()), !dbg !​91
call void @​llvm.dbg.value(metadata i32 0, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 0, metadata !​72, metadata !DIExpression()), !dbg !​93
%cmp52 = icmp sgt i32 %max_iters_count, 0, !dbg !​94
br i1 %cmp52, label %for.body.lr.ph, label %for.cond.cleanup, !dbg !​95

for.body.lr.ph: ; preds = %entry
%and13 = and i64 %mask, 780
%tobool14 = icmp eq i64 %and13, 0
br label %for.body, !dbg !​95

for.cond.cleanup: ; preds = %for.cond.cleanup5, %entry
%retval1.0.lcssa = phi i32 [ 0, %entry ], [ %retval1.1.lcssa, %for.cond.cleanup5 ], !dbg !​92
call void @​llvm.dbg.value(metadata i32 %retval1.0.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
ret i32 %retval1.0.lcssa, !dbg !​96

for.body: ; preds = %for.cond.cleanup5, %for.body.lr.ph
%retval1.054 = phi i32 [ 0, %for.body.lr.ph ], [ %retval1.1.lcssa, %for.cond.cleanup5 ]
%k.053 = phi i32 [ 0, %for.body.lr.ph ], [ %inc23, %for.cond.cleanup5 ]
call void @​llvm.dbg.value(metadata i32 %retval1.054, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @llvm.dbg.value(metadata i32 %k.053, metadata !72, metadata !DIExpression()), !dbg !93
%0 = load volatile %struct.fd_set_bits*, %struct.fd_set_bits** @​gv_fds111, align 8, !dbg !​97, !tbaa !​98
call void @​llvm.dbg.value(metadata %struct.fd_set_bits* %0, metadata !​74, metadata !DIExpression()), !dbg !​102
%res_in = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 3, !dbg !​103
%1 = load i64*, i64** %res_in, align 8, !dbg !​103, !tbaa !​104
call void @​llvm.dbg.value(metadata i64* %1, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 0, metadata !​78, metadata !DIExpression()), !dbg !​108
call void @​llvm.dbg.value(metadata i32 %retval1.054, metadata !​71, metadata !DIExpression()), !dbg !​92
%2 = load volatile i64, i64* @​g_max_i, align 8, !dbg !​109, !tbaa !​110
%cmp348 = icmp eq i64 %2, 0, !dbg !​112
br i1 %cmp348, label %for.cond.cleanup5, label %for.cond8.preheader.lr.ph, !dbg !​113

for.cond8.preheader.lr.ph: ; preds = %for.body
%proc = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 7
br label %for.cond8.preheader, !dbg !​113

for.cond8.preheader: ; preds = %for.cond8.preheader.lr.ph, %for.cond.cleanup11
%indvars.iv = phi i64 [ 0, %for.cond8.preheader.lr.ph ], [ %indvars.iv.next, %for.cond.cleanup11 ]
%rinp.050 = phi i64* [ %1, %for.cond8.preheader.lr.ph ], [ %incdec.ptr, %for.cond.cleanup11 ]
%retval1.149 = phi i32 [ %retval1.054, %for.cond8.preheader.lr.ph ], [ %retval1.3.lcssa, %for.cond.cleanup11 ]
call void @​llvm.dbg.value(metadata i64 %indvars.iv, metadata !​78, metadata !DIExpression()), !dbg !​108
call void @​llvm.dbg.value(metadata i64* %rinp.050, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 %retval1.149, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 0, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 0, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %bit_init_val, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.149, metadata !​71, metadata !DIExpression()), !dbg !​92
br label %for.body12, !dbg !​117

for.cond.cleanup5: ; preds = %for.cond.cleanup11, %for.body
%retval1.1.lcssa = phi i32 [ %retval1.054, %for.body ], [ %retval1.3.lcssa, %for.cond.cleanup11 ], !dbg !​92
%inc23 = add nuw nsw i32 %k.053, 1, !dbg !​118
call void @​llvm.dbg.value(metadata i32 %retval1.1.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 %inc23, metadata !​72, metadata !DIExpression()), !dbg !​93
%exitcond56 = icmp eq i32 %inc23, %max_iters_count, !dbg !​94
br i1 %exitcond56, label %for.cond.cleanup, label %for.body, !dbg !​95, !llvm.loop !​119

for.cond.cleanup11: ; preds = %for.inc
%retval1.3.lcssa = phi i32 [ %retval1.3, %for.inc ], !dbg !​121
%res_in7.1.lcssa = phi i64 [ %res_in7.1, %for.inc ], !dbg !​122
call void @​llvm.dbg.value(metadata i64 %res_in7.1.lcssa, metadata !​83, metadata !DIExpression()), !dbg !​115
store i64 %res_in7.1.lcssa, i64* %rinp.050, align 8, !dbg !​123, !tbaa !​110
%indvars.iv.next = add nuw i64 %indvars.iv, 1, !dbg !​124
%incdec.ptr = getelementptr inbounds i64, i64* %rinp.050, i64 1, !dbg !​125
call void @​llvm.dbg.value(metadata i32 undef, metadata !​78, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !​108
call void @​llvm.dbg.value(metadata i64* %incdec.ptr, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 %retval1.3.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
%3 = load volatile i64, i64* @​g_max_i, align 8, !dbg !​109, !tbaa !​110
%cmp3 = icmp ugt i64 %3, %indvars.iv.next, !dbg !​112
br i1 %cmp3, label %for.cond8.preheader, label %for.cond.cleanup5, !dbg !​113, !llvm.loop !​126

for.body12: ; preds = %for.inc, %for.cond8.preheader
%j.047 = phi i32 [ 0, %for.cond8.preheader ], [ %inc18, %for.inc ]
%res_in7.046 = phi i64 [ 0, %for.cond8.preheader ], [ %res_in7.1, %for.inc ]
%bit.044 = phi i64 [ %bit_init_val, %for.cond8.preheader ], [ %shl, %for.inc ]
%retval1.243 = phi i32 [ %retval1.149, %for.cond8.preheader ], [ %retval1.3, %for.inc ]
call void @​llvm.dbg.value(metadata i32 %j.047, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 %res_in7.046, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %bit.044, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.243, metadata !​71, metadata !DIExpression()), !dbg !​92
%and = and i64 %bit.044, %in, !dbg !​128
%tobool = icmp eq i64 %and, 0, !dbg !​128
br i1 %tobool, label %if.end, label %if.then, !dbg !​132

if.then: ; preds = %for.body12
%or = or i64 %res_in7.046, %bit.044, !dbg !​133
call void @​llvm.dbg.value(metadata i64 %or, metadata !​83, metadata !DIExpression()), !dbg !​115
%inc = add nsw i32 %retval1.243, 1, !dbg !​135
call void @​llvm.dbg.value(metadata i32 %inc, metadata !​71, metadata !DIExpression()), !dbg !​92
store i8* null, i8** %proc, align 8, !dbg !​136, !tbaa !​137
br label %if.end, !dbg !​138

if.end: ; preds = %for.body12, %if.then
%retval1.3 = phi i32 [ %inc, %if.then ], [ %retval1.243, %for.body12 ], !dbg !​121
%res_in7.1 = phi i64 [ %or, %if.then ], [ %res_in7.046, %for.body12 ], !dbg !​122
br i1 %tobool14, label %for.inc, label %if.then15, !dbg !​139

if.then15: ; preds = %if.end
store i8* null, i8** %proc, align 8, !dbg !​140, !tbaa !​137
br label %for.inc, !dbg !​143

for.inc: ; preds = %if.end, %if.then15
%inc18 = add nuw nsw i32 %j.047, 1, !dbg !​144
%shl = shl i64 %bit.044, 1, !dbg !​145
call void @​llvm.dbg.value(metadata i32 %inc18, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 %res_in7.1, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %shl, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.3, metadata !​71, metadata !DIExpression()), !dbg !​92
%exitcond = icmp eq i32 %inc18, 64, !dbg !​146
br i1 %exitcond, label %for.cond.cleanup11, label %for.body12, !dbg !​117, !llvm.loop !​147
}

===============================================================================================================
IR after mergeConditionalStoreToAddress (note how blocks if.then, if.end and if.then15 were transformed)

; Function Attrs: noinline norecurse nounwind
define dso_local i32 @​do_select(i32 %max_iters_count, i64 %in, i64 %out, i64 %ex, i64 %bit_init_val, i64 %mask) local_unnamed_addr #​0 !dbg !​59 {
entry:
call void @​llvm.dbg.value(metadata i32 %max_iters_count, metadata !​65, metadata !DIExpression()), !dbg !​86
call void @​llvm.dbg.value(metadata i64 %in, metadata !​66, metadata !DIExpression()), !dbg !​87
call void @​llvm.dbg.value(metadata i64 %out, metadata !​67, metadata !DIExpression()), !dbg !​88
call void @​llvm.dbg.value(metadata i64 %ex, metadata !​68, metadata !DIExpression()), !dbg !​89
call void @​llvm.dbg.value(metadata i64 %bit_init_val, metadata !​69, metadata !DIExpression()), !dbg !​90
call void @​llvm.dbg.value(metadata i64 %mask, metadata !​70, metadata !DIExpression()), !dbg !​91
call void @​llvm.dbg.value(metadata i32 0, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 0, metadata !​72, metadata !DIExpression()), !dbg !​93
%cmp52 = icmp sgt i32 %max_iters_count, 0, !dbg !​94
br i1 %cmp52, label %for.body.lr.ph, label %for.cond.cleanup, !dbg !​95

for.body.lr.ph: ; preds = %entry
%and13 = and i64 %mask, 780
%tobool14 = icmp eq i64 %and13, 0
br label %for.body, !dbg !​95

for.cond.cleanup: ; preds = %for.cond.cleanup5, %entry
%retval1.0.lcssa = phi i32 [ 0, %entry ], [ %retval1.1.lcssa, %for.cond.cleanup5 ], !dbg !​92
call void @​llvm.dbg.value(metadata i32 %retval1.0.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
ret i32 %retval1.0.lcssa, !dbg !​96

for.body: ; preds = %for.cond.cleanup5, %for.body.lr.ph
%retval1.054 = phi i32 [ 0, %for.body.lr.ph ], [ %retval1.1.lcssa, %for.cond.cleanup5 ]
%k.053 = phi i32 [ 0, %for.body.lr.ph ], [ %inc23, %for.cond.cleanup5 ]
call void @​llvm.dbg.value(metadata i32 %retval1.054, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 %k.053, metadata !​72, metadata !DIExpression()), !dbg !​93
%0 = load volatile %struct.fd_set_bits*, %struct.fd_set_bits** @​gv_fds111, align 8, !dbg !​97, !tbaa !​98
call void @​llvm.dbg.value(metadata %struct.fd_set_bits* %0, metadata !​74, metadata !DIExpression()), !dbg !​102
%res_in = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 3, !dbg !​103
%1 = load i64*, i64** %res_in, align 8, !dbg !​103, !tbaa !​104
call void @​llvm.dbg.value(metadata i64* %1, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 0, metadata !​78, metadata !DIExpression()), !dbg !​108
call void @​llvm.dbg.value(metadata i32 %retval1.054, metadata !​71, metadata !DIExpression()), !dbg !​92
%2 = load volatile i64, i64* @​g_max_i, align 8, !dbg !​109, !tbaa !​110
%cmp348 = icmp eq i64 %2, 0, !dbg !​112
br i1 %cmp348, label %for.cond.cleanup5, label %for.cond8.preheader.lr.ph, !dbg !​113

for.cond8.preheader.lr.ph: ; preds = %for.body
%proc = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 7
br label %for.cond8.preheader, !dbg !​113

for.cond8.preheader: ; preds = %for.cond8.preheader.lr.ph, %for.cond.cleanup11
%indvars.iv = phi i64 [ 0, %for.cond8.preheader.lr.ph ], [ %indvars.iv.next, %for.cond.cleanup11 ]
%rinp.050 = phi i64* [ %1, %for.cond8.preheader.lr.ph ], [ %incdec.ptr, %for.cond.cleanup11 ]
%retval1.149 = phi i32 [ %retval1.054, %for.cond8.preheader.lr.ph ], [ %retval1.3.lcssa, %for.cond.cleanup11 ]
call void @​llvm.dbg.value(metadata i64 %indvars.iv, metadata !​78, metadata !DIExpression()), !dbg !​108
call void @​llvm.dbg.value(metadata i64* %rinp.050, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 %retval1.149, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 0, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 0, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %bit_init_val, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.149, metadata !​71, metadata !DIExpression()), !dbg !​92
br label %for.body12, !dbg !​117

for.cond.cleanup5: ; preds = %for.cond.cleanup11, %for.body
%retval1.1.lcssa = phi i32 [ %retval1.054, %for.body ], [ %retval1.3.lcssa, %for.cond.cleanup11 ], !dbg !​92
%inc23 = add nuw nsw i32 %k.053, 1, !dbg !​118
call void @​llvm.dbg.value(metadata i32 %retval1.1.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 %inc23, metadata !​72, metadata !DIExpression()), !dbg !​93
%exitcond56 = icmp eq i32 %inc23, %max_iters_count, !dbg !​94
br i1 %exitcond56, label %for.cond.cleanup, label %for.body, !dbg !​95, !llvm.loop !​119

for.cond.cleanup11: ; preds = %8
%retval1.3.lcssa = phi i32 [ %retval1.3, %8 ], !dbg !​121
%res_in7.1.lcssa = phi i64 [ %res_in7.1, %8 ], !dbg !​122
call void @​llvm.dbg.value(metadata i64 %res_in7.1.lcssa, metadata !​83, metadata !DIExpression()), !dbg !​115
store i64 %res_in7.1.lcssa, i64* %rinp.050, align 8, !dbg !​123, !tbaa !​110
%indvars.iv.next = add nuw i64 %indvars.iv, 1, !dbg !​124
%incdec.ptr = getelementptr inbounds i64, i64* %rinp.050, i64 1, !dbg !​125
call void @​llvm.dbg.value(metadata i32 undef, metadata !​78, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !​108
call void @​llvm.dbg.value(metadata i64* %incdec.ptr, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 %retval1.3.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
%3 = load volatile i64, i64* @​g_max_i, align 8, !dbg !​109, !tbaa !​110
%cmp3 = icmp ugt i64 %3, %indvars.iv.next, !dbg !​112
br i1 %cmp3, label %for.cond8.preheader, label %for.cond.cleanup5, !dbg !​113, !llvm.loop !​126

for.body12: ; preds = %8, %for.cond8.preheader
%j.047 = phi i32 [ 0, %for.cond8.preheader ], [ %inc18, %8 ]
%res_in7.046 = phi i64 [ 0, %for.cond8.preheader ], [ %res_in7.1, %8 ]
%bit.044 = phi i64 [ %bit_init_val, %for.cond8.preheader ], [ %shl, %8 ]
%retval1.243 = phi i32 [ %retval1.149, %for.cond8.preheader ], [ %retval1.3, %8 ]
call void @​llvm.dbg.value(metadata i32 %j.047, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 %res_in7.046, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %bit.044, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.243, metadata !​71, metadata !DIExpression()), !dbg !​92
%and = and i64 %bit.044, %in, !dbg !​128
%tobool = icmp eq i64 %and, 0, !dbg !​128
br i1 %tobool, label %if.end, label %if.then, !dbg !​132

if.then: ; preds = %for.body12
%or = or i64 %res_in7.046, %bit.044, !dbg !​133
call void @​llvm.dbg.value(metadata i64 %or, metadata !​83, metadata !DIExpression()), !dbg !​115
%inc = add nsw i32 %retval1.243, 1, !dbg !​135
call void @​llvm.dbg.value(metadata i32 %inc, metadata !​71, metadata !DIExpression()), !dbg !​92
br label %if.end, !dbg !​136

if.end: ; preds = %for.body12, %if.then
%retval1.3 = phi i32 [ %inc, %if.then ], [ %retval1.243, %for.body12 ], !dbg !​121
%res_in7.1 = phi i64 [ %or, %if.then ], [ %res_in7.046, %for.body12 ], !dbg !​122
br i1 %tobool14, label %for.inc, label %if.then15, !dbg !​137

if.then15: ; preds = %if.end
br label %for.inc, !dbg !​138

for.inc: ; preds = %if.end, %if.then15
%simplifycfg.merge = phi i8* [ null, %if.then15 ], [ null, %if.end ]
%4 = xor i1 %tobool, true, !dbg !​141
%5 = xor i1 %tobool14, true, !dbg !​141
%6 = or i1 %4, %5, !dbg !​141
br i1 %6, label %7, label %8, !dbg !​141

; <label>:7: ; preds = %for.inc
store i8* %simplifycfg.merge, i8** %proc, align 8, !dbg !​141, !tbaa !​142
br label %8, !dbg !​141

; <label>:8: ; preds = %for.inc, %7
%inc18 = add nuw nsw i32 %j.047, 1, !dbg !​141
%shl = shl i64 %bit.044, 1, !dbg !​143
call void @​llvm.dbg.value(metadata i32 %inc18, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 %res_in7.1, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %shl, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.3, metadata !​71, metadata !DIExpression()), !dbg !​92
%exitcond = icmp eq i32 %inc18, 64, !dbg !​144
br i1 %exitcond, label %for.cond.cleanup11, label %for.body12, !dbg !​117, !llvm.loop !​145
}
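
The effect of this step on the two stores can be sketched in C. This illustrates the shape of the transformation and is not LLVM code; the function names are hypothetical, and distinct values `v1`/`v2` are used only to show the general case (in this report both stored values are null).

```c
#include <stdbool.h>

/* Before: each condition guards its own store to the same address. */
void stores_before(bool c1, bool c2, int *p, int v1, int v2)
{
    if (c1)
        *p = v1;
    if (c2)
        *p = v2;
}

/* After: the value to store is chosen first (the role of the
 * %simplifycfg.merge phi), then a single store runs under the OR of both
 * conditions (the xor/xor/or sequence in for.inc). */
void stores_after(bool c1, bool c2, int *p, int v1, int v2)
{
    int merged = c2 ? v2 : v1;
    if (c1 || c2)
        *p = merged;
}
```

Both forms leave the same value in memory for every combination of conditions. The merged form trades one store for the extra logic needed to combine the two conditions.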

  2. Function FoldTwoEntryPHINode tries to merge these blocks and inserts two select instructions, which leads to the regression

===============================================================================================================
IR after FoldTwoEntryPHINode (note that the contents of block if.then were merged into for.body12, which leads to the regression)

; Function Attrs: noinline norecurse nounwind
define dso_local i32 @​do_select(i32 %max_iters_count, i64 %in, i64 %out, i64 %ex, i64 %bit_init_val, i64 %mask) local_unnamed_addr #​0 !dbg !​59 {
entry:
call void @​llvm.dbg.value(metadata i32 %max_iters_count, metadata !​65, metadata !DIExpression()), !dbg !​86
call void @​llvm.dbg.value(metadata i64 %in, metadata !​66, metadata !DIExpression()), !dbg !​87
call void @​llvm.dbg.value(metadata i64 %out, metadata !​67, metadata !DIExpression()), !dbg !​88
call void @​llvm.dbg.value(metadata i64 %ex, metadata !​68, metadata !DIExpression()), !dbg !​89
call void @​llvm.dbg.value(metadata i64 %bit_init_val, metadata !​69, metadata !DIExpression()), !dbg !​90
call void @​llvm.dbg.value(metadata i64 %mask, metadata !​70, metadata !DIExpression()), !dbg !​91
call void @​llvm.dbg.value(metadata i32 0, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 0, metadata !​72, metadata !DIExpression()), !dbg !​93
%cmp52 = icmp sgt i32 %max_iters_count, 0, !dbg !​94
br i1 %cmp52, label %for.body.lr.ph, label %for.cond.cleanup, !dbg !​95

for.body.lr.ph: ; preds = %entry
%and13 = and i64 %mask, 780
%tobool14 = icmp eq i64 %and13, 0
br label %for.body, !dbg !​95

for.cond.cleanup: ; preds = %for.cond.cleanup5, %entry
%retval1.0.lcssa = phi i32 [ 0, %entry ], [ %retval1.1.lcssa, %for.cond.cleanup5 ], !dbg !​92
call void @​llvm.dbg.value(metadata i32 %retval1.0.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
ret i32 %retval1.0.lcssa, !dbg !​96

for.body: ; preds = %for.cond.cleanup5, %for.body.lr.ph
%retval1.054 = phi i32 [ 0, %for.body.lr.ph ], [ %retval1.1.lcssa, %for.cond.cleanup5 ]
%k.053 = phi i32 [ 0, %for.body.lr.ph ], [ %inc23, %for.cond.cleanup5 ]
call void @​llvm.dbg.value(metadata i32 %retval1.054, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 %k.053, metadata !​72, metadata !DIExpression()), !dbg !​93
%0 = load volatile %struct.fd_set_bits*, %struct.fd_set_bits** @​gv_fds111, align 8, !dbg !​97, !tbaa !​98
call void @​llvm.dbg.value(metadata %struct.fd_set_bits* %0, metadata !​74, metadata !DIExpression()), !dbg !​102
%res_in = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 3, !dbg !​103
%1 = load i64*, i64** %res_in, align 8, !dbg !​103, !tbaa !​104
call void @​llvm.dbg.value(metadata i64* %1, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 0, metadata !​78, metadata !DIExpression()), !dbg !​108
call void @​llvm.dbg.value(metadata i32 %retval1.054, metadata !​71, metadata !DIExpression()), !dbg !​92
%2 = load volatile i64, i64* @​g_max_i, align 8, !dbg !​109, !tbaa !​110
%cmp348 = icmp eq i64 %2, 0, !dbg !​112
br i1 %cmp348, label %for.cond.cleanup5, label %for.cond8.preheader.lr.ph, !dbg !​113

for.cond8.preheader.lr.ph: ; preds = %for.body
%proc = getelementptr inbounds %struct.fd_set_bits, %struct.fd_set_bits* %0, i64 0, i32 7
br label %for.cond8.preheader, !dbg !​113

for.cond8.preheader: ; preds = %for.cond8.preheader.lr.ph, %for.cond.cleanup11
%indvars.iv = phi i64 [ 0, %for.cond8.preheader.lr.ph ], [ %indvars.iv.next, %for.cond.cleanup11 ]
%rinp.050 = phi i64* [ %1, %for.cond8.preheader.lr.ph ], [ %incdec.ptr, %for.cond.cleanup11 ]
%retval1.149 = phi i32 [ %retval1.054, %for.cond8.preheader.lr.ph ], [ %retval1.3.lcssa, %for.cond.cleanup11 ]
call void @​llvm.dbg.value(metadata i64 %indvars.iv, metadata !​78, metadata !DIExpression()), !dbg !​108
call void @​llvm.dbg.value(metadata i64* %rinp.050, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 %retval1.149, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 0, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 0, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %bit_init_val, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.149, metadata !​71, metadata !DIExpression()), !dbg !​92
br label %for.body12, !dbg !​117

for.cond.cleanup5: ; preds = %for.cond.cleanup11, %for.body
%retval1.1.lcssa = phi i32 [ %retval1.054, %for.body ], [ %retval1.3.lcssa, %for.cond.cleanup11 ], !dbg !​92
%inc23 = add nuw nsw i32 %k.053, 1, !dbg !​118
call void @​llvm.dbg.value(metadata i32 %retval1.1.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
call void @​llvm.dbg.value(metadata i32 %inc23, metadata !​72, metadata !DIExpression()), !dbg !​93
%exitcond56 = icmp eq i32 %inc23, %max_iters_count, !dbg !​94
br i1 %exitcond56, label %for.cond.cleanup, label %for.body, !dbg !​95, !llvm.loop !​119

for.cond.cleanup11: ; preds = %8
%retval1.3.lcssa = phi i32 [ %retval1.3, %8 ], !dbg !​121
%res_in7.1.lcssa = phi i64 [ %res_in7.1, %8 ], !dbg !​122
call void @​llvm.dbg.value(metadata i64 %res_in7.1.lcssa, metadata !​83, metadata !DIExpression()), !dbg !​115
store i64 %res_in7.1.lcssa, i64* %rinp.050, align 8, !dbg !​123, !tbaa !​110
%indvars.iv.next = add nuw i64 %indvars.iv, 1, !dbg !​124
%incdec.ptr = getelementptr inbounds i64, i64* %rinp.050, i64 1, !dbg !​125
call void @​llvm.dbg.value(metadata i32 undef, metadata !​78, metadata !DIExpression(DW_OP_plus_uconst, 1, DW_OP_stack_value)), !dbg !​108
call void @​llvm.dbg.value(metadata i64* %incdec.ptr, metadata !​77, metadata !DIExpression()), !dbg !​107
call void @​llvm.dbg.value(metadata i32 %retval1.3.lcssa, metadata !​71, metadata !DIExpression()), !dbg !​92
%3 = load volatile i64, i64* @​g_max_i, align 8, !dbg !​109, !tbaa !​110
%cmp3 = icmp ugt i64 %3, %indvars.iv.next, !dbg !​112
br i1 %cmp3, label %for.cond8.preheader, label %for.cond.cleanup5, !dbg !​113, !llvm.loop !​126

for.body12: ; preds = %8, %for.cond8.preheader
%j.047 = phi i32 [ 0, %for.cond8.preheader ], [ %inc18, %8 ]
%res_in7.046 = phi i64 [ 0, %for.cond8.preheader ], [ %res_in7.1, %8 ]
%bit.044 = phi i64 [ %bit_init_val, %for.cond8.preheader ], [ %shl, %8 ]
%retval1.243 = phi i32 [ %retval1.149, %for.cond8.preheader ], [ %retval1.3, %8 ]
call void @​llvm.dbg.value(metadata i32 %j.047, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 %res_in7.046, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %bit.044, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.243, metadata !​71, metadata !DIExpression()), !dbg !​92
%and = and i64 %bit.044, %in, !dbg !​128
%tobool = icmp eq i64 %and, 0, !dbg !​128
%or = or i64 %res_in7.046, %bit.044, !dbg !​132
%inc = add nsw i32 %retval1.243, 1, !dbg !​132
%retval1.3 = select i1 %tobool, i32 %retval1.243, i32 %inc, !dbg !​132
%res_in7.1 = select i1 %tobool, i64 %res_in7.046, i64 %or, !dbg !​132
br label %if.end, !dbg !​132

if.then: ; No predecessors!
br label %if.end, !dbg !​132

if.end: ; preds = %for.body12, %if.then
br i1 %tobool14, label %for.inc, label %if.then15, !dbg !​133

if.then15: ; preds = %if.end
br label %for.inc, !dbg !​134

for.inc: ; preds = %if.end, %if.then15
%simplifycfg.merge = phi i8* [ null, %if.then15 ], [ null, %if.end ]
%4 = xor i1 %tobool, true, !dbg !​137
%5 = xor i1 %tobool14, true, !dbg !​137
%6 = or i1 %4, %5, !dbg !​137
br i1 %6, label %7, label %8, !dbg !​137

; <label>:7: ; preds = %for.inc
store i8* %simplifycfg.merge, i8** %proc, align 8, !dbg !​137, !tbaa !​138
br label %8, !dbg !​137

; <label>:8: ; preds = %for.inc, %7
%inc18 = add nuw nsw i32 %j.047, 1, !dbg !​137
%shl = shl i64 %bit.044, 1, !dbg !​139
call void @​llvm.dbg.value(metadata i32 %inc18, metadata !​84, metadata !DIExpression()), !dbg !​114
call void @​llvm.dbg.value(metadata i64 %res_in7.1, metadata !​83, metadata !DIExpression()), !dbg !​115
call void @​llvm.dbg.value(metadata i64 %shl, metadata !​80, metadata !DIExpression()), !dbg !​116
call void @​llvm.dbg.value(metadata i32 %retval1.3, metadata !​71, metadata !DIExpression()), !dbg !​92
%exitcond = icmp eq i32 %inc18, 64, !dbg !​140
br i1 %exitcond, label %for.cond.cleanup11, label %for.body12, !dbg !​117, !llvm.loop !​141
}

After FoldTwoEntryPHINode, one IR instruction (or) is lowered to two machine instructions (csel and orr).
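
Why the single `or` ends up as two instructions can be sketched in C. The three hypothetical helpers below compute the same update of `res_in`: the branchy source form, the form after the fold (the `or` is computed unconditionally and a select picks the result), and a form consistent with the AArch64 output in this report, where the select is applied to the operand (`csel x3, xzr, x2, eq`) before the `orr`.

```c
/* Branchy form: one `orr`, executed only when the bit is set in `in`. */
unsigned long update_branch(unsigned long res_in, unsigned long in,
                            unsigned long bit)
{
    if (in & bit)
        res_in |= bit;
    return res_in;
}

/* Form after FoldTwoEntryPHINode: the `or` is speculated, then selected. */
unsigned long update_select(unsigned long res_in, unsigned long in,
                            unsigned long bit)
{
    unsigned long ored = res_in | bit;
    return (in & bit) ? ored : res_in;
}

/* Shape of the emitted code: select the operand (csel), then `orr`. */
unsigned long update_csel_orr(unsigned long res_in, unsigned long in,
                              unsigned long bit)
{
    unsigned long operand = (in & bit) ? bit : 0UL;
    return res_in | operand;
}
```

All three are equivalent, but the last two always execute both the select and the `or`, whereas the branchy form skips the `or` whenever the bit is clear.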

This regression can be avoided in two ways:

  1. By passing the option -mllvm -phi-node-folding-threshold=1 (the default threshold is 2) to the compiler and the linker (because I use LTO). However, this also affects some other transformations.
  2. By changing SimplifyCFG.cpp to add a heuristic that increases the cost of folding these basic blocks (depending on the backend).
    In this case the change can be applied to the lambda IsWorthwhile in the function mergeConditionalStoreToAddress (SimplifyCFG.cpp).
    Something like this (it is just a concept of the patch):

 auto IsWorthwhile = [&](BasicBlock *BB) {
   if (!BB)
     return true;
   // Heuristic: if the block can be if-converted/phi-folded and the
   // instructions inside are all cheap (arithmetic/GEPs), it's worthwhile to
   // thread this store.
   unsigned N = 0;
   for (auto &I : BB->instructionsWithoutDebug()) {
     // Cheap instructions viable for folding.
     if (isa<BinaryOperator>(I) || isa<GetElementPtrInst>(I) ||
-        isa<StoreInst>(I))
+        isa<StoreInst>(I)) {
       ++N;
+      // Check if this "cheap" instruction has additional cost
+      // for blocks folding
+      if (TTI.isExpensiveForFolding(I)) {
+        ++N;
+      }
+    }

But maybe this solution is too naive and the real cause of the regression lies in another optimization
that cannot handle this pattern correctly. I would appreciate any ideas about it.
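
To make the proposed heuristic concrete, here is a toy model of the instruction counting in C. It is not LLVM code. It assumes the lambda's final accept test is `N <= threshold + 1` (the patch excerpt in this report stops before that line, so this is an assumption), and it models the proposed `TTI.isExpensiveForFolding` as a hypothetical callback. All type and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds; only what the toy model needs. */
typedef enum { K_BINOP, K_GEP, K_STORE, K_TERMINATOR, K_OTHER } kind_t;

/* Stand-in for the proposed TTI.isExpensiveForFolding hook (hypothetical). */
typedef bool (*expensive_fn)(kind_t);

/* Returns true when threading the store through the block looks cheap.
 * ASSUMPTION: the accept test is count <= threshold + 1. */
bool is_worthwhile(const kind_t *insts, size_t n, unsigned threshold,
                   expensive_fn expensive)
{
    unsigned count = 0;
    for (size_t i = 0; i < n; ++i) {
        switch (insts[i]) {
        case K_BINOP:
        case K_GEP:
        case K_STORE:
            ++count;                 /* cheap, viable for folding */
            if (expensive && expensive(insts[i]))
                ++count;             /* proposed extra charge */
            break;
        case K_TERMINATOR:
            break;                   /* free */
        default:
            return false;            /* anything else blocks the merge */
        }
    }
    return count <= threshold + 1;
}

/* Example target policy (hypothetical): binary operations that would end up
 * behind a select are charged twice. */
bool binop_is_expensive(kind_t k)
{
    return k == K_BINOP;
}
```

For a block shaped like `if.then` in this report (an `or`, an `add`, a store and a branch), the model accepts at threshold 2, rejects at threshold 1, and rejects at threshold 2 once the hook charges the binary operations twice.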

As far as I can see, this bug is similar to #22990, but I am not sure whether it is a duplicate.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021