-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[IR][JumpThreading] Fix infinite recursion on compare self-reference [updated] #129501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In case of an unreachable loop, for example, the JumpThreading pass might attempt to remove the PHI node in the loop entry and replace it with its constant value. However, this fails in case the value is an instruction using the PHI node. This LLVM defect was identified via the AMD Fuzzing project.
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-transforms Author: Robert Imschweiler (ro-i) ChangesIn case of an unreachable loop, for example, the JumpThreading pass might attempt to remove the PHI node in the loop entry and replace it with its constant value. However, this fails in case the value is an instruction using the PHI node. This LLVM defect was identified via the AMD Fuzzing project. PS: I felt a bit discouraged from fixing this on the basic block level, but I think that the added check only verifies what should hold anyway. Full diff: https://github.com/llvm/llvm-project/pull/129501.diff 2 Files Affected:
diff --git a/llvm/lib/IR/BasicBlock.cpp b/llvm/lib/IR/BasicBlock.cpp
index dca42a57fa9e3..9c644e0330258 100644
--- a/llvm/lib/IR/BasicBlock.cpp
+++ b/llvm/lib/IR/BasicBlock.cpp
@@ -556,8 +556,14 @@ void BasicBlock::removePredecessor(BasicBlock *Pred,
if (NumPreds == 1)
continue;
- // Try to replace the PHI node with a constant value.
- if (Value *PhiConstant = Phi.hasConstantValue()) {
+ // Try to replace the PHI node with a constant value, but make sure that
+ // this value isn't using the PHI node. (Except it's a PHI node itself. PHI
+ // nodes are allowed to reference themselves.)
+ if (Value *PhiConstant = Phi.hasConstantValue();
+ PhiConstant &&
+ (!isa<Instruction>(PhiConstant) || isa<PHINode>(PhiConstant) ||
+ llvm::all_of(Phi.users(),
+ [PhiConstant](User *U) { return U != PhiConstant; }))) {
Phi.replaceAllUsesWith(PhiConstant);
Phi.eraseFromParent();
}
diff --git a/llvm/test/Transforms/JumpThreading/unreachable-loops.ll b/llvm/test/Transforms/JumpThreading/unreachable-loops.ll
index d8bd3f389aae8..1a1a4ca15f649 100644
--- a/llvm/test/Transforms/JumpThreading/unreachable-loops.ll
+++ b/llvm/test/Transforms/JumpThreading/unreachable-loops.ll
@@ -180,4 +180,64 @@ cleanup2343.loopexit4: ; preds = %cleanup1491
unreachable
}
+; This segfaults due to recursion in %C4. Reason: %L6 is identified to be a
+; "partially redundant load" and is replaced by a PHI node. The PHI node is then
+; simplified to be constant and is removed. This leads to %L6 being replaced by
+; %C4, which makes %C4 invalid since it uses %L6.
+; The test case has been generated by the AMD Fuzzing project and simplified
+; manually and by llvm-reduce.
+
+define i32 @constant_phi_leads_to_self_reference() {
+; CHECK-LABEL: @constant_phi_leads_to_self_reference(
+; CHECK-NEXT: [[A9:%.*]] = alloca i1, align 1
+; CHECK-NEXT: br label [[F6:%.*]]
+; CHECK: T3:
+; CHECK-NEXT: [[L6:%.*]] = phi i1 [ [[C4:%.*]], [[BB6:%.*]] ]
+; CHECK-NEXT: br label [[BB5:%.*]]
+; CHECK: BB5:
+; CHECK-NEXT: [[L10:%.*]] = load i1, ptr [[A9]], align 1
+; CHECK-NEXT: br i1 [[L10]], label [[BB6]], label [[F6]]
+; CHECK: BB6:
+; CHECK-NEXT: [[LGV3:%.*]] = load i1, ptr null, align 1
+; CHECK-NEXT: [[C4]] = icmp sle i1 [[L6]], true
+; CHECK-NEXT: store i1 [[C4]], ptr null, align 1
+; CHECK-NEXT: br i1 [[L6]], label [[F6]], label [[T3:%.*]]
+; CHECK: F6:
+; CHECK-NEXT: ret i32 0
+; CHECK: F7:
+; CHECK-NEXT: br label [[BB5]]
+;
+ %A9 = alloca i1, align 1
+ br i1 false, label %BB4, label %F6
+
+BB4: ; preds = %0
+ br i1 false, label %F6, label %F1
+
+F1: ; preds = %BB4
+ br i1 false, label %T4, label %T3
+
+T3: ; preds = %T4, %BB6, %F1
+ %L6 = load i1, ptr null, align 1
+ br label %BB5
+
+BB5: ; preds = %F7, %T3
+ %L10 = load i1, ptr %A9, align 1
+ br i1 %L10, label %BB6, label %F6
+
+BB6: ; preds = %BB5
+ %LGV3 = load i1, ptr null, align 1
+ %C4 = icmp sle i1 %L6, true
+ store i1 %C4, ptr null, align 1
+ br i1 %L6, label %F6, label %T3
+
+T4: ; preds = %F1
+ br label %T3
+
+F6: ; preds = %BB6, %BB5, %BB4, %0
+ ret i32 0
+
+F7: ; No predecessors!
+ br label %BB5
+}
+
!0 = !{!"branch_weights", i32 2146410443, i32 1073205}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is legal to have self-referential instructions in unreachable code. If JumpThreading can't handle the result, that needs to be fixed, not the creation of the self-reference.
I added another implementation. This checks much more aggressively for blocks that become unreachable during pass execution. Consequently, I had to update a few of the existing tests because their checks contained unreachable blocks, which get deleted now by the pass. (If you agree that this is a viable solution, I may further need to adapt some of these tests because right now, they may not check for their actual test case anymore. |
✅ With the latest revision this PR passed the undef deprecator. |
Ok, I now created a third approach :) Let me first show the situation more in detail:
This is the state of the function before the problem occurs while handling T4. Why is T4 a problem? It's the pre-header of T3 (which is a loop header). L6.pr is used by L6. The pass now detects that T4 is unreachable and removes it. Consequently, L6 become
|
Edit: My terminology wasn't precise. I actually talked about a loop predecessor block, which in my test case is also a pre-header, but in general, we cannot determine whether it's only a loop predecessor or pre-header without further analysis (which I assume shouldn't be done here). |
I am assuming this is what causes the crash: llvm-project/llvm/lib/IR/Verifier.cpp Lines 5150 to 5151 in eef5ea0
Having looked closer into it, I think the fault is in the Verifier. Since dominance is not defined in unreachable code, checking dominance is explictly neutred: llvm-project/llvm/include/llvm/Support/GenericDomTree.h Lines 474 to 475 in b00ad36
An instruction referencing itself is a special case of violating dominance. Hence, it should be skipped in unreachable code. Alternatively, InstSimplify needs to be changed in that before replacing an PHIInst (which is allowed to reference itself IF the incoming edge is dominated by the PHI -- the check that is skipped in unreachable code) with its only value does not cause any other instruction to reference itself (which is not allowed without exception for unreachable code). In any case, this is not a problem with JumpThreading and there might also be cases in SimplifyCFG that trigger this. For instance, JumpThreading could just replace the instruction in the BB with an UnreachableInst instead of deleting it. That would just toss the same problem to SimplifyCFG. |
Hm, I see that this issue might be even larger :D But I'm not sure whether I completely get what you're saying, @Meinersbur .
|
I didn't see that its throug the actions of JumpThreading. Where exactly does crash occur. Within DeleteDeadBlock? Why does it try to fix dead BBs? At the verifier afterwards? After InstCombine on it? Simplest fix would be to not to try to cleanup unreachable code in JumpThreading. Just let ADCE/SimplifyCFG handle it. Or call EliminateUnreachableBlocks at the end. That would even be cheaper than There is comment saying:
Which is AFAIK wrong. Unreachable code in itself is not invalid IR. |
The error occurs while handling a self-referencing icmp instruction,
That sounds reasonable. I implemented this yesterday and added an
A few months before that (so in 2017), there was https://reviews.llvm.org/D34135. They also found that the JumpThreading pass has a fundamental problem, so they just prevented the bug they were dealing with instead of fixing the root cause :/ I mean, all this unreachable mess could be solved by materializing the lazy domtree info and just use the dominator tree to check for reachability, but that would be expensive. Also, I could call TLDR:
|
This reverts commit c97a6c6.
This reverts commit b1b2f0a.
In unreachable code, constant PHI nodes may appear and be replaced by their single value. As a result, instructions may become self-referencing. This commit adds checks to avoid going into infinite recursion when handling self-referencing compare instructions in `evaluateOnPredecessorEdge()`.
✅ With the latest revision this PR passed the C/C++ code formatter. |
evaluateOnPredecessorEdge(BB, PredPredBB, CondCmp->getOperand(0), DL); | ||
Constant *Op1 = | ||
evaluateOnPredecessorEdge(BB, PredPredBB, CondCmp->getOperand(1), DL); | ||
Constant *Op0 = CondCmp->getOperand(0) == CondCmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handles one specific kind of recursion, but there may be recursion across multiple instructions. It would be better to add a Visited set instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean directly in evaluateOnPredecessorEdge
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do something like this (see the commit I just pushed), I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(+ I guess this could/should be a private method)
After talking about this again with @Meinersbur , I think I got a better view on this issue. |
When using a Could it be sufficient to check EDIT: Didn't see the discussion already took place at #129501 (comment) The concern about reverting the |
Hm, I'm not 100% sure about the issue. I'm currently using a set to track all visited values when visiting an instruction. As soon as another instruction is visited, there will be a new set. Am I missing something? |
When handling %y, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version seems plausible to me but I haven't looked in detail
Added a fix and a testcase for the issue @Meinersbur mentioned. (@arsenm yeah, ig this approach does make sense. From what I've read in past discussions, others think that the pass needs a rewrite, but I'm glad it does not seem necessary at the moment and that I can fix the current issue in a more straightforward way :) ) |
Values might not just be reused by different icmp instructions, but also by different arguments and/or using different paths: Preheader:
br label %LoopHeader
LoopHeader:
%common_phi = phi i32 [%Preheader, 0], [%Latch, %phi1]
br label %Latch
Latch:
%phi1 = phi i32 [%LoopHeader, %common_phi]
%phi2 = phi i32 [%LoopHeader, %common_phi]
%cmp = icmp eq %phi1, %phi2
br %cmp, label %LoopHeader, label %Exit
Exit: Assuming that Preheader:
br label %LoopHeader
LoopHeader:
%common_phi = phi i32 [%Preheader, 0], [%Latch, %common_phi]
br label %Latch
Latch:
%cmp = icmp eq %common_phi, %common_phi
br %cmp, label %LoopHeader, label %Exit
Exit: IIUC, the problem is when we do not have an external entry into the loop, in which case LoopHeader:
%common_phi = phi i32 [%Latch, %phi1]
br label %Latch
Latch:
%phi1 = phi i32 [%LoopHeader, %common_phi]
%phi2 = phi i32 [%LoopHeader, %common_phi]
%cmp = icmp eq %phi1, %phi2
br %cmp, label %LoopHeader, label %Exit
Exit: This is the condition we want to detect. |
But When I run your example: define void @foo() {
Preheader:
br label %LoopHeader
LoopHeader:
%common_phi = phi i32 [0, %Preheader], [%phi1, %Latch]
br label %Latch
Latch:
%phi1 = phi i32 [%common_phi, %LoopHeader]
%phi2 = phi i32 [%common_phi, %LoopHeader]
%cmp = icmp eq i32 %phi1, %phi2
br i1 %cmp, label %LoopHeader, label %Exit
Exit:
ret void
} I get: define void @foo() {
Preheader:
br label %Latch
Latch: ; preds = %Preheader, %Latch
%common_phi = phi i32 [ 0, %Preheader ], [ %common_phi, %Latch ]
%cmp = icmp eq i32 %common_phi, %common_phi
br i1 %cmp, label %Latch, label %Exit
Exit: ; preds = %Latch
ret void
} which should be what you intended? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked the responses after #129501 (comment). So yes, de6ae29 addresses my concern. The PHI was an example, I did not look into what evaluateOnPredecessorEdge
recurses over.
I would prefer if the insert
and erase
would be done in the same call. A final review by @nikic would be good.
implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks fine now
Constant *Op0 = nullptr; | ||
Constant *Op1 = nullptr; | ||
if (Value *V0 = CondCmp->getOperand(0); !Visited.contains(V0)) { | ||
Visited.insert(V0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner to do these checks in evaluateOnPredecessorEdge(), not at each call to evaluateOnPredecessorEdge. Doesn't really matter right now, but will make sure this keeps getting handled if this code handles more than icmp in the future.
Please also combine the contains and insert calls by using the return value of insert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner to do these checks in evaluateOnPredecessorEdge(), not at each call to evaluateOnPredecessorEdge. Doesn't really matter right now, but will make sure this keeps getting handled if this code handles more than icmp in the future.
I adapted the code to insert and double check at the start of evaluateOnPredecessorEdge, see my latest commit. However, there still needs to be a contains
before each call because, otherwise, the caller doesn't know whether it's allowed/required to erase the Value after the call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also handle the erase in the same place, e.g. by adding a auto _ = make_scope_exit([V]() { Visited.erase(V); });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know make_scope_exit
, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks to everyone for your reviews! |
EDIT March 21: updated title. See commits and current discussion for the current state of this PR.
In case of an unreachable loop, for example, the JumpThreading pass might attempt to remove the PHI node in the loop entry and replace it with its constant value. However, this fails in case the value is an instruction using the PHI node.
This LLVM defect was identified via the AMD Fuzzing project.
PS: I felt a bit discouraged from fixing this on the basic block level, but I think that the added check only verifies what should hold anyway.