-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[analyzer] Serious slowdown on FFmpeg sheervideo.c when using ArrayBoundV2 #89045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-clang-static-analyzer Author: Balazs Benics (steakhal)
When release testing on top of clang-18, I noticed some significant (many X) slowdown on certain projects, e.g. on FFmpeg.
When investigated, the most impacted TU was `sheervideo.c`, where we had 33 seconds in the past, but now it takes around 1.23 **hours** to analyze. At that point, I'd count it as a hang.
After bisecting, the blamed change was #72107 I used The I've also checked that prior to this change, the maximum I dumped the state when the I'm not there yet to make suggestions for a fix, but I anyways wanted to let you know. @NagyDonat Here is the preprocessed reproducer: Here is the command to run it:
|
I used this patch to find the maximum symbol complexity: diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -258,6 +258,12 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
if (!Sym)
return TaintedSymbols;
+ static unsigned maxComplexity = 0;
+ if (unsigned comp = Sym->computeComplexity(); comp > maxComplexity) {
+ maxComplexity = comp;
+ llvm::errs() << "New max complexity: " << comp << "\n";
+ }
+
// Traverse all the symbols this symbol depends on to see if any are tainted.
for (SymbolRef SubSym : Sym->symbols()) {
if (!isa<SymbolData>(SubSym)) |
If I skip taint checking in OOBv2, the running times improve. I applied this for disabling taint checking: diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -401,7 +401,7 @@ void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs, E);
return;
}
- if (isTainted(State, ByteOffset)) {
+ if (false && isTainted(State, ByteOffset)) {
// Both cases are possible, but the offset is tainted, so report.
std::string RegName = getRegionName(Reg); |
Thanks for alerting me, I'll investigate this issue. I'm not too familiar with the tainted symbol algorithm, but I roughly recall that it has bad worst-case complexity, so I'm not very surprised that it blew up. |
Previously the function ``` std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 8 overloaded variants under this name) was handling element regions in a highly inefficent manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would proudce 2^N calls. I suspect that this issue might be behind llvm#89045 (note that `sheervideo.c` does very complex pointer arithmetic). This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls.
I found some exponentially inefficient recursion in Could you check whether my PR #89606 resolves the performance issue that you observed? |
Thanks for looking into this issue. I really appreciate that! If we accept how we check If we say, everything works as intended, but If we ask ourselves, why was the maximal symbol complexity of the symbols passed to I'd be interested in what is the value of having so complex symbols at times, and what is actually causing those huge symbols. |
I think you're misunderstanding my commit -- an
I agree that this would require separate investigation. My wild guess is that there is some normalization step that happens between the |
I had a look at the patch and that seems solid. That should land even if we decide to continue this investigation here. |
Previously the function ``` std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State, const MemRegion *Reg, TaintTagType K, bool returnFirstOnly) ``` (one of the 4 overloaded variants under this name) was handling element regions in a highly inefficient manner: it performed the "also examine the super-region" step twice. (Once in the branch for element regions, and once in the more general branch for all `SubRegion`s -- note that `ElementRegion` is a subclass of `SubRegion`.) As pointer arithmetic produces `ElementRegion`s, it's not too difficult to get a chain of N nested element regions where this inefficient recursion would produce 2^N calls. This commit is essentially NFC, apart from the performance improvements and the removal of (probably irrelevant) duplicate entries from the return value of `getTaintedSymbols()` calls. Fixes #89045
I merged my commit and created a new issue for the symbol complexity part of this issue. I'll start working on that question, probably by examining the logic that happens between the |
When release testing on top of clang-18, I noticed some significant (many X) slowdown on certain projects, e.g. on FFmpeg.
When investigated, the most impacted TU was
sheervideo.c
, where we had 33 seconds in the past, but now it takes around 1.23 hours to analyze. At that point, I'd count it as a hang.After bisecting, the blamed change was #72107
Switch to PostStmt callbacks in ArrayBoundV2
.I used

perf
to get a glimps on where we spend our time, and here is how it looks:The
OOBv2
checker checks array subscripts, and then it spends most of its time on traversing symbols to see if any of it is tainted.I've also checked that prior to this change, the maximum
Sym->computeComplexity()
withingetTaintedSymbolsImpl
was significantly lower than after the change. After the change this maximal complexity was more around the threshold (35), 30-33.I dumped the state when the
getTaintedSymbolsImpl
appeared 63 times in the call stack and the state dump itself, was huge.Several if not all lines individually taking up 26 Megabytes.
Many of these lines encoded the history of some "hashing-like" computation.
I'm not there yet to make suggestions for a fix, but I anyways wanted to let you know. @NagyDonat
Here is the preprocessed reproducer:
sheervideo.zip
Here is the command to run it:
The text was updated successfully, but these errors were encountered: