-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check `array[index].field` while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR llvm#70187. Note that after this change `&array[idx]` will be handled as an access to the `idx`th element of `array`, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced). This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as `&arr[length]`. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as `(arr+length)` and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it. In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases. The main change of this commit (replacing `check::Location` with `check::PostStmt<...>` callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by @steakhal. Those reviews were both abandoned, but the issues were unrelated to the change that is introduced in this PR.
@llvm/pr-subscribers-clang Author: None (DonatNagyE) Changes...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check Note that after this change This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases. The main change of this commit (replacing Full diff: https://github.com/llvm/llvm-project/pull/72107.diff 3 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index ffc7236d1e2551a..ef3ae42d57c6e88 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -34,20 +34,37 @@ using llvm::formatv;
namespace {
enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
-class ArrayBoundCheckerV2 :
- public Checker<check::Location> {
+struct Messages {
+ std::string Short, Full;
+};
+
+class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
+ check::PostStmt<UnaryOperator>,
+ check::PostStmt<MemberExpr>> {
BugType BT{this, "Out-of-bound access"};
BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
+ void performCheck(const Expr *E, CheckerContext &C) const;
+
void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
- NonLoc Offset, std::string RegName, std::string Msg) const;
+ NonLoc Offset, Messages Msgs) const;
static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
public:
- void checkLocation(SVal l, bool isLoad, const Stmt *S,
- CheckerContext &C) const;
+ void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
+ performCheck(E, C);
+ }
+ void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
+ if (E->getOpcode() == UO_Deref)
+ performCheck(E, C);
+ }
+ void checkPostStmt(const MemberExpr *E, CheckerContext &C) const {
+ if (E->isArrow())
+ performCheck(E->getBase(), C);
+ }
};
+
} // anonymous namespace
/// For a given Location that can be represented as a symbolic expression
@@ -217,16 +234,19 @@ static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
return formatv(ShortMsgTemplates[Kind], RegName);
}
-static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
+ std::string RegName = getRegionName(Region);
SmallString<128> Buf;
llvm::raw_svector_ostream Out(Buf);
Out << "Access of " << RegName << " at negative byte offset";
if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
Out << ' ' << ConcreteIdx->getValue();
- return std::string(Buf);
+ return {getShortMsg(OOB_Precedes, RegName), std::string(Buf)};
}
-static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
- NonLoc Offset, NonLoc Extent, SVal Location) {
+
+static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
+ NonLoc Offset, NonLoc Extent, SVal Location) {
+ std::string RegName = getRegionName(Region);
const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
assert(EReg && "this checker only handles element access");
QualType ElemType = EReg->getElementType();
@@ -273,20 +293,18 @@ static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
Out << "s";
}
- return std::string(Buf);
-}
-static std::string getTaintMsg(std::string RegName) {
- SmallString<128> Buf;
- llvm::raw_svector_ostream Out(Buf);
- Out << "Access of " << RegName
- << " with a tainted offset that may be too large";
- return std::string(Buf);
+ return {getShortMsg(OOB_Exceeds, RegName), std::string(Buf)};
}
-void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
- const Stmt *LoadS,
- CheckerContext &C) const {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+ std::string RegName = getRegionName(Region);
+ return {formatv("Potential out of bound access to {0} with tainted {1}",
+ RegName, OffsetName),
+ formatv("Access of {0} with a tainted {1} that may be too large",
+ RegName, OffsetName)};
+}
+void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
// NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
// some new logic here that reasons directly about memory region extents.
// Once that logic is more mature, we can bring it back to assumeInBound()
@@ -297,12 +315,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
// have some flexibility in defining the base region, we can achieve
// various levels of conservatism in our buffer overflow checking.
+ const SVal Location = C.getSVal(E);
+
// The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
// #define isXXXXX(arg) (LOOKUP_TABLE[arg] & BITMASK_FOR_XXXXX)
// and incomplete analysis of these leads to false positives. As even
// accurate reports would be confusing for the users, just disable reports
// from these macros:
- if (isFromCtypeMacro(LoadS, C.getASTContext()))
+ if (isFromCtypeMacro(E, C.getASTContext()))
return;
ProgramStateRef State = C.getState();
@@ -331,9 +351,8 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
if (PrecedesLowerBound && !WithinLowerBound) {
// We know that the index definitely precedes the lower bound.
- std::string RegName = getRegionName(Reg);
- std::string Msg = getPrecedesMsg(RegName, ByteOffset);
- reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
+ Messages Msgs = getPrecedesMsgs(Reg, ByteOffset);
+ reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, Msgs);
return;
}
@@ -350,17 +369,25 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
if (ExceedsUpperBound) {
if (!WithinUpperBound) {
// We know that the index definitely exceeds the upper bound.
- std::string RegName = getRegionName(Reg);
- std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
- *KnownSize, Location);
- reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
+ Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
+ *KnownSize, Location);
+ reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs);
return;
}
if (isTainted(State, ByteOffset)) {
- // Both cases are possible, but the index is tainted, so report.
+ // Both cases are possible, but the offset is tainted, so report.
std::string RegName = getRegionName(Reg);
- std::string Msg = getTaintMsg(RegName);
- reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
+
+ // Diagnostic detail: "tainted offset" is always correct, but the
+ // common case is that 'idx' is tainted in 'arr[idx]' and then it's
+ // nicer to say "tainted index".
+ const char *OffsetName = "offset";
+ if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
+ if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
+ OffsetName = "index";
+
+ Messages Msgs = getTaintMsgs(Reg, OffsetName);
+ reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, Msgs);
return;
}
}
@@ -374,17 +401,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
ProgramStateRef ErrorState, OOB_Kind Kind,
- NonLoc Offset, std::string RegName,
- std::string Msg) const {
+ NonLoc Offset, Messages Msgs) const {
ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
if (!ErrorNode)
return;
- std::string ShortMsg = getShortMsg(Kind, RegName);
-
auto BR = std::make_unique<PathSensitiveBugReport>(
- Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+ Kind == OOB_Taint ? TaintBT : BT, Msgs.Short, Msgs.Full, ErrorNode);
// Track back the propagation of taintedness.
if (Kind == OOB_Taint)
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..e549e86b09c2312 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -9,6 +9,14 @@ void arrayUnderflow(void) {
// expected-note@-2 {{Access of 'array' at negative byte offset -12}}
}
+int underflowWithDeref(void) {
+ int *p = array;
+ --p;
+ return *p;
+ // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+ // expected-note@-2 {{Access of 'array' at negative byte offset -4}}
+}
+
int scanf(const char *restrict fmt, ...);
void taintedIndex(void) {
@@ -17,6 +25,17 @@ void taintedIndex(void) {
// expected-note@-1 {{Taint originated here}}
// expected-note@-2 {{Taint propagated to the 2nd argument}}
array[index] = 5;
+ // expected-warning@-1 {{Potential out of bound access to 'array' with tainted index}}
+ // expected-note@-2 {{Access of 'array' with a tainted index that may be too large}}
+}
+
+void taintedOffset(void) {
+ int index;
+ scanf("%d", &index);
+ // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
+ int *p = array + index;
+ p[0] = 5;
// expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
// expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
}
@@ -27,6 +46,29 @@ void arrayOverflow(void) {
// expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
}
+void flippedOverflow(void) {
+ 12[array] = 5;
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int *afterTheEndPtr(void) {
+ // FIXME: this is an unusual but standard-compliant way of writing (array + 10).
+ // I think people who rely on this confusing corner case deserve a warning,
+ // but if this is widespread, then we could introduce a special case for it
+ // in the checker.
+ return &array[10];
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 10, while it holds only 10 'int' elements}}
+}
+
+int *afterAfterTheEndPtr(void) {
+ // This is UB, it's invalid to form an after-after-the-end pointer.
+ return &array[11];
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 11, while it holds only 10 'int' elements}}
+}
+
int scalar;
int scalarOverflow(void) {
return (&scalar)[1];
@@ -41,12 +83,6 @@ int oneElementArrayOverflow(void) {
// expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
}
-short convertedArray(void) {
- return ((short*)array)[47];
- // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
- // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
-}
-
struct vec {
int len;
double elems[64];
@@ -64,6 +100,28 @@ double arrayInStructPtr(struct vec *pv) {
// expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
}
+struct item {
+ int a, b;
+} itemArray[20] = {0};
+
+int structOfArrays(void) {
+ return itemArray[35].a;
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+ // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+int structOfArraysArrow(void) {
+ return (itemArray + 35)->b;
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+ // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+short convertedArray(void) {
+ return ((short*)array)[47];
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
struct two_bytes {
char lo, hi;
};
@@ -85,7 +143,9 @@ int intFromString(void) {
}
int intFromStringDivisible(void) {
- // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+ // However, this is reported with indices/elements, because the extent
+ // (of the string that consists of 'a', 'b', 'c' and '\0') happens to be a
+ // multiple of 4 bytes (= sizeof(int)).
return ((const int*)"abc")[20];
// expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
// expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 67dc67e627b3b9c..a3fa1639bffeee5 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -29,8 +29,8 @@ int taintDiagnosticOutOfBound(void) {
int Array[] = {1, 2, 3, 4, 5};
scanf("%d", &index); // expected-note {{Taint originated here}}
// expected-note@-1 {{Taint propagated to the 2nd argument}}
- return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted offset}}
- // expected-note@-1 {{Access of 'Array' with a tainted offset that may be too large}}
+ return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
+ // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
}
int taintDiagnosticDivZero(int operand) {
|
@llvm/pr-subscribers-clang-static-analyzer-1 Author: None (DonatNagyE) Changes...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check Note that after this change This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases. The main change of this commit (replacing Full diff: https://github.com/llvm/llvm-project/pull/72107.diff 3 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index ffc7236d1e2551a..ef3ae42d57c6e88 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -34,20 +34,37 @@ using llvm::formatv;
namespace {
enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
-class ArrayBoundCheckerV2 :
- public Checker<check::Location> {
+struct Messages {
+ std::string Short, Full;
+};
+
+class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
+ check::PostStmt<UnaryOperator>,
+ check::PostStmt<MemberExpr>> {
BugType BT{this, "Out-of-bound access"};
BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
+ void performCheck(const Expr *E, CheckerContext &C) const;
+
void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
- NonLoc Offset, std::string RegName, std::string Msg) const;
+ NonLoc Offset, Messages Msgs) const;
static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
public:
- void checkLocation(SVal l, bool isLoad, const Stmt *S,
- CheckerContext &C) const;
+ void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
+ performCheck(E, C);
+ }
+ void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
+ if (E->getOpcode() == UO_Deref)
+ performCheck(E, C);
+ }
+ void checkPostStmt(const MemberExpr *E, CheckerContext &C) const {
+ if (E->isArrow())
+ performCheck(E->getBase(), C);
+ }
};
+
} // anonymous namespace
/// For a given Location that can be represented as a symbolic expression
@@ -217,16 +234,19 @@ static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
return formatv(ShortMsgTemplates[Kind], RegName);
}
-static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
+ std::string RegName = getRegionName(Region);
SmallString<128> Buf;
llvm::raw_svector_ostream Out(Buf);
Out << "Access of " << RegName << " at negative byte offset";
if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
Out << ' ' << ConcreteIdx->getValue();
- return std::string(Buf);
+ return {getShortMsg(OOB_Precedes, RegName), std::string(Buf)};
}
-static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
- NonLoc Offset, NonLoc Extent, SVal Location) {
+
+static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
+ NonLoc Offset, NonLoc Extent, SVal Location) {
+ std::string RegName = getRegionName(Region);
const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
assert(EReg && "this checker only handles element access");
QualType ElemType = EReg->getElementType();
@@ -273,20 +293,18 @@ static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
Out << "s";
}
- return std::string(Buf);
-}
-static std::string getTaintMsg(std::string RegName) {
- SmallString<128> Buf;
- llvm::raw_svector_ostream Out(Buf);
- Out << "Access of " << RegName
- << " with a tainted offset that may be too large";
- return std::string(Buf);
+ return {getShortMsg(OOB_Exceeds, RegName), std::string(Buf)};
}
-void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
- const Stmt *LoadS,
- CheckerContext &C) const {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+ std::string RegName = getRegionName(Region);
+ return {formatv("Potential out of bound access to {0} with tainted {1}",
+ RegName, OffsetName),
+ formatv("Access of {0} with a tainted {1} that may be too large",
+ RegName, OffsetName)};
+}
+void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
// NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
// some new logic here that reasons directly about memory region extents.
// Once that logic is more mature, we can bring it back to assumeInBound()
@@ -297,12 +315,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
// have some flexibility in defining the base region, we can achieve
// various levels of conservatism in our buffer overflow checking.
+ const SVal Location = C.getSVal(E);
+
// The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
// #define isXXXXX(arg) (LOOKUP_TABLE[arg] & BITMASK_FOR_XXXXX)
// and incomplete analysis of these leads to false positives. As even
// accurate reports would be confusing for the users, just disable reports
// from these macros:
- if (isFromCtypeMacro(LoadS, C.getASTContext()))
+ if (isFromCtypeMacro(E, C.getASTContext()))
return;
ProgramStateRef State = C.getState();
@@ -331,9 +351,8 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
if (PrecedesLowerBound && !WithinLowerBound) {
// We know that the index definitely precedes the lower bound.
- std::string RegName = getRegionName(Reg);
- std::string Msg = getPrecedesMsg(RegName, ByteOffset);
- reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
+ Messages Msgs = getPrecedesMsgs(Reg, ByteOffset);
+ reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, Msgs);
return;
}
@@ -350,17 +369,25 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
if (ExceedsUpperBound) {
if (!WithinUpperBound) {
// We know that the index definitely exceeds the upper bound.
- std::string RegName = getRegionName(Reg);
- std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
- *KnownSize, Location);
- reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
+ Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
+ *KnownSize, Location);
+ reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs);
return;
}
if (isTainted(State, ByteOffset)) {
- // Both cases are possible, but the index is tainted, so report.
+ // Both cases are possible, but the offset is tainted, so report.
std::string RegName = getRegionName(Reg);
- std::string Msg = getTaintMsg(RegName);
- reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
+
+ // Diagnostic detail: "tainted offset" is always correct, but the
+ // common case is that 'idx' is tainted in 'arr[idx]' and then it's
+ // nicer to say "tainted index".
+ const char *OffsetName = "offset";
+ if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
+ if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
+ OffsetName = "index";
+
+ Messages Msgs = getTaintMsgs(Reg, OffsetName);
+ reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, Msgs);
return;
}
}
@@ -374,17 +401,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
ProgramStateRef ErrorState, OOB_Kind Kind,
- NonLoc Offset, std::string RegName,
- std::string Msg) const {
+ NonLoc Offset, Messages Msgs) const {
ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
if (!ErrorNode)
return;
- std::string ShortMsg = getShortMsg(Kind, RegName);
-
auto BR = std::make_unique<PathSensitiveBugReport>(
- Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+ Kind == OOB_Taint ? TaintBT : BT, Msgs.Short, Msgs.Full, ErrorNode);
// Track back the propagation of taintedness.
if (Kind == OOB_Taint)
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..e549e86b09c2312 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -9,6 +9,14 @@ void arrayUnderflow(void) {
// expected-note@-2 {{Access of 'array' at negative byte offset -12}}
}
+int underflowWithDeref(void) {
+ int *p = array;
+ --p;
+ return *p;
+ // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+ // expected-note@-2 {{Access of 'array' at negative byte offset -4}}
+}
+
int scanf(const char *restrict fmt, ...);
void taintedIndex(void) {
@@ -17,6 +25,17 @@ void taintedIndex(void) {
// expected-note@-1 {{Taint originated here}}
// expected-note@-2 {{Taint propagated to the 2nd argument}}
array[index] = 5;
+ // expected-warning@-1 {{Potential out of bound access to 'array' with tainted index}}
+ // expected-note@-2 {{Access of 'array' with a tainted index that may be too large}}
+}
+
+void taintedOffset(void) {
+ int index;
+ scanf("%d", &index);
+ // expected-note@-1 {{Taint originated here}}
+ // expected-note@-2 {{Taint propagated to the 2nd argument}}
+ int *p = array + index;
+ p[0] = 5;
// expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
// expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
}
@@ -27,6 +46,29 @@ void arrayOverflow(void) {
// expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
}
+void flippedOverflow(void) {
+ 12[array] = 5;
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int *afterTheEndPtr(void) {
+ // FIXME: this is an unusual but standard-compliant way of writing (array + 10).
+ // I think people who rely on this confusing corner case deserve a warning,
+ // but if this is widespread, then we could introduce a special case for it
+ // in the checker.
+ return &array[10];
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 10, while it holds only 10 'int' elements}}
+}
+
+int *afterAfterTheEndPtr(void) {
+ // This is UB, it's invalid to form an after-after-the-end pointer.
+ return &array[11];
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 11, while it holds only 10 'int' elements}}
+}
+
int scalar;
int scalarOverflow(void) {
return (&scalar)[1];
@@ -41,12 +83,6 @@ int oneElementArrayOverflow(void) {
// expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
}
-short convertedArray(void) {
- return ((short*)array)[47];
- // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
- // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
-}
-
struct vec {
int len;
double elems[64];
@@ -64,6 +100,28 @@ double arrayInStructPtr(struct vec *pv) {
// expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
}
+struct item {
+ int a, b;
+} itemArray[20] = {0};
+
+int structOfArrays(void) {
+ return itemArray[35].a;
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+ // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+int structOfArraysArrow(void) {
+ return (itemArray + 35)->b;
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+ // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+short convertedArray(void) {
+ return ((short*)array)[47];
+ // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+ // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
struct two_bytes {
char lo, hi;
};
@@ -85,7 +143,9 @@ int intFromString(void) {
}
int intFromStringDivisible(void) {
- // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+ // However, this is reported with indices/elements, because the extent
+ // (of the string that consists of 'a', 'b', 'c' and '\0') happens to be a
+ // multiple of 4 bytes (= sizeof(int)).
return ((const int*)"abc")[20];
// expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
// expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 67dc67e627b3b9c..a3fa1639bffeee5 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -29,8 +29,8 @@ int taintDiagnosticOutOfBound(void) {
int Array[] = {1, 2, 3, 4, 5};
scanf("%d", &index); // expected-note {{Taint originated here}}
// expected-note@-1 {{Taint propagated to the 2nd argument}}
- return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted offset}}
- // expected-note@-1 {{Access of 'Array' with a tainted offset that may be too large}}
+ return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
+ // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
}
int taintDiagnosticDivZero(int operand) {
|
I ran an analysis that compares this commit with its parent on many open source projects. The results revealed that this commit "converts" many alpha.security.ArrayBound (V1) results into alpha.security.ArrayBoundV2 results because (if I understand it correctly) the new This is a mostly irrelevant effect because I presume that the users don't want to enable both of the ArrayBound checkers at the same time (I was testing with options that enable almost all alpha checkers, but that's not a "normal" config). For the sake of completeness I'm pasting the result links for this first run where the diff is polluted by all the "lost report for V1, new report for V2" differences; but I started a clean run with
|
I evaluated a mostly [1] clean analysis run and it reveals that this change has almost no effects when CSA analyses stable open source code (this is the expected behavior, stable code doesn't contain out of bounds memory access). The only differences are apparently caused by the inherent instability of the exploded graph traversal:
[1] During the analysis of contour both the baseline (main snapshot created a few days ago) and the new revision crashed with the message |
Note that &array[idx] is perfectly valid code when
Of course, one could use the Could you elaborate on alternative approaches you considered fixing the problem and why you chose this one? E.g., would trying to look at the parent regions for expressions like Alternatively, would it be possible to suppress warnings on the common pattern |
Oh, dammit, this language is stupid... I couldn't imagine a reason to use
I considered two approaches, this one and walking on the parent region layers (I don't think that there is a third distinct approach). I chose this because:
Yes, that was my plan for resolving this issue; and as you say that it's needed, I'll implement it. I'll squeeze the parent lookup into the I think that I won't suppress the warning in situations like
where the analyzer knows that |
I updated the PR to allow creating the after-the-end pointer of an array as |
if (EqualsToThreshold && !NotEqualToThreshold) { | ||
// We are definitely in the exceptional case, so return early | ||
// instead of reporting a bug. | ||
C.addTransition(EqualsToThreshold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this transition needed? This checker should not add assumptions to the state, only check for conditions and add only error transitions. EqualsToThreshold
probably does not contain new information compared to State
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that EqualsToThreshold
does not contain new information compared to State
, but I'm using it because I'm trying to follow the pattern that I'm always using the most recent state variable.
On the other hand, the variable State
may contain new information compared to the state of the node before the start of this callback: e.g. if we start with a range set like State
when we assume that there is no underflow.
I think it's valuable to record these assumptions with a state transition, because they improve the accuracy of the modeling. (Otherwise the analyzer could produce bug reports that rely on assumptions that contradict each other.) Currently the assumptions of this checker are added silently but I'll add note tags like "Assuming index is non-negative" for them in a followup commit.
I'm in favor of this change. I'll pull the patch downstream and report back how it performed. |
FYI I edited the PR summary so that I'm not tagged there directly because if someone is tagged in a commit message on GH, that person will be notified each time that commit is pushed in public. Consequently, the tagged person gets spammed by GH from all the public forks each time they rebase and push this commit :D |
@steakhal thanks for the checking and sorry for the unintentional spamming.
Just wonderful 😄 |
To clarify, you did not spam me. I'm worried about merging a commit directly mentioning people. That would be the point when forks start to pick up the commit (and my name e.g.) and start spamming. I just wanted to raise awareness of this being a thing, and why I did that edit. |
Note that (if I understand it correctly) after this change the |
I think another question is whether warning on code like |
@Xazax-hun I added a few testcases that show the current behavior of the checker w.r.t. invalid references. I think this is reasonably correct behavior, but I'm not too interested in the legality of these gray areas and I don't have strong opinions. By the way, with the new |
This commit extends the class `SValBuilder` with the methods `getMinValue()` and `getMaxValue()` to that work like `SValBuilder::getKnownValue()` but return the minimal/maximal possible value the `SVal` is not perfectly constrained. This extension of the ConstraintManager API is discussed at: https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192 As a simple proof-of-concept application of this new API, this commit extends a message from `core.BitwiseShift` with some range information that reports the assumptions of the analyzer. My main motivation for adding these methods is that I'll also want to use them in `ArrayBoundCheckerV2` to make the error messages less awkward, but I'm starting with this simpler and less important usecase because I want to avoid merge conflicts with my other commit llvm#72107 which is currently under review. The testcase `too_large_right_operand_compound()` shows a situation where querying the range information does not work (and the extra information is not added to the error message). This also affects the debug utility `clang_analyzer_value()`, so the problem isn't in the fresh code. I'll do some investigations to resolve this, but I think that this commit is a step forward even with this limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for reporting back only after a week or so. The analysis was done in like a day, but then we had some credential issues that blocked me actually doing the diffing.
Now about the diffing: a lot of things changed because now we report earlier, so a different range gets highlighted. This caused me some problems, so I decided to not do anything fancy, but randomly pick a bunch of issues.
In short, the picked issues looked better; but effectively remained the same. This is good.
…74141) This commit extends the class `SValBuilder` with the methods `getMinValue()` and `getMaxValue()` to that work like `SValBuilder::getKnownValue()` but return the minimal/maximal possible value the `SVal` is not perfectly constrained. This extension of the ConstraintManager API is discussed at: https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192 As a simple proof-of-concept application of this new API, this commit extends a message from `core.BitwiseShift` with some range information that reports the assumptions of the analyzer. My main motivation for adding these methods is that I'll also want to use them in `ArrayBoundCheckerV2` to make the error messages less awkward, but I'm starting with this simpler and less important usecase because I want to avoid merge conflicts with my other commit #72107 which is currently under review. The testcase `too_large_right_operand_compound()` shows a situation where querying the range information does not work (and the extra information is not added to the error message). This also affects the debug utility `clang_analyzer_value()`, so the problem isn't in the fresh code. I'll do some investigations to resolve this, but I think that this commit is a step forward even with this limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we could clean up checkLocation
to work as expected for all cases. It is more future proof in the sense that if C++ introduces new kinds of statements or expressions that could index into something (like multi-dimensional operator[]
), checks continue to work without change after the analysis engine is updated.
It is a higher-level abstraction that lets checker authors focus on the semantics without having to enumerate all the patterns the check might be interested in which is often impossible in C++, let alone considering all the extensions implemented by Clang. So, these high-level abstractions supposed to deduplicate a lot of work by doing all the pattern matching once and for all in the engine, so checkers could benefit from this work without duplicating some of that pattern matching in every checker.
All that being said, I understand that it is more practical to just work these engine problems around in this check for now as we do not have a volunteer to revamp checkLocation
at the moment. I know @steakhal looked into it, but it turned out to be a bigger fish than we expected. So, I'd say let's land this as a practical middle-term solution and hope that we will be able to come back and clean this up at some point using something that is more future proof.
Co-authored-by: Balazs Benics <[email protected]>
...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check
array[index].field
while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.Note that after this change
&array[idx]
will be handled as an access to theidx
th element ofarray
, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as
&arr[length]
. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as(arr+length)
and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.
The main change of this commit (replacing
check::Location
withcheck::PostStmt<...>
callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by steakhal. Those reviews were both abandoned, but the problems that led to abandonment were unrelated to the change that is introduced in this PR.