Skip to content

[analyzer] Use AllocaRegion in MallocChecker #72402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 28, 2023

Conversation

NagyDonat
Copy link
Contributor

...to model the results of alloca() and _alloca() calls. Previously it acted as if these functions were returning memory from the heap, which led to alpha.security.ArrayBoundV2 producing incorrect messages.

...to model the results of alloca() and _alloca() calls. Previously it
acted as if these functions were returning memory from the heap, which
led to alpha.security.ArrayBoundV2 producing incorrect messages.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Nov 15, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 15, 2023

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: None (DonatNagyE)

Changes

...to model the results of alloca() and _alloca() calls. Previously it acted as if these functions were returning memory from the heap, which led to alpha.security.ArrayBoundV2 producing incorrect messages.


Full diff: https://github.com/llvm/llvm-project/pull/72402.diff

7 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h (+9)
  • (modified) clang/lib/StaticAnalyzer/Checkers/BuiltinFunctionChecker.cpp (+14-15)
  • (modified) clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp (+6-4)
  • (modified) clang/lib/StaticAnalyzer/Core/SValBuilder.cpp (+8)
  • (modified) clang/test/Analysis/malloc.c (+12-2)
  • (modified) clang/test/Analysis/memory-model.cpp (+1-1)
  • (modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+2-6)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
index 692c4384586569e..a64cf7ae4efcb82 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
@@ -215,6 +215,15 @@ class SValBuilder {
                                                 const LocationContext *LCtx,
                                                 QualType type, unsigned Count);
 
+  /// Create an SVal representing the result of an alloca()-like call, that is,
+  /// an AllocaRegion on the stack.
+  ///
+  /// After calling this function, it's a good idea to set the extent of the
+  /// returned AllocaRegion.
+  loc::MemRegionVal getAllocaRegionVal(const Expr *E,
+                                       const LocationContext *LCtx,
+                                       unsigned Count);
+
   DefinedOrUnknownSVal getDerivedRegionValueSymbolVal(
       SymbolRef parentSymbol, const TypedValueRegion *region);
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/BuiltinFunctionChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/BuiltinFunctionChecker.cpp
index 4a56156de4b27fe..143326c435cf815 100644
--- a/clang/lib/StaticAnalyzer/Checkers/BuiltinFunctionChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/BuiltinFunctionChecker.cpp
@@ -81,22 +81,21 @@ bool BuiltinFunctionChecker::evalCall(const CallEvent &Call,
 
   case Builtin::BI__builtin_alloca_with_align:
   case Builtin::BI__builtin_alloca: {
-    // FIXME: Refactor into StoreManager itself?
-    MemRegionManager& RM = C.getStoreManager().getRegionManager();
-    const AllocaRegion* R =
-      RM.getAllocaRegion(CE, C.blockCount(), C.getLocationContext());
-
-    // Set the extent of the region in bytes. This enables us to use the
-    // SVal of the argument directly. If we save the extent in bits, we
-    // cannot represent values like symbol*8.
-    auto Size = Call.getArgSVal(0);
-    if (Size.isUndef())
-      return true; // Return true to model purity.
-
-    state = setDynamicExtent(state, R, Size.castAs<DefinedOrUnknownSVal>(),
-                             C.getSValBuilder());
+    SValBuilder &SVB = C.getSValBuilder();
+    const loc::MemRegionVal R =
+        SVB.getAllocaRegionVal(CE, C.getLocationContext(), C.blockCount());
 
-    C.addTransition(state->BindExpr(CE, LCtx, loc::MemRegionVal(R)));
+    // Set the extent of the region in bytes. This enables us to use the SVal
+    // of the argument directly. If we saved the extent in bits, it'd be more
+    // difficult to reason about values like symbol*8.
+    auto Size = Call.getArgSVal(0);
+    if (auto DefSize = Size.getAs<DefinedOrUnknownSVal>()) {
+      state = setDynamicExtent(state, R.getRegion(), *DefSize, SVB);
+      // FIXME: perhaps the following transition should be moved out of the
+      // 'if' to bind an AllocaRegion (with unknown/unspecified size) even in
+      // the unlikely case when the size argument is undefined.
+      C.addTransition(state->BindExpr(CE, LCtx, R));
+    }
     return true;
   }
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
index d3a4020280616b0..417305e26c41b09 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -1731,10 +1731,12 @@ ProgramStateRef MallocChecker::MallocMemAux(CheckerContext &C,
   // TODO: We could rewrite post visit to eval call; 'malloc' does not have
   // side effects other than what we model here.
   unsigned Count = C.blockCount();
-  SValBuilder &svalBuilder = C.getSValBuilder();
+  SValBuilder &SVB = C.getSValBuilder();
   const LocationContext *LCtx = C.getPredecessor()->getLocationContext();
-  DefinedSVal RetVal = svalBuilder.getConjuredHeapSymbolVal(CE, LCtx, Count)
-      .castAs<DefinedSVal>();
+  DefinedSVal RetVal =
+      ((Family == AF_Alloca) ? SVB.getAllocaRegionVal(CE, LCtx, Count)
+                             : SVB.getConjuredHeapSymbolVal(CE, LCtx, Count)
+                                   .castAs<DefinedSVal>());
   State = State->BindExpr(CE, C.getLocationContext(), RetVal);
 
   // Fill the region with the initialization value.
@@ -1746,7 +1748,7 @@ ProgramStateRef MallocChecker::MallocMemAux(CheckerContext &C,
 
   // Set the region's extent.
   State = setDynamicExtent(State, RetVal.getAsRegion(),
-                           Size.castAs<DefinedOrUnknownSVal>(), svalBuilder);
+                           Size.castAs<DefinedOrUnknownSVal>(), SVB);
 
   return MallocUpdateRefState(C, CE, State, Family);
 }
diff --git a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
index e2f0fb6a6731ca0..eb9cde5c8918fc1 100644
--- a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -231,6 +231,14 @@ SValBuilder::getConjuredHeapSymbolVal(const Expr *E,
   return loc::MemRegionVal(MemMgr.getSymbolicHeapRegion(sym));
 }
 
+loc::MemRegionVal SValBuilder::getAllocaRegionVal(const Expr *E,
+                                                  const LocationContext *LCtx,
+                                                  unsigned VisitCount) {
+  const AllocaRegion *R =
+      getRegionManager().getAllocaRegion(E, VisitCount, LCtx);
+  return loc::MemRegionVal(R);
+}
+
 DefinedSVal SValBuilder::getMetadataSymbolVal(const void *symbolTag,
                                               const MemRegion *region,
                                               const Expr *expr, QualType type,
diff --git a/clang/test/Analysis/malloc.c b/clang/test/Analysis/malloc.c
index a3f7a69b8cef347..14b7555c2c58197 100644
--- a/clang/test/Analysis/malloc.c
+++ b/clang/test/Analysis/malloc.c
@@ -266,13 +266,18 @@ void CheckUseZeroAllocated1(void) {
 }
 
 char CheckUseZeroAllocated2(void) {
+  // FIXME: The return value of `alloca()` is modeled with `AllocaRegion`
+  // instead of `SymbolicRegion`, so the current implementation of
+  // `MallocChecker::checkUseZeroAllocated()` cannot handle it; and we get an
+  // unrelated, but suitable warning from core.uninitialized.UndefReturn.
   char *p = alloca(0);
-  return *p; // expected-warning {{Use of memory allocated with size zero}}
+  return *p; // expected-warning {{Undefined or garbage value returned to caller}}
 }
 
 char CheckUseZeroWinAllocated2(void) {
+  // FIXME: Same situation as `CheckUseZeroAllocated2()`.
   char *p = _alloca(0);
-  return *p; // expected-warning {{Use of memory allocated with size zero}}
+  return *p; // expected-warning {{Undefined or garbage value returned to caller}}
 }
 
 void UseZeroAllocated(int *p) {
@@ -727,6 +732,11 @@ void paramFree(int *p) {
   myfoo(p); // expected-warning {{Use of memory after it is freed}}
 }
 
+void allocaFree(void) {
+  int *p = alloca(sizeof(int));
+  free(p); // expected-warning {{Memory allocated by alloca() should not be deallocated}}
+}
+
 int* mallocEscapeRet(void) {
   int *p = malloc(12);
   return p; // no warning
diff --git a/clang/test/Analysis/memory-model.cpp b/clang/test/Analysis/memory-model.cpp
index fbdef4fddbea93b..fd5a286acb60c06 100644
--- a/clang/test/Analysis/memory-model.cpp
+++ b/clang/test/Analysis/memory-model.cpp
@@ -97,7 +97,7 @@ void symbolic_malloc() {
 
 void symbolic_alloca() {
   int *a = (int *)alloca(12);
-  clang_analyzer_dump(a);             // expected-warning {{Element{HeapSymRegion{conj}}
+  clang_analyzer_dump(a);             // expected-warning {{Element{alloca{}}
   clang_analyzer_dumpExtent(a);       // expected-warning {{12 S64b}}
   clang_analyzer_dumpElementCount(a); // expected-warning {{3 S64b}}
 }
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..da1573665fa7951 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -107,12 +107,8 @@ void *alloca(size_t size);
 int allocaRegion(void) {
   int *mem = (int*)alloca(2*sizeof(int));
   mem[3] = -2;
-  // expected-warning@-1 {{Out of bound access to memory after the end of the heap area}}
-  // expected-note@-2 {{Access of the heap area at index 3, while it holds only 2 'int' elements}}
-  // FIXME: this should be
-  //   {{Out of bound access to memory after the end of the memory returned by 'alloca'}}
-  //   {{Access of the memory returned by 'alloca' at index 3, while it holds only 2 'int' elements}}
-  // but apparently something models 'alloca' as if it was allocating on the heap
+  // expected-warning@-1 {{Out of bound access to memory after the end of the memory returned by 'alloca'}}
+  // expected-note@-2 {{Access of the memory returned by 'alloca' at index 3, while it holds only 2 'int' elements}}
   return *mem;
 }
 

@haoNoQ
Copy link
Collaborator

haoNoQ commented Nov 15, 2023

Hmm, it really worries me that MallocChecker is setting a return value outside of evalCall(). This can easily lead to conflicts if multiple checkers try to do this: evalCall() is protected from conflicts (the engine asserts that at most one checker evaluates each call) but checkPostCall() isn't.

To the best of my knowledge, the only legal way to use State->BindExpr(...) in a checker is to set the return value in evalCall(). Ideally we should have an assertion about this ("Environment is unchanged after checker callback invocation, unless the callback is a successful evalCall()").

char *p = alloca(0);
return *p; // expected-warning {{Use of memory allocated with size zero}}
return *p; // expected-warning {{Undefined or garbage value returned to caller}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'd have no warning in case of

char CheckUseZeroAllocatedAndInitialized(void) {
   char *p = alloca(0);
  *p = 4;
   return *p;
}

? Might be worth testing.

(It's probably not hard to fix it as well? It's not like AllocaRegion is special when it comes to being able to carry dynamic extent?)

Copy link
Contributor Author

@NagyDonat NagyDonat Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll test that. (Edit: I tried and the error disappears if I add *p = 4. However, as I write below, ArrayBoundV2 is already checking this kind of overflow, and that checker produces an useful error message even in the case when *p = 4 is added.)

Unfortunately this "allocated with size zero" report is based on the private "Symbol -> state enum" map that's maintained by MallocChecker (so it's independent of the dynamic extent). I'd guess that switching to dynamic extent wouldn't be too difficult and it could simplify the code, but I think that belongs to a separate commit.

Copy link
Contributor Author

@NagyDonat NagyDonat Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the report "Use of memory allocated with size zero" is redundant with ArrayBoundV2, which detects and reports that the offset (0) is not smaller than the extent (also 0). Based on this I'm not sure that it's useful to maintain this "size zero" special case. What do you think about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it is not the real question, what we are to do with the 0-size alloca calls, but just to highlight some practical concerns, I found these sources:
https://discourse.llvm.org/t/malloc-free-and-alloca-with-zero-size/9284/3
https://stackoverflow.com/questions/8036654/what-does-alloca0-do-and-return-on-various-platforms

So alloca(0) sometimes has a special meaning. If we can give more specific error messages in these cases, I would prefer to handle those error messages in the more specific checker.
Even if ArrayBoundV2 has more user-friendly and mature error reporting (and would cover this case strictly speaking), making this more specific checker emit better diagnostics as well is something worth considering IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that alloca() in general and alloca(0) in particular can mean many things, and I don't think that it's worth to create a specific error message because I cannot say anything concrete in it. This is a nonstandard function, and while we can model its "basic" behavior, I think that we shouldn't try to deal with its corner cases.

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Nov 16, 2023

Hmm, it really worries me that MallocChecker is setting a return value outside of evalCall(). This can easily lead to conflicts if multiple checkers try to do this: evalCall() is protected from conflicts (the engine asserts that at most one checker evaluates each call) but checkPostCall() isn't.

To the best of my knowledge, the only legal way to use State->BindExpr(...) in a checker is to set the return value in evalCall().

I agree and there's already a FIXME which suggests moving this modeling step into an EvalCall callback. If you feel that this is an important improvement, I could create a separate commit that does it.

Ideally we should have an assertion about this ("Environment is unchanged after checker callback invocation, unless the callback is a successful evalCall()").

I was thinking about an alternative solution: we could turn State->BindExpr(...) into a private function that's only called in the engine, and modify the signature of the evalCall() callback to let it return an SVal (which will be bound to the call expression by the engine). What do you think about this idea?

Copy link
Contributor

@gamesh411 gamesh411 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added two remarks inline, but those can be separate patches as well.

auto Size = Call.getArgSVal(0);
if (auto DefSize = Size.getAs<DefinedOrUnknownSVal>()) {
state = setDynamicExtent(state, R.getRegion(), *DefSize, SVB);
// FIXME: perhaps the following transition should be moved out of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go with first asserting that the Size is DefinedOrUnknown anyway, and if we have a crash with a reproducer, then we can add the if and the test case for it.

Copy link
Contributor Author

@NagyDonat NagyDonat Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the situation with an assert and a straightforward testcase, and it turns out that currently the getAs() always succeeds, because core.CallAndMessage stops the execution with a bug report when an Undefined argument is passed to a function. (However, if I disabled this functionality of core.CallAndMessage, the test -- obviously -- triggered the assertion.)

Based on this, I decided to keep the current defensive check, because I didn't want to add a hidden and marginal dependency relationship between these two checkers. I know that we may assume that core checkers are always on, but that doesn't mean that I should hide landmines like assert(core.CallAndMessage is active and works as I assume) in unrelated checkers.

I updated the comment to explain this situation, and I also moved the addTransition() call out of the if because that's the more logical place for that method call and this "supporting fire" from core.CallAndMessage guarantees that this change won't have unintended side effects (because the if succeeds in normal operation).

char *p = alloca(0);
return *p; // expected-warning {{Use of memory allocated with size zero}}
return *p; // expected-warning {{Undefined or garbage value returned to caller}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it is not the real question, what we are to do with the 0-size alloca calls, but just to highlight some practical concerns, I found these sources:
https://discourse.llvm.org/t/malloc-free-and-alloca-with-zero-size/9284/3
https://stackoverflow.com/questions/8036654/what-does-alloca0-do-and-return-on-various-platforms

So alloca(0) sometimes has a special meaning. If we can give more specific error messages in these cases, I would prefer to handle those error messages in the more specific checker.
Even if ArrayBoundV2 has more user-friendly and mature error reporting (and would cover this case strictly speaking), making this more specific checker emit better diagnostics as well is something worth considering IMO.

@NagyDonat
Copy link
Contributor Author

@haoNoQ What do you think about this commit? May I merge it?

@Xazax-hun
Copy link
Collaborator

Xazax-hun commented Nov 24, 2023

I think the original alloca(0) warning message might be clearer/easier to understand. While it might have platform or compiler dependent meaning, those behaviors are non-portable, so I think it is undesirable in most cases and people probably want to be notified about it. Regarding binding the return value outside of evalCall, I think that could be addressed in a separate PR. This one does not make the current situation any worse. But the very least we should add a FIXME now (and potentially open a ticket), to reduce the chances of this getting forgotten.

Edit: we already have a TODO, but I think that one does not really do justice to the urgency of the problem, it does not mention that doing bindings in checkers outside of evalCall is not a good idea.

@NagyDonat
Copy link
Contributor Author

I think the original alloca(0) warning message might be clearer/easier to understand.

The difficulty is that we switch from the SymbolicRegion that represents the heap allocation to an AllocaRegion that doesn't have an associated symbol, while the code that produces the warnings about zero-allocated memory areas (essentially) maintains a set of symbols that correspond to zero-allocated regions (I'd guess that this was implemented before the introduction of dynamic extent).

It would be possible to reintroduce the original alloca(0) message (while keeping the improvement that alloca returns an AllocaRegion and not a SymbolicRegion on the heap), but that would require significant refactoring (and/or ugly code duplication) and frankly I feel that alloca is rare and this would be a waste of time.

While it might have platform or compiler dependent meaning, those behaviors are non-portable, so I think it is undesirable in most cases and people probably want to be notified about it.

I'd say that "non-portable" and "undesirable in most cases" is true for practically all applications of alloca, not just the alloca(0) corner case. Experienced programmers who use alloca check that their code works with their own particular alloca and wouldn't want to see generic messages that may or may not be relevant for them; while novices ( / projects that don't want to deal with alloca issues) would be better served by a generic "don't use alloca, it's platform-specific" warning.

Regarding binding the return value outside of evalCall, I think that could be addressed in a separate PR. This one does not make the current situation any worse. But the very least we should add a FIXME now (and potentially open a ticket), to reduce the chances of this getting forgotten.

Edit: we already have a TODO, but I think that one does not really do justice to the urgency of the problem, it does not mention that doing bindings in checkers outside of evalCall is not a good idea.

I'll spice up that TODO note; and after merging this commit I could start working on fixing it.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm in favor of this change.
On the other hand, I'd urge for not to regress on the diagnostics.
To me, alloca is like a VLA; which is prone to misuses, thus the edge-cases count there (esp. if tainted).

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Nov 28, 2023

I'm merging this commit in its current shape because even if I'd reimplement a warning for the "use of alloc(0)", I'd do it in a separate commit. However, I lean towards not doing it because I thought about the potential approaches and they all involve either large-scale refactoring or adding ugly hacks to the codebase.

Also note that these messages are coming from a Location callback, so if I understand it correctly they wouldn't be displayed if my commit #72107 is merged and ArrayBoundV2 is also enabled. In fact, perhaps we should consider transferring this "access of zero-allocated memory" bug category to the array bound checker, because it's very similar to other buffer overruns and it's difficult to distribute the responsibility between the two checkers. (Also note that currently we have an ugly data duplication that MallocChecker maintains its own private "zero allocated" state instead of just checking $dynamic\_extent == 0$.)

Anyway, I think that this is a very minor and secondary feature and the only reason why we're talking about it is that it was accidentally working when we modeled the zero-allocated alloca region as a zero-allocated heap region.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants