Skip to content

[clang static analyzer] false negative related to alpha.security.ArrayBoundV2 #70187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
0x21af opened this issue Oct 25, 2023 · 3 comments · Fixed by #72107
Closed

[clang static analyzer] false negative related to alpha.security.ArrayBoundV2 #70187

0x21af opened this issue Oct 25, 2023 · 3 comments · Fixed by #72107
Assignees

Comments

@0x21af
Copy link

0x21af commented Oct 25, 2023

For this case, If l_2003[9][0] is accessed, the analyzer would report "Out of bound memory access". However, when accessing l_2003[9][0].a, it doesn't.

struct S1 {
    unsigned a : 2
} b() {
    struct S1 l_2003[5][4] = {};
    l_2003[9][0].a;
}
int main() { b(); }

See it live: https://godbolt.org/z/cxYYv6vMv

Could you take a look when you have time? Many thanks!

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Oct 25, 2023
@EugeneZelenko EugeneZelenko added clang:static analyzer and removed clang Clang issues not falling into any other category labels Oct 25, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 25, 2023

@llvm/issue-subscribers-clang-static-analyzer

Author: ML.M (hallo-mars)

For this case, If `l_2003[9][0]` is accessed, the analyzer would report "Out of bound memory access". However, when accessing `l_2003[9][0].a`, it doesn't.
struct S1 {
    unsigned a : 2
} b() {
    struct S1 l_2003[5][4] = {};
    l_2003[9][0].a;
}
int main() { b(); }

See it live: https://godbolt.org/z/cxYYv6vMv

Could you take a look when you have time? Many thanks!

@NagyDonat NagyDonat self-assigned this Oct 26, 2023
@NagyDonat
Copy link
Contributor

I'm gradually rewriting this checker to fix its various limitations, and this issue is already on my TODO list. After merging my current improvements, the next step could be a commit that fixes this.

This issue is caused by the fact that the checker looks for memory access operations (checkLocation() callback) that access an ElementRegion, but an expression like array[index].field implies an access to a FieldRegion (which happens to be a sub-region of an ElementRegion, but the checker doesn't investigate that). Note that when accessing nested sub-regions (like array[index].field) the analyzer engine only calls the checkLocation() callback for the innermost accessed region.

It'd be possible to fix this "directly" by looking for ElementRegion layers in the middle of the stack of nested sub-regions (e.g. array[index].field is a FieldRegion, but its parent is an ElementRegion, so we perform bounds checking for it). However, my plan is that I'll switch from the abstract checkLocation() callback to callbacks that observe the concrete array subscript operators (and pointer dereference to handle things like *(array + index)), because this would also let me improve the quality of the messages.

@0x21af
Copy link
Author

0x21af commented Oct 26, 2023

I'm gradually rewriting this checker to fix its various limitations, and this issue is already on my TODO list. After merging my current improvements, the next step could be a commit that fixes this.

This issue is caused by the fact that the checker looks for memory access operations (checkLocation() callback) that access an ElementRegion, but an expression like array[index].field implies an access to a FieldRegion (which happens to be a sub-region of an ElementRegion, but the checker doesn't investigate that). Note that when accessing nested sub-regions (like array[index].field) the analyzer engine only calls the checkLocation() callback for the innermost accessed region.

It'd be possible to fix this "directly" by looking for ElementRegion layers in the middle of the stack of nested sub-regions (e.g. array[index].field is a FieldRegion, but its parent is an ElementRegion, so we perform bounds checking for it). However, my plan is that I'll switch from the abstract checkLocation() callback to callbacks that observe the concrete array subscript operators (and pointer dereference to handle things like *(array + index)), because this would also let me improve the quality of the messages.

Thanks for your comprehensive explanation of the issue.

I deeply appreciate the efforts you've made to enhance the quality of the checker. Looking forward to the updates :)

NagyDonat added a commit to Ericsson/llvm-project that referenced this issue Nov 13, 2023
...instead of the currently used, more abstract Location callback. The
main advantage of this change is that after it the checker will check
`array[index].field` while the previous implementation ignored this
situation (because here the ElementRegion is wrapped in a FieldRegion
object). This improvement fixes PR llvm#70187.

Note that after this change `&array[idx]` will be handled as an access
to the `idx`th element of `array`, which is technically incorrect but
matches the programmer intuitions. In my opinion it's more helpful if
the report points to the source location where the indexing happens
(instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case
when the past-the-end pointer of an array is formed as `&arr[length]`. I
think this is just unimportant pedantery (because it's cleaner to write
the past-the-end pointer as `(arr+length)` and that form isn't affected
by this checker), but if it does appear in real code, then we could
introduce a special case for it.

In addition to this primary improvement, this change tweaks the message
for the tainted index/offset case (using the more concrete information
that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing `check::Location` with
`check::PostStmt<...>` callbacks) was already proposed in my change
https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by
@steakhal. Those reviews were both abandoned, but the issues were
unrelated to the change that is introduced in this PR.
@NagyDonat NagyDonat linked a pull request Nov 13, 2023 that will close this issue
NagyDonat added a commit that referenced this issue Dec 5, 2023
...instead of the currently used, more abstract Location callback. The
main advantage of this change is that after it the checker will check
`array[index].field` while the previous implementation ignored this
situation (because here the ElementRegion is wrapped in a FieldRegion
object). This improvement fixes PR #70187.

Note that after this change `&array[idx]` will be handled as an access
to the `idx`th element of `array`, which is technically incorrect but
matches the programmer intuitions. In my opinion it's more helpful if
the report points to the source location where the indexing happens
(instead of the location where a pointer is finally dereferenced).

As a special case, this change allows code that forms the past-the-end
pointer of an array as `&arr[size]` (but still rejects code like
`if (idx >= size) return &array[idx];` and code that dereferences a
past-the-end pointer).

In addition to this primary improvement, this change tweaks the message
for the tainted index/offset case (using the more concrete information
that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing `check::Location` with
`check::PostStmt<...>` callbacks) was already proposed in my change
https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by
steakhal. Those reviews were both abandoned, but the problems that led
to abandonment were unrelated to the change that is introduced in this
PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants