[bug] clang incorrectly uses coroutine frame for scratch space after suspending #65018
@llvm/issue-subscribers-coroutines |
@llvm/issue-subscribers-c-20 |
Do all other uses follow the same pattern? |
No, and sorry for not providing more detail. In the larger example this includes other uses that are not centered around a
Here are some snippets from the larger example (in this case the scratch space is at offset
; Point the `this` pointer for a mutex at the scratch space before locking
; it.
;
; (I haven't checked, but I assume the mutex doesn't live past a suspension
; point.)
0x00000000006a743e <+286>: mov rdi,r13
0x00000000006a7441 <+289>: call 0xeee6a0 <_ZN4absl5MutexD2Ev>
[...]
; Put the contents of rcx into the scratch space, then call a function that
; accepts a reference as its second word (in rsi).
0x00000000006a7604 <+740>: mov QWORD PTR [rbx+0xf8],rcx
0x00000000006a760b <+747>: mov QWORD PTR [rbx+0x100],0x0
0x00000000006a7616 <+758>: mov QWORD PTR [rbx+0x108],rax
0x00000000006a761d <+765>: lea rdi,[rip+0xffffffffffff9fec] # 0x6a1610 <_ZZN2k35FnRefIFvRNS_8internal11SideEffectsEEEC1INSt3__u14__bind_front_tIMNS_8CoFutureIvEEFvS3_EJPSA_EEEvEEOT_ENUllS3_E_8__invokeElS3_>
0x00000000006a7624 <+772>: mov rsi,r13
0x00000000006a7627 <+775>: call 0x6bd9a0 <_ZN2k38internal18ProcessSideEffectsENS0_10SideEffectE>
[...]
; Another example like the one above.
0x00000000006a7639 <+793>: mov QWORD PTR [rbx+0xf8],rax
0x00000000006a7640 <+800>: mov QWORD PTR [rbx+0x100],0x0
0x00000000006a764b <+811>: mov QWORD PTR [rbx+0x108],r15
0x00000000006a7652 <+818>: lea rdi,[rip+0xffffffffffff9fb7] # 0x6a1610 <_ZZN2k35FnRefIFvRNS_8internal11SideEffectsEEEC1INSt3__u14__bind_front_tIMNS_8CoFutureIvEEFvS3_EJPSA_EEEvEEOT_ENUllS3_E_8__invokeElS3_>
0x00000000006a7659 <+825>: mov rsi,r13
0x00000000006a765c <+828>: call 0x6bd9a0 <_ZN2k38internal18ProcessSideEffectsENS0_10SideEffectE>
[...]
; The incorrect code that matches the small reproducer in this bug report.
0x00000000006a77f8 <+1240>: call 0x6a2b90 <_ZZN2k38internal16CoroutinePromiseIvvE20AwaitTransformCommonIJEPNS0_23StartCoroutineAwaitableEEEDaT0_EN9Awaitable13await_suspendENSt3__u16coroutine_handleIvEE>
0x00000000006a77fd <+1245>: mov QWORD PTR [rbx+0xf8],rax
0x00000000006a7804 <+1252>: mov rdi,r13
0x00000000006a7807 <+1255>: call 0x6a07b0 <_ZNKSt3__u16coroutine_handleIvE7addressEv>
0x00000000006a780c <+1260>: mov rdi,rax
0x00000000006a780f <+1263>: add rsp,0x38
0x00000000006a7813 <+1267>: pop rbx
0x00000000006a7814 <+1268>: pop r12
0x00000000006a7816 <+1270>: pop r13
0x00000000006a7818 <+1272>: pop r14
0x00000000006a781a <+1274>: pop r15
0x00000000006a781c <+1276>: pop rbp
0x00000000006a781d <+1277>: jmp QWORD PTR [rax]
I am not an expert here, but it seems that clang thinks the coroutine frame is a local variable that has not escaped, and so can be used as temporary space. This is correct only until suspension. |
So the pattern, as I see it, is that the generated code stores the coroutine handle returned from await_suspend into the coroutine frame, although the user would consider that store unnecessary. Do I understand correctly? |
The pattern in common with the reproducer above and with my real code is:
|
Yeah, I mean, is there any other store to the coroutine frame except the result from |
In my reproducer, the answer is no. In the larger example the answer is yes (see the excerpts in this comment), although I don't think any of them are illegal except for the result from
By the way, this might be a regression. Clang 16.0.0 doesn't generate the illegal code:
call Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)@PLT
mov rdi, rax
pop rcx
jmp qword ptr [rax] # TAILCALL
I'm not sure if this is because it sees an optimization that trunk doesn't, or because it understands that the code generated by trunk is illegal. |
If I read correctly, the only problematic case in that comment is:
which is another case we store the result of
It may be a coincidence. Previously, we always (incorrectly) assumed that it is safe to put things on the coroutine frame and that it is only unsafe to put something outside the frame. |
Correct. My only complaint is clang must not write to the coroutine frame after suspending, but in both functions it does so. Is it possible to implement that logic directly? Never write to the coroutine frame after suspending, e.g. by treating it as having been deallocated by await_suspend? |
It may not be straightforward to implement such semantics in the middle end. I need to think more about how to fix it. |
@ChuanqiXu9 I bisected to find that this problem was introduced by
Would it make sense to roll back that commit and reopen that thread?
Before that commit:
call _ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE@PLT
.Ltmp1:
# %bb.3:
mov rdi, rax
add rsp, 8
.cfi_def_cfa_offset 24
pop rbx
.cfi_def_cfa_offset 16
pop r14
.cfi_def_cfa_offset 8
jmp qword ptr [rax] # TAILCALL
After that commit:
call _ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE@PLT
.Ltmp1:
# %bb.3:
mov qword ptr [rbx + 32], rax
add rbx, 32
mov rdi, rbx
call _ZNKSt7__n486116coroutine_handleIvE7addressEv
mov rdi, rax
add rsp, 8
.cfi_def_cfa_offset 40
pop rbx
.cfi_def_cfa_offset 32
pop r12
.cfi_def_cfa_offset 24
pop r14
.cfi_def_cfa_offset 16
pop r15
.cfi_def_cfa_offset 8
jmp qword ptr [rax] # TAILCALL |
It is a big surprise to me, since I thought the patch simply adds a no-inline attribute to await_suspend. And, sure, we should always revert such changes according to our policy. |
Agreed, I'm also surprised by it. Maybe a good outcome here would be to add the reproducer from this thread as a test case so it won't regress again; I don't know what caused it in this case. |
I've already got the idea. The reason may be that the coroutine_handle<>::address() function was marked as |
Ah yes, if that's a side effect that would probably explain it, along with the really poor optimization at |
@llvm/issue-subscribers-clang-codegen |
Thanks @ChuanqiXu9, I confirm that the revert fixes my reproducer here and my codebase's real problem (without yet being able to remove the workarounds for #56301). However, should we consider adding this reproducer as a test so it can't regress again? |
I've already added the test as a reproducer in another commit of #56301 |
Ah, perfect; thanks! |
Here is a simple library with a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers, simulating a more complex construct in my codebase:
Clang at trunk (currently a738bdf35eaa, using -std=c++20 -O1 -fno-exceptions) miscompiles this, using the coroutine frame for scratch space after await_suspend returns:
(Compiler Explorer)
As discussed in #56301, it is not safe to use the coroutine frame for scratch space after the coroutine has suspended. This introduces a data race, because by the time the suspending thread writes to the scratch space the coroutine may have been resumed or destroyed by another thread. In my codebase this causes segfaults and use-after-free reports.