Skip to content

[bug] clang incorrectly uses coroutine frame for scratch space after suspending #65018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jacobsa opened this issue Aug 27, 2023 · 20 comments
Closed
Labels
c++20 clang:codegen IR generation bugs: mangling, exceptions, etc. coroutines C++20 coroutines miscompilation

Comments

@jacobsa
Copy link
Contributor

jacobsa commented Aug 27, 2023

Here is a simple library with a function that uses a foreign await_suspend method with symmetric transfer of control to await each element of an array of 32 integers, simulating a more complex construct in my codebase:

#include <coroutine>

// A simple awaiter type with an await_suspend method that can't be
// inlined.
struct Awaiter {
  const int& x;

  bool await_ready() { return false; }
  std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h);
  void await_resume() {}
};

struct MyTask {
  // A lazy promise with an await_transform method that supports awaiting
  // integer references using the Awaiter struct above.
  struct promise_type {
    MyTask get_return_object() { return {}; }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception();

    auto await_transform(const int& x) { return Awaiter{x}; }
  };
};

// A global array of integers.
int g_array[32];

// A coroutine that awaits each integer in the global array.
MyTask FooBar() {
  for (const int& x : g_array) {
    co_await x;
  }
}

Clang at trunk (currently a738bdf35eaa, using -std=c++20 -O1 -fno-exceptions) miscompiles this, using the coroutine frame for scratch space after await_suspend returns:

FooBar() [clone .resume]:                      # @FooBar() [clone .resume]
        push    r15
        push    r14
        push    r12
        push    rbx
        push    rax
        
        ; Throughout the function, rbx contains the address of the coroutine
        ; frame.
        mov     rbx, rdi
[...]

.LBB4_2:
        lea     r14, [rbx + 40]
        
        ; In this basic block, r15 contains offset 32 in the coroutine frame,
        ; which clang uses for scratch space. In the larger example in my real
        ; codebase it reuses this for multiple purposes, but in this small
        ; function it uses it only once (see below).
        lea     r15, [rbx + 32]
[...]

        ; When the coroutine suspends, clang dumps the handle returned by
        ; await_suspend into the scratch space.
        call    Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)@PLT
        mov     qword ptr [rbx + 32], rax
        
        ; Afterward it loads that handle back from scratch space and uses the
        ; resume function in the first word of the coroutine frame it refers
        ; to as the target of an indirect jump, for symmetric transfer of
        ; control.
        mov     rdi, r15
        call    std::__n4861::coroutine_handle<void>::address() const
        mov     rdi, rax
        add     rsp, 8
        pop     rbx
        pop     r12
        pop     r14
        pop     r15
        jmp     qword ptr [rax]                 # TAILCALL
[...]

(Compiler Explorer)

As discussed in #56301, it is not safe to use the coroutine frame for scratch space after the coroutine has suspended. This introduces a data race, because by the time the suspending thread writes to the scratch space the coroutine may have been resumed or destroyed by another thread. In my codebase this causes segfaults and use-after-free reports.

@llvmbot
Copy link
Member

llvmbot commented Aug 27, 2023

@llvm/issue-subscribers-coroutines

@llvmbot
Copy link
Member

llvmbot commented Aug 27, 2023

@llvm/issue-subscribers-c-20

@ChuanqiXu9
Copy link
Member

In the larger example in my real
; codebase it reuses this for multiple purposes, but in this small
; function it uses it only once (see below).

Do all other uses follow the same pattern?

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

Do all other uses follow the same pattern?

No, and sorry for not providing more detail. In the larger example this includes other uses that are not centered around a co_await, for example locking a mutex that is unlocked again before the next suspension. Clang seems to just be spilling registers here, using it for generic scratch space, the way it normally would the stack.

Here are some snippets from the larger example (in this case the scratch space is at offset 0xf8, which is stored in r13):

   ; Point the `this` pointer for a mutex at the scratch space before locking
   ; it.
   ;
   ; (I haven't checked, but I assume the mutex doesn't live past a suspension
   ; point.)
   0x00000000006a743e <+286>:   mov    rdi,r13
   0x00000000006a7441 <+289>:   call   0xeee6a0 <_ZN4absl5MutexD2Ev>
[...]

   ; Put the contents of rcx into the scratch space, then call a function that
   ; accepts a reference as its second word (in rsi).
   0x00000000006a7604 <+740>:   mov    QWORD PTR [rbx+0xf8],rcx
   0x00000000006a760b <+747>:   mov    QWORD PTR [rbx+0x100],0x0
   0x00000000006a7616 <+758>:   mov    QWORD PTR [rbx+0x108],rax
   0x00000000006a761d <+765>:   lea    rdi,[rip+0xffffffffffff9fec]        # 0x6a1610 <_ZZN2k35FnRefIFvRNS_8internal11SideEffectsEEEC1INSt3__u14__bind_front_tIMNS_8CoFutureIvEEFvS3_EJPSA_EEEvEEOT_ENUllS3_E_8__invokeElS3_>
   0x00000000006a7624 <+772>:   mov    rsi,r13
   0x00000000006a7627 <+775>:   call   0x6bd9a0 <_ZN2k38internal18ProcessSideEffectsENS0_10SideEffectE>
[...]
   
   ; Another example like the one above.
   0x00000000006a7639 <+793>:   mov    QWORD PTR [rbx+0xf8],rax
   0x00000000006a7640 <+800>:   mov    QWORD PTR [rbx+0x100],0x0
   0x00000000006a764b <+811>:   mov    QWORD PTR [rbx+0x108],r15
   0x00000000006a7652 <+818>:   lea    rdi,[rip+0xffffffffffff9fb7]        # 0x6a1610 <_ZZN2k35FnRefIFvRNS_8internal11SideEffectsEEEC1INSt3__u14__bind_front_tIMNS_8CoFutureIvEEFvS3_EJPSA_EEEvEEOT_ENUllS3_E_8__invokeElS3_>
   0x00000000006a7659 <+825>:   mov    rsi,r13
   0x00000000006a765c <+828>:   call   0x6bd9a0 <_ZN2k38internal18ProcessSideEffectsENS0_10SideEffectE>
[...]

   ; The incorrect code that matches the small reproducer in this bug report.
   0x00000000006a77f8 <+1240>:  call   0x6a2b90 <_ZZN2k38internal16CoroutinePromiseIvvE20AwaitTransformCommonIJEPNS0_23StartCoroutineAwaitableEEEDaT0_EN9Awaitable13await_suspendENSt3__u16coroutine_handleIvEE>
   0x00000000006a77fd <+1245>:  mov    QWORD PTR [rbx+0xf8],rax
   0x00000000006a7804 <+1252>:  mov    rdi,r13
   0x00000000006a7807 <+1255>:  call   0x6a07b0 <_ZNKSt3__u16coroutine_handleIvE7addressEv>
   0x00000000006a780c <+1260>:  mov    rdi,rax
   0x00000000006a780f <+1263>:  add    rsp,0x38
   0x00000000006a7813 <+1267>:  pop    rbx
   0x00000000006a7814 <+1268>:  pop    r12
   0x00000000006a7816 <+1270>:  pop    r13
   0x00000000006a7818 <+1272>:  pop    r14
   0x00000000006a781a <+1274>:  pop    r15
   0x00000000006a781c <+1276>:  pop    rbp
   0x00000000006a781d <+1277>:  jmp    QWORD PTR [rax]

I am not an expert here, but it seems that clang thinks the coroutine frame is a local variable that has not escaped, and so can be used as temporary space. This is correct only until suspension.

@ChuanqiXu9
Copy link
Member

So the pattern to me is that the generated code will store the coroutine handle returned from await_suspend to the coroutine frame while the users think it is unnecessary, do I understand correctly?

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

The pattern in common with the reproducer above and with my real code is:

  • Clang writes to the coroutine frame after suspending. This is incorrect, because it introduces a data race. This part is the bug.

  • It does this in order to temporarily store the std::coroutine_handle result of await_suspend, which it then uses as a reference when calling std::coroutine_handle::address. This is not necessary in theory (the result is already in rax) and can be optimized away, but that is not a bug. The bug is only where it is stored: it should not be on the coroutine frame, but rather on the thread's stack.

@ChuanqiXu9
Copy link
Member

Yeah, I mean, is there any other store to the coroutine frame except the result from await_suspend?

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

Yeah, I mean, is there any other store to the coroutine frame except the result from await_suspend?

In my reproducer, the answer is no. In the larger example the answer is yes (see the excerpts in this comment), although I don't think any of them are illegal except for the result from await_suspend.


By the way, this might be a regression. Clang 16.0.0 doesn't generate the illegal code:

        call    Awaiter::await_suspend(std::__n4861::coroutine_handle<void>)@PLT
        mov     rdi, rax
        pop     rcx
        jmp     qword ptr [rax]                 # TAILCALL

I'm not sure if this is because it sees an optimization that trunk doesn't, or because it understands that the code generated by trunk is illegal.

@ChuanqiXu9
Copy link
Member

In my reproducer, the answer is no. In the larger example the answer is yes (see the excerpts in #65018 (comment)), although I don't think any of them are illegal except for the result from await_suspend.

If I read correctly, the only problematic case in that comment is:

0x00000000006a77f8 <+1240>: call 0x6a2b90 <_ZZN2k38internal16CoroutinePromiseIvvE20AwaitTransformCommonIJEPNS0_23StartCoroutineAwaitableEEEDaT0_EN9Awaitable13await_suspendENSt3__u16coroutine_handleIvEE>
0x00000000006a77fd <+1245>: mov QWORD PTR [rbx+0xf8],rax

which is another case we store the result of await_suspend to the coroutine frame.


I'm not sure if this is because it sees an optimization that trunk doesn't, or because it understands that the code generated by trunk is illegal.

It may be a coincidence. Previously, we always (incorrectly) think it is safe to put things on the coroutine frame and it is only unsafe if we put something outside the frame.

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

If I read correctly, the only problematic case in that comment is:

0x00000000006a77f8 <+1240>: call 0x6a2b90 <_ZZN2k38internal16CoroutinePromiseIvvE20AwaitTransformCommonIJEPNS0_23StartCoroutineAwaitableEEEDaT0_EN9Awaitable13await_suspendENSt3__u16coroutine_handleIvEE>
0x00000000006a77fd <+1245>: mov QWORD PTR [rbx+0xf8],rax

which is another case we store the result of await_suspend to the coroutine frame.

Correct. My only complaint is clang must not write to the coroutine frame after suspending, but in both functions it does so.

Is it possible to implement that logic directly? Never write to the coroutine frame after suspending, e.g. by treating it as having been deallocated by await_suspend?

@ChuanqiXu9
Copy link
Member

It may not be straightforward to implement such semantics in the middle end.. I need to think more about how to fix it.

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

@ChuanqiXu9 I bisected to find that this problem was introduced by c4672454, your commit from the previous thread (#56301). This issue is actually worse for me than the previous one, because after your commit my workarounds for the previous issue no longer prevent the bug.

Would it make sense to roll back that commit and reopen that thread?


Before that commit:

        call    _ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE@PLT
.Ltmp1:
# %bb.3:
        mov     rdi, rax
        add     rsp, 8
        .cfi_def_cfa_offset 24
        pop     rbx
        .cfi_def_cfa_offset 16
        pop     r14
        .cfi_def_cfa_offset 8
        jmp     qword ptr [rax]                 # TAILCALL

After that commit:

        call    _ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE@PLT
.Ltmp1:
# %bb.3:
        mov     qword ptr [rbx + 32], rax
        add     rbx, 32
        mov     rdi, rbx
        call    _ZNKSt7__n486116coroutine_handleIvE7addressEv
        mov     rdi, rax
        add     rsp, 8
        .cfi_def_cfa_offset 40
        pop     rbx
        .cfi_def_cfa_offset 32
        pop     r12
        .cfi_def_cfa_offset 24
        pop     r14
        .cfi_def_cfa_offset 16
        pop     r15
        .cfi_def_cfa_offset 8
        jmp     qword ptr [rax]                 # TAILCALL

@ChuanqiXu9
Copy link
Member

It is a big surprise to me since I think the patch simply add a no-inline attribute to await_suspend. And, sure, we should always revert such changes according to our policy.

ChuanqiXu9 added a commit that referenced this issue Aug 28, 2023
…aiter is not empty"

This reverts commit 9d9c25f.
This reverts commit 19ab266.
This reverts commit c467245.

As the issue #65018 shows,
the previous fix introduce a regression actually. So this commit reverts
the fix by our policies.
@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

Agreed, I'm also surprised by it. Maybe a good outcome here would be to add the reproducer from this thread as a test case so it won't regress again; I don't know what caused it in this case.

@ChuanqiXu9
Copy link
Member

I've already got the idea. The reason may be that the coroutine_handle<>::address() function was marked as noinline incorrectly and unintentionally. I guess you can reproduce the same result by marking both await_suspend and coroutine_handle<>::address() as noinline manually.

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 28, 2023

Ah yes, if that's a side effect that would probably explain it, along with the really poor optimization at -O1 of spilling from rax just to get the value back again.

@EugeneZelenko EugeneZelenko added the clang:codegen IR generation bugs: mangling, exceptions, etc. label Aug 28, 2023
@llvmbot
Copy link
Member

llvmbot commented Aug 28, 2023

@llvm/issue-subscribers-clang-codegen

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 29, 2023

Thanks @ChuanqiXu9, I confirm that the revert fixes my reproducer here and my codebase's real problem (without yet being able to remove the workarounds for #56301). However, should we consider adding this reproducer as a test so it can't regress again?

@ChuanqiXu9
Copy link
Member

Thanks @ChuanqiXu9, I confirm that the revert fixes my reproducer here and my codebase's real problem (without yet being able to remove the workarounds for #56301). However, should we consider adding this reproducer as a test so it can't regress again?

I've already added the test as a reproducer in another commit of #56301

@jacobsa
Copy link
Contributor Author

jacobsa commented Aug 29, 2023

Ah, perfect; thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++20 clang:codegen IR generation bugs: mangling, exceptions, etc. coroutines C++20 coroutines miscompilation
Projects
Status: Done
Development

No branches or pull requests

5 participants