Skip to content

WASM rethrow instruction is generated with a wrong index #114600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
purplesyringa opened this issue Nov 1, 2024 · 2 comments · Fixed by #114693
Closed

WASM rethrow instruction is generated with a wrong index #114600

purplesyringa opened this issue Nov 1, 2024 · 2 comments · Fixed by #114693
Assignees

Comments

@purplesyringa
Copy link

purplesyringa commented Nov 1, 2024

This is a wasm target, built with -fwasm-exceptions. Reproducer:

#include <cstdio>

__attribute__((noinline)) void print_current() noexcept {
  try {
    throw;
  } catch (int x) {
    printf("%d\n", x);
  }
}

struct Dropper {
  ~Dropper() {
    try {
      throw 1;
    } catch (...) {
      print_current();
    }
  }
};

int main() {
  try {
    Dropper dropper;
    throw 2;
  } catch (...) {
    print_current();
  }
}

Expected output: 1 2, actual output: 1 1.

(I'm using emcc -O2 -fwasm-exceptions test.cpp for testing if that's important.)

The problem is in the codegen for main:

Wasm IR
(func (;6;) (type 11) (result i32)
  (local i32 i32)
  global.get 0
  local.set 1
  call 22
  local.tee 0
  i32.const 2
  i32.store
  try ;; label = @1
    local.get 0
    call 28
  catch_all
    local.get 1
    global.set 0
    call 22
    local.tee 0
    i32.const 1
    i32.store
    try ;; label = @2
      try ;; label = @3
        local.get 0
        call 28
      catch 0
        local.set 0
        local.get 1
        global.set 0
        local.get 0
        call 29
        drop
        call 5
        call 30
        try ;; label = @4
          try ;; label = @5
            rethrow 2 (;@3;)
          catch 0
            local.set 0
            local.get 1
            global.set 0
            local.get 0
            call 29
            drop
            call 5
            try ;; label = @6
              call 30
            delegate 5
            i32.const 0
            return
          end
          unreachable
        delegate 3
        unreachable
      end
    catch_all
      local.get 1
      global.set 0
      call 37
      unreachable
    end
    unreachable
  end
  unreachable
)

In pseudocode, simplified:

try:
    throw(2)
catch_all:
    try:
        try:
            throw(1)
        catch:
            print_current()
            rethrow(0)
    catch:
        print_current()

So rethrow(0) rethrows the same exception as has just been handled, so the same exception object is caught twice. rethrow(2) would work correctly. The current behavior leads to all sorts of memory corruption and hard-to-debug state.

@llvmbot
Copy link
Member

llvmbot commented Nov 1, 2024

@llvm/issue-subscribers-backend-webassembly

Author: Alisa Sireneva (purplesyringa)

This is a wasm target, built with `-fwasm-exceptions`. Reproducer:
#include &lt;cstdio&gt;

__attribute__((noinline)) void print_current() noexcept {
  try {
    throw;
  } catch (int x) {
    printf("%d\n", x);
  }
}

struct Dropper {
  ~Dropper() {
    try {
      throw 1;
    } catch (...) {
      print_current();
    }
  }
};

int main() {
  try {
    Dropper dropper;
    throw 2;
  } catch (...) {
    print_current();
  }
}

Expected output: 1 2, actual output: 1 1.

(I'm using emcc -O2 -fwasm-exceptions test.cpp for testing if that's important.)

The problem is in the codegen for main:

<details><summary>Wasm IR</summary>

(func (;6;) (type 11) (result i32)
  (local i32 i32)
  global.get 0
  local.set 1
  call 22
  local.tee 0
  i32.const 2
  i32.store
  try ;; label = @<!-- -->1
    local.get 0
    call 28
  catch_all
    local.get 1
    global.set 0
    call 22
    local.tee 0
    i32.const 1
    i32.store
    try ;; label = @<!-- -->2
      try ;; label = @<!-- -->3
        local.get 0
        call 28
      catch 0
        local.set 0
        local.get 1
        global.set 0
        local.get 0
        call 29
        drop
        call 5
        call 30
        try ;; label = @<!-- -->4
          try ;; label = @<!-- -->5
            rethrow 2 (;@<!-- -->3;)
          catch 0
            local.set 0
            local.get 1
            global.set 0
            local.get 0
            call 29
            drop
            call 5
            try ;; label = @<!-- -->6
              call 30
            delegate 5
            i32.const 0
            return
          end
          unreachable
        delegate 3
        unreachable
      end
    catch_all
      local.get 1
      global.set 0
      call 37
      unreachable
    end
    unreachable
  end
  unreachable
)

</details>

In pseudocode, simplified:

try:
    throw(2)
catch_all:
    try:
        try:
            throw(1)
        catch:
            print_current()
            rethrow(0)
    catch:
        print_current()

So rethrow(0) rethrows the same exception as has just been handled, so the same exception object is caught twice. rethrow(1) would work correctly. The current behavior leads to all sorts of memory corruption and hard-to-debug state.

@coolreader18
Copy link

ftr, I think @purplesyringa was interested in maybe working on a fix for this

aheejin added a commit to aheejin/llvm-project that referenced this issue Nov 3, 2024
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @Destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachale unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:
https://github.com/llvm/llvm-project/blob/5091a359d9807db8f7d62375696f93fc34226969/llvm/include/llvm/Target/TargetSelectionDAG.td#L700
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes llvm#114600.
@aheejin aheejin closed this as completed in 492812f Nov 6, 2024
aheejin added a commit to aheejin/llvm-project that referenced this issue Nov 12, 2024
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @Destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachale unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:

https://github.com/llvm/llvm-project/blob/5091a359d9807db8f7d62375696f93fc34226969/llvm/include/llvm/Target/TargetSelectionDAG.td#L700
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes llvm#114600.
tru pushed a commit to aheejin/llvm-project that referenced this issue Nov 19, 2024
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @Destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachale unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:

https://github.com/llvm/llvm-project/blob/5091a359d9807db8f7d62375696f93fc34226969/llvm/include/llvm/Target/TargetSelectionDAG.td#L700
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes llvm#114600.
nikic pushed a commit to rust-lang/llvm-project that referenced this issue Nov 20, 2024
So far we have assumed that we only rethrow the exception caught in the
innermost EH pad. This is true in code we directly generate, but after
inlining this may not be the case. For example, consider this code:
```ll
ehcleanup:
  %0 = cleanuppad ...
  call @Destructor
  cleanupret from %0 unwind label %catch.dispatch
```

If `destructor` gets inlined into this function, the code can be like
```ll
ehcleanup:
  %0 = cleanuppad ...
  invoke @throwing_func
    to label %unreachale unwind label %catch.dispatch.i

catch.dispatch.i:
  catchswitch ... [ label %catch.start.i ]

catch.start.i:
  %1 = catchpad ...
  invoke @some_function
    to label %invoke.cont.i unwind label %terminate.i

invoke.cont.i:
  catchret from %1 to label %destructor.exit

destructor.exit:
  cleanupret from %0 unwind label %catch.dispatch
```

We lower a `cleanupret` into `rethrow`, which assumes it rethrows the
exception caught by the nearest dominating EH pad. But after the
inlining, the nearest dominating EH pad is not `ehcleanup` but
`catch.start.i`.

The problem exists in the same manner in the new (exnref) EH, because it
assumes the exception comes from the nearest EH pad and saves an exnref
from that EH pad and rethrows it (using `throw_ref`).

This problem can be fixed easily if `cleanupret` has the basic block
where its matching `cleanuppad` is. The bitcode instruction `cleanupret`
kind of has that info (it has a token from the `cleanuppad`), but that
info is lost when when we enter ISel, because `TargetSelectionDAG.td`'s
`cleanupret` node does not have any arguments:

https://github.com/llvm/llvm-project/blob/5091a359d9807db8f7d62375696f93fc34226969/llvm/include/llvm/Target/TargetSelectionDAG.td#L700
Note that `catchret` already has two basic block arguments, even though
neither of them means `catchpad`'s BB.

This PR adds the `cleanuppad`'s BB as an argument to `cleanupret` node
in ISel and uses it in the Wasm backend. Because this node is also used
in X86 backend we need to note its argument there too but nothing more
needs to change there as long as X86 doesn't need it.

---

- Details about changes in the Wasm backend:

After this PR, our pseudo `RETHROW` instruction takes a BB, which means
the EH pad whose exception it needs to rethrow. There are currently two
ways to generate a `RETHROW`: one is from `llvm.wasm.rethrow` intrinsic
and the other is from `CLEANUPRET` we discussed above. In case of
`llvm.wasm.rethrow`, we add a '0' as a placeholder argument when it is
lowered to a `RETHROW`, and change it to a BB in LateEHPrepare. As
written in the comments, this PR doesn't change how this BB is computed.
The BB argument will be converted to an immediate argument as with other
control flow instructions in CFGStackify.

In case of `CLEANUPRET`, it already has a BB argument pointing to an EH
pad, so it is just converted to a `RETHROW` with the same BB argument in
LateEHPrepare. This will also be lowered to an immediate in CFGStackify
with other control flow instructions.

---

Fixes llvm#114600.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants