How to handle Kunit killing the current thread on assertion failures? #759

bjorn3 · 2022-05-05T16:07:21Z

Documentation tests currently use kunit_do_failed_assertion which kills the current thread without unwinding the stack. This is UB in Rust as it deallocates the stack even if there may still be references to it. The first case where this can happen is in case of pinned stack variables. Pin requires that the value has it's Drop impl invoked before the memory is reused. This explicitly allows for example intrusive data structures. The second case is with scoped threads where child threads can reference variables on the stack of the parent thread. In this case the parent thread must wait for the completion of the child threads before returning from the current function. Exiting the current thread forcefully doesn't give the scoped thread implementation the chance of doing this. And finally there may be other unsafe code that temporarily puts references to variables on the stack of the current thread in global places. Yesterday me and @ojeda discussed in private how we should handle this. We came up with several options:

Keep aborting the current thread

This option is based on two observations:

Kunit is explicitly not meant for production, so UB is not as catastrophic as it would normally be.
Kunit prints the test failure before exiting the current thread. This means that even if everything goes wrong later on, at the test suite still reports at least one failure if there are any failures. The results of later tests may not be reliable anymore, but that is not a big deal.

We could make this option less likely to lead to crashes by avoiding pinning of stack variables.

Unwind the stack on test failures

In this option we would do panic!() in case of test failures and then unwind the stack. This option is free of any UB on the rust side and is the option how rust's builtin test framework works. The disadvantage of this option is that it requires adding a full DWARF unwinder in kernel space. Furthermore this depends on the currently unstable extern "C-unwind" abi to initiate the unwinding. It also gives some questions around how to handle unwinding past C kernel code.

Require all functions using the Kunit assertion macros to return `TestResult<T>`

In this option the macros would return Err(TestFailure) after reporting the failure allowing the failure to bubble up like a regular error. The advantage of this option is that it is simple to implement. The disadvantage is that it clutters the code a bit, which is not all that nice, especially for doctests.

The text was updated successfully, but these errors were encountered:

sulix · 2022-05-10T12:21:14Z

I'm not sure if there's anything you could do with KUnit resources — for instance, it's possible to use kunit_add_resource to execute some code when the test exits, which should run even if the thread is killed via a failed assertion.

In addition, KUnit resource allocations form a stack: they're guaranteed to run in the opposite order that they were created, which could make them useful for stack unwinding.

I suspect something based on this would take a lot of work to implement, and even then only partly solve the problem. But it might prove useful somewhere…

bjorn3 · 2022-05-10T12:58:33Z

Interesting. That would be useful to for example prevent deadlocks due to locks not being unlocked. I think it is not possible to create a watertight unwinding implementation without compiler support to automatically register destructors akin to SjLj unwinding though.

DemiMarie · 2022-05-14T13:45:58Z

I don’t think this is actually specific to Kunit. Any Rust panic or C BUG() call will also trigger this, which makes this a soundness problem for the bindings.

There are a few solutions I can think of:

Call Rust destructors during an oops. Linux already has a built-in unwinder, but I am not sure if it is sufficient for this purpose. Also, the destructors could make things worse, by interacting with inconsistent C-side state.
Require panic_on_oops. This is unlikely to be viable as it has already been NAKd by Linus.
Leak the current thread on BUG(): do not deallocate the kernel stack, and pretend as if the thread is stuck in an infinite loop. Userspace would not be impacted.
Leak the current thread if there are any Rust destructors that would be skipped. Not sure if this is a viable option.

…king With CONFIG_DMA_API_DEBUG enabled, the following warning is observed: DMA-API: snd_hda_intel 0000:03:00.1: device driver failed to check map error[device address=0x00000000ffff0000] [size=20480 bytes] [mapped as single] WARNING: CPU: 28 PID: 2255 at kernel/dma/debug.c:1036 check_unmap+0x1408/0x2430 CPU: 28 UID: 42 PID: 2255 Comm: wireplumber Tainted: G W L 6.12.0-10-133577cad6bf48e5a7848c4338124081393bfe8a+ Rust-for-Linux#759 debug_dma_unmap_page+0xe9/0xf0 snd_dma_wc_free+0x85/0x130 [snd_pcm] snd_pcm_lib_free_pages+0x1e3/0x440 [snd_pcm] snd_pcm_common_ioctl+0x1c9a/0x2960 [snd_pcm] snd_pcm_ioctl+0x6a/0xc0 [snd_pcm] ... Check for returned DMA addresses using specialized dma_mapping_error() helper which is generally recommended for this purpose by Documentation/core-api/dma-api.rst. Fixes: c880a51 ("ALSA: memalloc: Use proper DMA mapping API for x86 WC buffer allocations") Reported-by: Mikhail Gavrilov <[email protected]> Closes: https://lore.kernel.org/r/CABXGCsNB3RsMGvCucOy3byTEOxoc-Ys+zB_HQ=Opb_GhX1ioDA@mail.gmail.com/ Tested-by: Mikhail Gavrilov <[email protected]> Signed-off-by: Fedor Pchelkin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]>

ojeda added • lib Related to the `rust/` library. prio: normal labels May 5, 2022

sulix mentioned this issue Jan 17, 2023

Allow to run unit tests using KUnit with a user-space like syntax #950

Closed

ojeda removed the prio: normal label Feb 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to handle Kunit killing the current thread on assertion failures? #759

How to handle Kunit killing the current thread on assertion failures? #759

bjorn3 commented May 5, 2022

sulix commented May 10, 2022

bjorn3 commented May 10, 2022

DemiMarie commented May 14, 2022

How to handle Kunit killing the current thread on assertion failures? #759

How to handle Kunit killing the current thread on assertion failures? #759

Comments

bjorn3 commented May 5, 2022

Keep aborting the current thread

Unwind the stack on test failures

Require all functions using the Kunit assertion macros to return TestResult<T>

sulix commented May 10, 2022

bjorn3 commented May 10, 2022

DemiMarie commented May 14, 2022

Require all functions using the Kunit assertion macros to return `TestResult<T>`