
Incorrect assertion on mutex enter-count in async_context_execute_sync() cross-core path #2527

Closed
@schkovich

Description


TL;DR

The function async_context_threadsafe_background_execute_sync() incorrectly asserts that the recursive mutex is completely unentered (i.e., enter_count == 0) during cross-core invocations. However, it fails to consider who owns the mutex. As a result, it prohibits valid scenarios where the target core (e.g., the async core) legitimately holds the lock while doing its work.

This leads to hard assertion failures and makes the function unusable under typical dual-core setups involving background task scheduling.

Description

In multicore builds, async_context_threadsafe_background_execute_sync() uses this assertion in the cross-core execution path:

```c
hard_assert(!recursive_mutex_enter_count(&self->lock_mutex));
```

This assertion only checks whether the recursive mutex's enter_count is zero. It does not verify which core, if any, owns the mutex. This check is too strict for valid cross-core calls.

For example, the async core (e.g., Core 1) may legally be processing background work while holding the lock. In this situation, a cross-core call from Core 0 should be allowed to run execute_sync(). However, the assertion fails because enter_count is non-zero, even though the usage is valid and deadlock-free.

This behaviour contradicts the SDK's documentation:

“You should NOT call this method while holding the async_context’s lock.”

The documentation only states that the method should not be called while holding the lock, not that the other core must not acquire the lock.

Instead, the check should ensure that the calling core does not already own the lock.

Steps to Reproduce

  1. Set up a Pico SDK test environment on an Arduino Nano RP2040 Connect (or compatible dual-core RP2040 board).
  2. Configure an async_context_threadsafe_background_t to run on Core 1.
  3. From Core 0, repeatedly invoke async_context_execute_sync() while Core 1 is processing scheduled work via async_context_add_at_time_worker_in_ms().
  4. Wait until a hard assertion is triggered.
  5. When the assertion fires and execution halts under GDB, select frame 4 and examine mutex_state.
  6. Compare findings against the provided disassembly and GDB logs.

Expected Behaviour

  • Cross-core execution of async_context_execute_sync() should succeed when the target core holds the lock (e.g., performing scheduled work).
  • Only same-core recursive misuse (i.e., when the calling core already owns the lock) should result in assertion failure.
  • The assertion should verify ownership, not just the lock’s enter count.
  • Behaviour should align with the documentation: the method must not be called while the calling core holds the lock.

Actual Behaviour

  • The assertion fails even when the calling core does not hold the mutex.

  • GDB inspection reveals that:

    • The mutex owner is 1 (i.e., Core 1),
    • enter_count is 1,
    • The calling core (Core 0) does not own the lock.
  • This confirms that the assertion triggers due to activity on the other core, not the one making the call.

```
(gdb) frame 4
#4  async_context_threadsafe_background_execute_sync (self_base=0x20001590 <async_ctx>, ...) at async_context_threadsafe_background.c:144
144         hard_assert(!recursive_mutex_enter_count((recursive_mutex_t *)&mutex_state));
(gdb) info locals
mutex_state = {
  core = {
    spin_lock = 0xd0000144
  },
  owner = 1 '\001',
  enter_count = 1 '\001'
}
```
  • Disassembly confirms that execution is mid-call inside async_context_add_at_time_worker_in_ms(), which holds the lock on the async core.

For example:

```
0x1000a6d0 <+124>: add r6, pc, #196 @ (adr r6, 0x1000a798 <async_context_base_add_at_time_worker+12>)
```

This indicates that the assertion is triggered during normal background operation, not by a logic error in user code.

Environment

| Component | Version/Details |
| --- | --- |
| OS | Ubuntu 24.04.2 LTS |
| Toolchain | Custom GCC 14.2 / Newlib 4.3 (toolchain-rp2040-earlephilhower @ 5.140200.240929 (14.2.0)) |
| CMake | Version 3.28.3 |
| Compiler | arm-none-eabi-gcc 14.2.0 |
| Board | Arduino Nano RP2040 Connect |
| Arduino Core | mutex_owner |
| Pico SDK Version | 2.1.1-6-gb1676c18 |

Additional Information

  1. Introduced a local volatile copy of the recursive_mutex_t to prevent compiler optimisations from eliding reads:

```c
if (self_base->core_num != get_core_num()) {
    volatile recursive_mutex_t mutex_state = self->lock_mutex;  // atomic copy of the entire mutex state
    hard_assert(!recursive_mutex_enter_count((recursive_mutex_t *)&mutex_state));
    // ...
}
```

  2. Designed a test that reliably triggers the condition during async activity on Core 1.
  3. Used GDB to verify that the mutex is held by the target core (self->core_num), not by the calling core.

Comparison with RTOS implementation

The FreeRTOS variant of async_context_execute_sync() performs an ownership-aware check:

```c
hard_assert(xSemaphoreGetMutexHolder(self->lock_mutex) != xTaskGetCurrentTaskHandle());
```
