
Add condition variables to pico_sync (fixes #1093) #1101


Open
wants to merge 1 commit into develop from w46/condition-variables

Conversation


@pguyot pguyot commented Nov 16, 2022

This PR provides condition variables as a companion to mutexes.

It is implemented without any assumption about the number of cores.
Like mutexes, condition variables are protected by a spinlock.

When waiting on a condition variable, a core first tries to become the waiter (i.e. the owner of the condition variable) and, once it is, waits to be signaled.
When signaling a condition variable, the caller verifies that there is a waiter, and if there is, sets a boolean to signal it.

There is a trick to avoid holding two spinlocks at once, which was incompatible with the RP2350: the condition variable spinlock protects only the cv waiter and the broadcast flag, while the mutex spinlock is used in cond_wait and cond_signal for the signal. Signaling means locking two spinlocks (or only one if they are the same). This introduces a potential race condition if more than one core tries to signal at the same time, which can be avoided by only calling cond_signal while the mutex is held.

This busy-loop implementation seems to be immune to spurious wakeups.
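
For illustration, here is a minimal usage sketch of the pattern described above, with cond_signal called while the mutex is held to avoid the multi-signaler race. cond_wait and cond_signal are the functions named in this PR; the cond_init initializer and the pico/cond.h header name are assumptions by analogy with the existing mutex API and may not match the final code:

#include "pico/mutex.h"
#include "pico/cond.h"            // assumed header name for this PR's module

static mutex_t queue_mtx;
static cond_t queue_cond;
static int queue_depth;

void queue_setup(void) {
    mutex_init(&queue_mtx);
    cond_init(&queue_cond);       // assumed initializer, by analogy with mutex_init()
}

// Consumer: wait until the queue is non-empty.
void queue_pop(void) {
    mutex_enter_blocking(&queue_mtx);
    while (queue_depth == 0) {
        // Atomically releases queue_mtx, waits for a signal, then re-acquires it.
        cond_wait(&queue_cond, &queue_mtx);
    }
    queue_depth--;
    mutex_exit(&queue_mtx);
}

// Producer: signal while still holding the mutex, as recommended above, so
// two cores never race to signal the condition variable at the same time.
void queue_push(void) {
    mutex_enter_blocking(&queue_mtx);
    queue_depth++;
    cond_signal(&queue_cond);
    mutex_exit(&queue_mtx);
}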

@alastairpatrick
Contributor

I'm just a contributor, so please take this with a "pinch of salt"; I suggest waiting for someone with authority before acting on any of this.

First of all, if this is to be called a condition variable module, I think it would be reasonable for an SDK user to expect it to also include a cond_broadcast() (or similarly named) function that signals all waiting threads. If an RTOS is in use, there could be more than one waiting thread.

I appreciate that implementing cond_broadcast() is a challenge in the context of the SDK, in part because there is nothing resembling a thread control block that could be used to efficiently implement a linked list of threads waiting for a particular condition variable. I have two suggestions.

My first suggestion is to build on a (hopefully) upcoming thread-local variable module for the SDK, which essentially provides a thread control block by another name.

My second suggestion is to add a broadcast_count variable to cond_t, which is atomically incremented whenever cond_broadcast() is called. I've never implemented condition variables this way and I wonder if it's flawed, but I think it might give waiting threads enough information to all wake up on a broadcast:

typedef struct __packed_aligned
{
    lock_core_t core;
    lock_owner_id_t priority_waiter;  // arbitrary one of N threads that will wake up on cond_signal().
    uint64_t broadcast_count;  // all N threads will wake up when this changes.
    bool signaled;
} cond_t;
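
To make the idea concrete, here is a rough sketch of how such a counter might be used, assuming the struct above and the existing lock_core helpers; this is purely illustrative, not a tested implementation:

// Illustrative sketch only: bump the counter under the spin lock and notify.
void cond_broadcast(cond_t *cond) {
    uint32_t save = spin_lock_blocking(cond->core.spin_lock);
    cond->broadcast_count++;      // every current waiter sees the change
    lock_internal_spin_unlock_with_notify(&cond->core, save);
}

// Inside cond_wait(), each waiter would remember the count it observed when it
// started waiting and leave the wait loop once it is either individually
// signaled or the count has changed, e.g.:
//
//     uint64_t observed = cond->broadcast_count;
//     while (!cond->signaled && cond->broadcast_count == observed) {
//         lock_internal_spin_unlock_with_wait(&cond->core, save);
//         save = spin_lock_blocking(cond->core.spin_lock);
//     }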

Finally, I think there are some issues with condition variables as implemented.

There are three cases where cond_wait() blocks by calling lock_internal_spin_unlock_with_wait():

  1. A thread blocks until the cond_t is signaled.
  2. A thread blocks until the cond_t has no current waiter.
  3. A thread blocks until the mutex_t is released.

I use the term "blocked" here in the operating-system sense, because that is exactly what will happen when an RTOS is in use: the calling thread will transition to a blocked state. In order to transition the thread out of the blocked state, another thread must call lock_internal_spin_unlock_with_notify().

Of the three, case 1 is handled by the call to lock_internal_spin_unlock_with_notify() in cond_signal(), and I believe that case is fully covered. I think there are issues with cases 2 and 3, though.

In case 2, there's no notification when cond->waiter becomes LOCK_INVALID_OWNER_ID, so a thread waiting to become the current waiter might never wake up.

Case 3 is partially covered by the call to lock_internal_spin_unlock_with_notify() in mutex_exit(). However, code in cond_wait() also releases the mutex by bypassing the public mutex API and setting mtx->owner = LOCK_INVALID_OWNER_ID. So, like case 2, a thread waiting for the mutex to be released might never wake up.
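
To make case 3 concrete: a thread blocked trying to acquire the mutex is parked inside lock_internal_spin_unlock_with_wait() on the mutex's lock core, and only a matching notify on that same core wakes it up. Roughly, the acquire path looks like this (an illustrative paraphrase of the SDK's mutex_enter_blocking(), not a verbatim copy):

void mutex_enter_blocking_sketch(mutex_t *mtx) {
    lock_owner_id_t caller = lock_get_caller_owner_id();
    do {
        uint32_t save = spin_lock_blocking(mtx->core.spin_lock);
        if (!lock_is_owner_id_valid(mtx->owner)) {
            mtx->owner = caller;
            spin_unlock(mtx->core.spin_lock, save);
            return;
        }
        // Parks here; only a notify on &mtx->core (as done by mutex_exit())
        // wakes this thread. A bare store to mtx->owner elsewhere will not.
        lock_internal_spin_unlock_with_wait(&mtx->core, save);
    } while (true);
}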

@pguyot
Author

pguyot commented Nov 17, 2022

Thank you.

  • I wasn't clear about the notify semantics and obviously got it wrong.
  • I don't need broadcast for my use case, but I had thought of an implementation slightly different from your suggestion. I'll give it more thought; I like your idea of using a counter that is unlikely to overflow. Maybe 32 bits would be sufficient.

@alastairpatrick
Contributor

Perhaps it's slightly less efficient, but I think cond_wait() is clearer refactored like this. I haven't attempted to fix any of the issues I mentioned above.

void __time_critical_func(cond_wait)(cond_t *cond, mutex_t *mtx) {
    lock_owner_id_t caller = lock_get_caller_owner_id();
    uint32_t save = save_and_disable_interrupts();
    spin_lock_unsafe_blocking(mtx->core.spin_lock);
    assert(lock_is_owner_id_valid(mtx->owner));

    if (mtx->core.spin_lock != cond->core.spin_lock) {
        spin_lock_unsafe_blocking(cond->core.spin_lock);
    }
    
    // Wait to be the waiter first, then wait for the signal.
    if (lock_is_owner_id_valid(cond->waiter)) {
        mtx->owner = LOCK_INVALID_OWNER_ID;
        spin_unlock_unsafe(mtx->core.spin_lock);
        do {
            if (!lock_is_owner_id_valid(cond->waiter)) {
                cond->waiter = caller;
                break;
            }
            lock_internal_spin_unlock_with_wait(&cond->core, save);
            save = spin_lock_blocking(cond->core.spin_lock);
        } while (true);
    } else {
        cond->waiter = caller;
        mtx->owner = LOCK_INVALID_OWNER_ID;
        spin_unlock_unsafe(mtx->core.spin_lock);
    }
    
    // We are the current waiter, now wait for the signal.
    do {
        if (cond->signaled) {
            cond->waiter = LOCK_INVALID_OWNER_ID;
            cond->signaled = false;
            break;
        }
        lock_internal_spin_unlock_with_wait(&cond->core, save);
        save = spin_lock_blocking(cond->core.spin_lock);
    } while (true);
      
    if (mtx->core.spin_lock != cond->core.spin_lock) {
        spin_unlock_unsafe(cond->core.spin_lock);
    }
    
    do {
        if (!lock_is_owner_id_valid(mtx->owner)) {
            mtx->owner = caller;
            spin_unlock(mtx->core.spin_lock, save);
            break;
        }
        lock_internal_spin_unlock_with_wait(&mtx->core, save);
        save = spin_lock_blocking(mtx->core.spin_lock);
    } while (true);
}

@kilograham kilograham added this to the 1.6.0 milestone Feb 7, 2023
@pguyot pguyot force-pushed the w46/condition-variables branch 2 times, most recently from 58fa5fb to 26d7860 on April 9, 2023 15:18
@pguyot
Author

pguyot commented Apr 9, 2023

@alastairpatrick Thank you for your feedback. I eventually implemented broadcast as well as timed waits.

pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 9, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
@kilograham
Contributor

Thanks for the update; I haven't really had a chance to look at this in detail. I do note, however, that the latest commit uses __sev() in some paths, which is not valid under an RTOS (that is what lock_internal_spin_unlock_with_notify is there for). So hopefully things can be refactored to use that method in some way; it seems like the @alastairpatrick version might do that... sorry, busy with other stuff at the moment, so not really engaging brain fully!
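
For context, the default (no-RTOS) definitions of these helpers in lock_core.h are roughly as follows; an RTOS port overrides them to block and unblock the calling thread through the scheduler, which is why a raw __sev() is not a valid substitute (paraphrased, not a verbatim copy of the SDK source):

// Approximate default definitions when no lock-owner/RTOS support is configured:
#define lock_internal_spin_unlock_with_wait(lock, save) \
    spin_unlock((lock)->spin_lock, save), __wfe()
#define lock_internal_spin_unlock_with_notify(lock, save) \
    spin_unlock((lock)->spin_lock, save), __sev()
// An RTOS redefines these so the waiting thread is descheduled and the notify
// wakes it via the scheduler rather than a bare SEV event.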

@pguyot
Author

pguyot commented Apr 9, 2023

@kilograham Thank you for this feedback.

The __sev() call was there to solve an issue raised by @alastairpatrick which, as he noted, wasn't addressed by his refactoring proposal. I rewrote the code to use lock_internal_spin_unlock_with_notify instead of spin_unlock_unsafe on the paths where __sev() was called.

pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 10, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 10, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Apr 10, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
@pguyot pguyot force-pushed the w46/condition-variables branch from 23c4cb5 to 653b354 on June 4, 2023 17:09
pguyot added a commit to pguyot/AtomVM that referenced this pull request Jun 4, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Jun 4, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
pguyot added a commit to pguyot/AtomVM that referenced this pull request Jun 5, 2023
Also slightly modify SMP code to handle the specific case of the Pico.

Code relies on condition variables implementation proposed in this upstream PR:
raspberrypi/pico-sdk#1101

Signed-off-by: Paul Guyot <[email protected]>
@kilograham kilograham self-assigned this May 19, 2024
@kilograham kilograham modified the milestones: 1.6.1, 1.7.0 May 19, 2024
@pguyot pguyot force-pushed the w46/condition-variables branch from 653b354 to f593b3a on September 21, 2024 06:06
@pguyot
Author

pguyot commented Sep 21, 2024

@kilograham this was rebased on top of SDK 2.0 and tested on the RP2040, RP2350 (Arm) and RP2350 (RISC-V).

@pguyot pguyot force-pushed the w46/condition-variables branch from f593b3a to ee1abdf on May 11, 2025 04:30
@pguyot
Author

pguyot commented May 11, 2025

@kilograham this has been rebased on top of SDK 2.1.1. The Bazel build was fixed, and the tests have been extended to increase coverage. They initially didn't pass on the Pico 2 W because of a race condition, which has since been fixed (the mutex spinlock was released before the condition waiter was set).

@pguyot pguyot marked this pull request as draft May 11, 2025 09:14
@pguyot
Author

pguyot commented May 11, 2025

Marked as draft, as further tests seem to indicate that this could be faulty.

@pguyot pguyot force-pushed the w46/condition-variables branch from ee1abdf to a70d1c7 on May 11, 2025 14:25
@pguyot pguyot marked this pull request as ready for review May 11, 2025 14:25
@pguyot pguyot marked this pull request as draft May 13, 2025 06:08
Implement condition variables as a companion to mutexes. Condition variables
can be signaled and broadcast, and the implementation should work with any
number of cores.

To prevent deadlocks, at most a single spin lock is held at any given time. As
a result there can be a race condition if several cores call cond_signal at the
same time (without holding the mutex).
@pguyot pguyot force-pushed the w46/condition-variables branch from a70d1c7 to b333635 on May 13, 2025 20:29
@pguyot pguyot marked this pull request as ready for review May 13, 2025 20:36