Skip to content

multiprocessing.Pool does not properly restore blocked signals #127586

Closed
@stephen-hansen

Description

@stephen-hansen

Bug report

Bug description:

When instantiating a multiprocessing.Pool, for start methods "spawn" and "forkserver", the resource tracker must register a signal mask in the parent process to block some signals prior to creating the spawned childs, as explained by the comment below:

# bpo-33613: Register a signal mask that will block the signals.
# This signal mask will be inherited by the child that is going
# to be spawned and will protect the child from a race condition
# that can make the child die before it registers signal handlers
# for SIGINT and SIGTERM. The mask is unregistered after spawning
# the child.
try:
if _HAVE_SIGMASK:
signal.pthread_sigmask(signal.SIG_BLOCK, _IGNORED_SIGNALS)
pid = util.spawnv_passfds(exe, args, fds_to_pass)
finally:
if _HAVE_SIGMASK:
signal.pthread_sigmask(signal.SIG_UNBLOCK, _IGNORED_SIGNALS)

The comment makes sense, however the way in which the mask is unregistered seems like a bug, because it can have unintended effects on the parent process.

import signal
from multiprocessing import Pool, set_start_method

def get_blocked_signals():
    return signal.pthread_sigmask(signal.SIG_BLOCK, {})

if __name__ == "__main__":
    set_start_method("spawn")
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    print(f"I am blocking {get_blocked_signals()}")
    my_pool = Pool(5)
    print(f"After creating Pool, now I am blocking {get_blocked_signals()}")

Running this code on the latest commit here, the output is as follows:

I am blocking {<Signals.SIGTERM: 15>}
After creating Pool, now I am blocking set()

This does not match what I would expect to see, I would have expected the set of blocked signals to remain unchanged after the call to construct the Pool. As far as I know I don't think Pool is supposed to have side effects on the signal handling of the main process.

It appears that in the resource tracker, after spawning the child process, the tracker unblocks the ignored signals rather than restore to the previous sigmask. This works fine for most use cases but has the unintended side effect of unblocking any of the ignored signals that were already blocked prior (as shown in the example above).

This seems relatively trivial to fix so I should have a PR up shortly.

One important thing to note here is with the fix, child processes created by spawn or forkserver will now inherit blocked signals such as SIGTERM, if they are blocked in the parent process. This is already the preexisting behavior for signals that weren't ignored by the resource manager, such as SIGUSR1. Calling methods like terminate(), or garbage collecting the Pool, will hang due to the child processes blocking SIGTERM. In this case Pool cleanup should be handled in the parent using close() followed by join(), or when the Pool is created an initializer is provided in the constructor to unblock SIGTERM in the child. To me this feels more in line with expected behavior.

It might be worthwhile to either adjust Pool.terminate() to call kill() (i.e. SIGKILL) instead on the workers or, to avoid breaking compatibility, provide an implementation of kill() in Pool to match parity with Process, but I'll leave that up for discussion for now.

Please let me know if anything here seems inaccurate or if there are any questions.

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions