bpo-31699 Deadlocks in concurrent.futures.ProcessPoolExecutor with pickling error #3895
Conversation
Force-pushed from 5865996 to 69d774f
At a high level, I'm a bit surprised by the strategy adopted by this PR. Isn't it possible to keep a functioning executor in case an error occurs when (un)pickling? What does …? The crashing process problem is separate IMHO (and arguably less likely), though it does deserve to be handled.
For unpickling errors, you currently have no way to get the … For pickling errors, this is much simpler, and I should add something for the result pickling error to make sure we do not fail the executor in this case. The pickling error for the call item is already handled that way.
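To make that idea concrete, here is a rough sketch (not the actual Lib/concurrent/futures/process.py code; _send_result is a made-up helper name for illustration) of how a worker can convert a result-pickling failure into an exception on the corresponding future instead of breaking the whole executor:

import pickle
import traceback


def _send_result(result_queue, work_id, result):
    # Probe that the result survives pickling; if not, ship back a picklable
    # error so the parent can set it on the matching future while the
    # executor itself keeps working.
    try:
        pickle.dumps(result)
    except Exception:
        exc = RuntimeError(
            "Task result could not be pickled:\n" + traceback.format_exc())
        result_queue.put((work_id, None, exc))
    else:
        result_queue.put((work_id, result, None))


if __name__ == "__main__":
    import queue
    q = queue.Queue()
    _send_result(q, 0, lambda x: x)  # lambdas cannot be pickled
    print(q.get())                   # -> (0, None, RuntimeError(...))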
After thinking a bit, my main concern about safe pickling/unpickling of …
Force-pushed from 8427d36 to c70c4d6
@tomMoral can you please summarize the change you made in the last commit? The tests deadlock now. I think this is no longer robust to the …
Force-pushed from c70c4d6 to 3c74cb2
@ogrisel In the last commit, I just removed the crash detection thread and the cases in the test suite that are no longer handled. It works locally, but it seems to fail deterministically on Travis with …
Force-pushed from e8595c5 to 7ad6721
I managed to reproduce the deadlock (without -j4 and with --timeout=300):
Here is another run, this time the GC of the executor triggers the deadlock (via the weakref callback):
Force-pushed from 7ad6721 to 7d886ad
Note: I also had to use …
Force-pushed from b37efc5 to 3de1ee7
@tomMoral this PR is now conflicting with master.
- TST crash in CallItem unpickling
- TST crash in func call run
- TST crash in result pickling
The tests include crashes with PythonError/SystemExit/SegFault.
Also add more tests for race conditions when a worker crashes.
Force-pushed from 3de1ee7 to 04e1dc1
This PR has been rebased on …
Lib/concurrent/futures/process.py
Outdated
@@ -75,12 +76,36 @@
_threads_queues = weakref.WeakKeyDictionary()
_global_shutdown = False


# This constants control the maximal wakeup. If a job is submitted to the
Typo: "This constant controls"
Lib/concurrent/futures/process.py
Outdated
if reader in ready:
    result_item = reader.recv()
# Wait for a result to be ready in the result_queue while checking
# that worker process are still running.
typo: "that all worker processes are still running"
Lib/concurrent/futures/process.py
Outdated
# result_queue state. This avoid deadlocks caused by the non
# transmission of wakeup signal when a worker died with the
# _result_queue write lock.
self._wakeup = _Sentinel()
I think this would be better renamed to _queue_management_thread_sentinel to be more explicit.
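For context, the wakeup/sentinel mechanism under discussion can be pictured roughly as follows (a simplified sketch, not the actual process.py implementation; the wakeup channel is modelled here with a pipe and manager_loop is a made-up name). The queue management thread waits on the result pipe, a dedicated wakeup pipe and the worker sentinels, so it can still be woken up if a worker dies while holding the result queue's write lock:

from multiprocessing.connection import wait


def manager_loop(result_reader, wakeup_reader, processes):
    # result_reader: Connection carrying result items from the workers
    # wakeup_reader: Connection used only to wake this thread up explicitly
    # processes: list of multiprocessing.Process objects (the workers)
    while True:
        worker_sentinels = [p.sentinel for p in processes]
        ready = wait([result_reader, wakeup_reader] + worker_sentinels)
        if result_reader in ready:
            result_item = result_reader.recv()
            # ... dispatch the result to the matching future ...
        elif wakeup_reader in ready:
            wakeup_reader.recv()
            # ... woken up explicitly, e.g. for shutdown or a new job ...
        else:
            # A worker died unexpectedly: mark the executor as broken
            # instead of blocking forever on the result queue.
            break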
Lib/concurrent/futures/process.py
Outdated
self._queue_management_thread.daemon = True
self._queue_management_thread.start()
_threads_queues[self._queue_management_thread] = self._result_queue
_threads_queues[self._queue_management_thread] = self._wakeup
_threads_queues should be renamed to _threads_sentinels.
Lib/test/test_concurrent_futures.py
Outdated
    p.terminate()
executor.shutdown(wait=True)
print(f"\nTraceback:\n {tb}", file=sys.__stderr__)
self.fail(f"Deadlock executor:\n\n{tb}")
Phrasing: "Executor deadlock:"
Lib/test/test_concurrent_futures.py
Outdated
from signal import SIGTERM as SIGKILL
try:
    os.kill(pid, SIGKILL)
    time.sleep(.01)
Why is this sleep necessary? Shouldn't it be part of the caller instead? Or alternatively, kill several times until the OS replies that the process is dead:

n_trials = 3
for i in range(n_trials):
    try:
        os.kill(pid, SIGKILL)
    except (ProcessLookupError, PermissionError):
        break
    time.sleep(0.01)
else:
    raise RuntimeError(f"Could not kill process {pid} after {n_trials} trials")
This sleep increases the chances for this process to be killed by another process and gives time to make sure that the BrokenProcessPool error occurs before the end of the current function.
I think that if a process fails to kill another one, it means that it was already shut down, so we should not make multiple tries.
Lib/test/test_concurrent_futures.py
Outdated
tb = f.read()
for p in executor._processes.values():
    p.terminate()
executor.shutdown(wait=True)
If the executor is in a deadlock state, shutdown(wait=True) will never return, right?
As we kill the processes just before, this should be safe. In this case, if the queue_management_worker is alive, it will flag the ProcessPoolExecutor as broken and clean up the state; and if it is not, the call to shutdown won't block and the resources will still be freed.
Lib/test/test_concurrent_futures.py
Outdated
import faulthandler
from tempfile import TemporaryFile
with TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
Why not file=sys.stderr?
If I recall correctly, faulthandler actually needs a system-level file handle, not just a Python-level file object.
This part of the code is called with test.support.captured_stderr. In this case, the sys.stderr object does not have a fileno and faulthandler.dump_traceback fails. So we resorted to this to allow getting the traceback in the exception. Another option is to dump the traceback on sys.stdout.
I see. Do you think this whole code is still useful? I understand it was useful for you to debug the PR, but now that it seems to pass, is there a reason for keeping it?
The reason to keep it would be to have an informative test report in case of a regression introducing a deadlock. As deadlocks potentially happen randomly, it is always good to have a traceback as soon as possible when it happens. So I would rather keep it.
But we could move it to sys.stdout if you think the usage of a tempfile introduces unnecessary complexity. The test report would still have the information, but in a degraded format: not in the test report, but in the test stdout.
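For illustration, the tempfile-based approach being discussed boils down to something like this (assumed usage, not the exact test code; all_thread_tracebacks is a made-up helper name):

import faulthandler
from tempfile import TemporaryFile


def all_thread_tracebacks():
    # faulthandler needs a file with a real fileno(), which an io.StringIO
    # stand-in for sys.stderr does not have, hence the temporary file.
    with TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    # The captured text can then be embedded in a test failure message.
    print(all_thread_tracebacks())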
Force-pushed from 5da1f92 to 874f263
@pitrou I think all the comments have been addressed in this PR.
@ogrisel thanks for pinging me (did you amend previous commits? I didn't see any recent commits). There are still a couple nits left.
Lib/test/test_concurrent_futures.py
Outdated
@@ -94,6 +95,7 @@ def tearDown(self):


class ExecutorMixin:
    timeout = 30
This doesn't seem used anymore.
Lib/test/test_concurrent_futures.py
Outdated
@@ -116,6 +118,8 @@ def setUp(self):
    self._prime_executor()

def tearDown(self):
    # Remove the reference to self.timer to avoid the thread_cleanup
Also this comment doesn't seem necessary anymore. Or am I missing something?
Lib/test/test_concurrent_futures.py
Outdated
call_queue.close()
# Make sure that the queue management thread was properly finished
# and the queue was closed by the shutdown process
queue_management_thread.join()
It's already joined 6 lines above. The second join() call is a no-op, no?
Lib/test/test_concurrent_futures.py
Outdated
@@ -759,6 +766,180 @@ def test_ressources_gced_in_workers(self):
    ProcessPoolForkserverMixin,
    ProcessPoolSpawnMixin))


def hide_process_stderr():
    import io
    setattr(sys, "stderr", io.StringIO())
Is there a subtlety that I'm missing, or can this simply be written as sys.stderr = io.StringIO()?
Lib/test/test_concurrent_futures.py
Outdated
class ExitAtPickle(object):
    """Bad object that triggers a segfault at pickling time."""
The docstring is wrong here.
Lib/test/test_concurrent_futures.py
Outdated
class ErrorAtPickle(object):
    """Bad object that triggers a segfault at pickling time."""
And here as well.
Lib/test/test_concurrent_futures.py
Outdated
class ErrorAtUnpickle(object):
    """Bad object that triggers a process exit at unpickling time."""
And here too :-)
The classic copy/paste mistake... Thanks for pointing it out! :-)
Lib/test/test_concurrent_futures.py
Outdated
executor.shutdown(wait=True)

@classmethod
def _test_getpid(cls, a):
This doesn't seem used anymore?
Lib/test/test_concurrent_futures.py
Outdated
class ExecutorDeadlockTest:
    # If ExecutorDeadlockTest takes more than 100secs to complete, it is very
"100secs" isn't in sync with the number below.
The comment was a duplicate from the one 3 lines below so I removed it.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase …
@tomMoral thanks for the updates! This is looking good to me now.
Great, thanks for the review @pitrou!
Hi, according to GitHub the fix commit has long been in Python 3.8, right? However, it still hangs without an error, for example with the code you posted originally: https://gist.github.com/tomMoral/cc27a938d669edcf0286c57516942369#file-fail_pickle_input_shutdown-py Do I understand correctly that it should now result in an error rather than just hang?
I think this one should terminate without hanging and without errors (since we do not call …
I use Ubuntu 20.04.3 LTS and Python 3.8.10. I also tried Python 3.7.12; it hangs as well. Traceback on KeyboardInterrupt: …
I can indeed reproduce the hanging with Python 3.8. This is weird because this PR was merged as 94459fd, which is an ancestor commit of the v3.7.0 tag (and all subsequent Python releases). Maybe the shutdown case was fixed for real only in a follow-up bugfix merged between 3.8 and 3.9.
Yeah, thanks! It's not the first time I have hit this bug. It is impossible to debug unless you google thoroughly. Since then, I forgot about the specifics, but it has manifested again. Basically, I think another common case is that someone provides a function that is not picklable (as opposed to an argument): a lambda or a local function. My usual workflow was:

my_iterable = ...

with ProcessPoolExecutor() as executor:
    def process_item(item):
        ...
        return result

    results = executor.map(process_item, my_iterable)
    # do sth with results

That works well. However, I needed to move the code from …
Alright, thanks for the details.
You might want to have a look at https://github.com/joblib/loky!
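For readers hitting the same issue: the usual workaround for the scenario described above is to define the worker function at module level so it can be pickled by reference (a minimal sketch; process_item and my_iterable are placeholder names, not from the original code):

from concurrent.futures import ProcessPoolExecutor


def process_item(item):
    # Module-level functions are picklable by reference, so they can be
    # shipped to the worker processes by ProcessPoolExecutor.
    return item * 2


def main():
    my_iterable = range(10)
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(process_item, my_iterable))
    print(results)


if __name__ == "__main__":
    main()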
When using concurrent.futures.ProcessPoolExecutor with objects that are not picklable or unpicklable, several situations result in a deadlock, with the interpreter frozen. This is the case for different scenarios, for instance these three scripts. This PR proposes to test the different failure scenarios and fix the ones that result in deadlocks.
Overall, the goal is to make concurrent.futures.ProcessPoolExecutor more robust to faulty user code. This work was done as part of tommoral/loky#48, with the intent to re-use the executor in multiple independent parts of the program, in collaboration with @ogrisel. See #1013 for more details.
https://bugs.python.org/issue31699
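As an illustration of the kind of failure scenario targeted by this PR (a reduced sketch, not one of the exact linked scripts): submitting an argument that fails to pickle used to be able to deadlock the executor; the goal of the fix is to surface the error on the future or mark the pool as broken instead, although, as noted above in the conversation, some shutdown paths may still hang on certain Python versions.

import pickle
from concurrent.futures import ProcessPoolExecutor


class ErrorAtPickle:
    """Object that raises when pickled, mimicking a faulty user argument."""
    def __reduce__(self):
        raise pickle.PicklingError("cannot pickle this object")


def identity(x):
    return x


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        future = executor.submit(identity, ErrorAtPickle())
        try:
            future.result(timeout=30)
        except Exception as exc:
            # With the fix, the pickling error is reported here instead of
            # the interpreter hanging forever.
            print(f"future failed with: {exc!r}")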