Skip to content

gh-75880: Deadlocks in concurrent.futures.ProcessPoolExecutor with unpickling error #4256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

tomMoral
Copy link
Contributor

@tomMoral tomMoral commented Nov 3, 2017

When using concurrent.futures.ProcessPoolExecutor with objects that are not picklable or unpicklable, several situations results in a deadlock, with the interpreter frozen.

This is the case for different scenarios, for instance, https://gist.github.com/tomMoral/cc27a938d669edcf0286c57516942369. This PR is a follow up of #3895 and specifically fixes the unpickling behavior. With this PR, the unpickling failures do not result in broken executors. This is done by ensuring that the _callItem and the _ResultItem are sent in queues along the work_id with a specific communication protocol, which sends the work_id as bytes, followed by the object serialized by pickle.
To this end, we introduce private API in multiprocessing.Queue to allow modified serialization protocols.

Overall, the goal is to make concurrent.futures.ProcessPoolExecutor more robust to faulty user code.

This work was done as part of the tommoral/loky#48 with the intent to re-use the executor in multiple independent parts of the program, in collaboration with @ogrisel. See #1013 for more the details.

https://bugs.python.org/issue31699

@ogrisel
Copy link
Contributor

ogrisel commented Nov 7, 2017

The windows failures are unrelated to this PR (test.test_nntplib.NetworkedNNTP_SSLTests) but this PR is now conflicting with master.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't what the best solution is but this PR should not be merged while the following regression is not addressed.

work_id, exception=_ExceptionWithTraceback(e, e.__traceback__)
))
result_queue._put_bytes(work_id.to_bytes(WORK_ID_SIZE, WORK_ID_ENC) +
serialize_res)
Copy link
Contributor

@ogrisel ogrisel Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bytes concatenation triggers a potentially very large memory copy which is a regression compared to master.
There are two possible solutions to this problem:

  • Add a send_byte_chunks API down to the Connection that would make it possible to send a topple of bytes without materializing the full buffer. We don't know if it is possible on windows.
  • Implement a new Queue sub class to send and receive 2 messages at a time. This would introduce a small call overhead, potentially critical for small messages.

@brettcannon brettcannon changed the title bpo-31699 Deadlocks in concurrent.futures.ProcessPoolExecutor with unpickling error bpo-31699: Deadlocks in concurrent.futures.ProcessPoolExecutor with unpickling error Mar 29, 2019
@brettcannon brettcannon added the type-bug An unexpected behavior, bug, or error label Mar 29, 2019
@iritkatriel
Copy link
Member

https://bugs.python.org/issue31699 is closed. What is the status of this PR?

@rhettinger rhettinger removed their request for review May 3, 2022 06:27
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Aug 13, 2022
@arhadthedev arhadthedev changed the title bpo-31699: Deadlocks in concurrent.futures.ProcessPoolExecutor with unpickling error gh-75880: Deadlocks in concurrent.futures.ProcessPoolExecutor with unpickling error Apr 15, 2023
@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Apr 16, 2023
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review stale Stale PR or inactive for long period of time. type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants