-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-39678 RFC refactor queue manager thread #18551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #18551 +/- ##
==========================================
+ Coverage 82.11% 82.13% +0.01%
==========================================
Files 1956 1955 -1
Lines 589382 584547 -4835
Branches 44456 44476 +20
==========================================
- Hits 483994 480089 -3905
+ Misses 95737 94812 -925
+ Partials 9651 9646 -5
Continue to review full report at Codecov.
|
The failure seems unrelated to this PR (in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much. This looks generally good, just some comments.
n_children_alive = sum(p.is_alive() for p in processes.values()) | ||
# Set this thread to be daemonized | ||
super().__init__() | ||
self.daemon = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be worthwhile exploring whether/how we can remove the daemon flag, though in another BPO issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, the daemon flag is only necessary if we can't guarantee that the thread will be joined at the end of the executor's shutdown process. But, I could see this potentially being an issue when calling executor.shutdown(wait=False)
for ProcessPoolExecutor (which already has issues with graceful termination).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I am not sure this is needed anymore.
The shutdown(wait=False)
behavior should be fixed by #17670 and the clean up process is quite robust I believe. We should maybe give a try to simply removing the flag and experiments bad situations to see if it leads to interpreters hanging at exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is: what happens if the interpreter starts to shutdown while an executor is still running? If the management thread is not daemon, the interpreter will hang trying to join it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the current behavior. The _python_exit
function will join the thread anyway no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that atexit
hooks are called after non-daemon threads are joined. See https://bugs.python.org/issue37266#msg362890
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I missed these messages.
Indeed, the simplest solution would be to call the atexit _python_exit
before joining non-deamonized threads.
@aeros You may be interested in this, at least abstractly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @tomMoral!
After just recently working with the internals for the cancel_futures executor.shutdown()
parameter, I can certainly attest to the queue management function becoming a bit convoluted. Splitting up the monolithic function into a class with multiple methods should make it significantly easier to make changes and adjustments in the future.
Misc/NEWS.d/next/Library/2020-02-28-12-59-30.bpo-39678.3idfxM.rst
Outdated
Show resolved
Hide resolved
Thanks for the reviews! I addressed your comments, let me know if there are other changes to do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now, thank you. @aeros Do you want to take another look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the latest commits to the PR addressed all my of concerns.
I think this PR would be a good candidate for the test-with-buildbot
label, just to ensure that we don't encounter any odd side effects on other platforms prior to merging (besides the CI checks). I suspect everything will pass, but if anything doesn't, it's much simpler to address it now rather than post-commit (which could require a PR to revert the changes and another to re-apply them later).
Buildbot failures look unrelated, I'm merging. |
bpo-39678: refactor queue manager thread (pythonGH-18551)
thanks for the quick reviews and merge! |
@pitrou In the post-commit, it looks like the failures only occurred for the Windows 7 buildbot ( (Last time I checked, the buildbot team was working on removing all of the Windows 7 builders from the fleet, the above is the last one remaining) So, in other words, everything looks good! Thanks again for working on the ProcessPoolExecutor refactor @tomMoral. |
As discussed in #17670 the the
_queue_management_worker
function has grown quite long and complicated.This PR intends to turn it into an object with a bunch of short and readable helper methods.
I did a first pass, not sure about name changed for
_queue_management_thread
and a bunch of other change, in particular to cancel behavior (only do it once now). Also, I am not sure if I should add a what's new entry. Let me know if I should change anything.https://bugs.python.org/issue39678