
bpo-39678: RFC: refactor queue manager thread #18551


Merged · 5 commits · Mar 1, 2020

Conversation

@tomMoral (Contributor) commented on Feb 18, 2020:

As discussed in #17670, the _queue_management_worker function has grown quite long and complicated.
This PR intends to turn it into an object with a bunch of short and readable helper methods.

I did a first pass. I am not sure about the name change for _queue_management_thread and a few of the other changes, in particular to the cancel behavior (it is only done once now). I am also not sure whether I should add a What's New entry. Let me know if anything should be changed.

https://bugs.python.org/issue39678
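
To make the intent concrete, here is a minimal, self-contained sketch of the pattern the PR applies: a long queue-driven worker function becomes a Thread subclass whose run() loop delegates to short, single-purpose helpers. All names here (_ManagerThread, wait_for_item, process_item, initiate_shutdown) are illustrative placeholders, not the identifiers from the actual patch.

```python
# Sketch only: the long module-level worker function becomes a Thread
# subclass; run() stays short and delegates to readable helper methods.
import queue
import threading


class _ManagerThread(threading.Thread):
    def __init__(self, work_queue):
        self.work_queue = work_queue
        self.shutting_down = False
        super().__init__()

    def run(self):
        while not self.shutting_down:
            item = self.wait_for_item()
            if item is None:
                self.initiate_shutdown()
            else:
                self.process_item(item)

    def wait_for_item(self):
        # Blocks until a work item (or the None sentinel) arrives.
        return self.work_queue.get()

    def process_item(self, item):
        print(f"processed {item!r}")

    def initiate_shutdown(self):
        self.shutting_down = True


q = queue.Queue()
t = _ManagerThread(q)
t.start()
q.put("task")
q.put(None)  # sentinel: tells the manager loop to exit
t.join()
```

Each helper can then be read, tested, and modified in isolation, which is the readability gain the PR is after.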

@codecov (bot) commented on Feb 18, 2020:

Codecov Report

Merging #18551 into master will increase coverage by 0.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #18551      +/-   ##
==========================================
+ Coverage   82.11%   82.13%   +0.01%     
==========================================
  Files        1956     1955       -1     
  Lines      589382   584547    -4835     
  Branches    44456    44476      +20     
==========================================
- Hits       483994   480089    -3905     
+ Misses      95737    94812     -925     
+ Partials     9651     9646       -5     
Impacted Files Coverage Δ
Lib/distutils/tests/test_bdist_rpm.py 30.00% <0.00%> (-65.00%) ⬇️
Lib/distutils/command/bdist_rpm.py 7.63% <0.00%> (-56.88%) ⬇️
Modules/_decimal/libmpdec/umodarith.h 80.76% <0.00%> (-19.24%) ⬇️
Lib/test/test_urllib2net.py 76.92% <0.00%> (-13.85%) ⬇️
Lib/test/test_smtpnet.py 78.57% <0.00%> (-7.15%) ⬇️
Lib/ftplib.py 63.85% <0.00%> (-6.06%) ⬇️
Lib/test/test_ftplib.py 87.11% <0.00%> (-4.72%) ⬇️
Tools/scripts/db2pickle.py 17.82% <0.00%> (-3.97%) ⬇️
Tools/scripts/pickle2db.py 16.98% <0.00%> (-3.78%) ⬇️
Lib/test/test_socket.py 71.94% <0.00%> (-3.77%) ⬇️
... and 345 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24bba8c...c3274e4.

@tomMoral (Contributor, Author) commented:

The failure seems unrelated to this PR (in importlib).

@pitrou (Member) left a comment:

Thank you very much. This looks generally good, just some comments.

n_children_alive = sum(p.is_alive() for p in processes.values())
# Set this thread to be daemonized
super().__init__()
self.daemon = True
@pitrou (Member) commented:

I think it would be worthwhile exploring whether/how we can remove the daemon flag, though in another BPO issue.

@aeros (Contributor) commented on Feb 28, 2020:

From my understanding, the daemon flag is only necessary if we can't guarantee that the thread will be joined at the end of the executor's shutdown process. But, I could see this potentially being an issue when calling executor.shutdown(wait=False) for ProcessPoolExecutor (which already has issues with graceful termination).
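
As a hedged illustration of the scenario aeros describes (not code from this PR): shutdown(wait=False) returns without joining the executor's internal threads, which is why, at the time of this discussion, the daemon flag was what kept a still-running management thread from blocking interpreter exit.

```python
# Illustrative only: with wait=False, shutdown() returns immediately
# and does not join the executor's worker/management threads, so work
# may still be in flight when the interpreter starts to exit.
import time
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    executor = ProcessPoolExecutor(max_workers=2)
    futures = [executor.submit(time.sleep, 1) for _ in range(4)]
    executor.shutdown(wait=False)  # returns immediately, no join
    # The interpreter now heads toward exit while the management
    # thread may still be running.
```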

@tomMoral (Contributor, Author) replied:

Actually, I am not sure this is needed anymore.
The shutdown(wait=False) behavior should be fixed by #17670, and the cleanup process is quite robust, I believe. We could try simply removing the flag and experimenting with bad situations to see whether it leads to the interpreter hanging at exit.

@pitrou (Member) replied:

The problem is: what happens if the interpreter starts to shutdown while an executor is still running? If the management thread is not daemon, the interpreter will hang trying to join it.

@tomMoral (Contributor, Author) replied:

I think this is the current behavior. The _python_exit function will join the thread anyway, no?

@pitrou (Member) replied:

The problem is that atexit hooks are called after non-daemon threads are joined. See https://bugs.python.org/issue37266#msg362890

@tomMoral (Contributor, Author) replied:

Ah sorry, I missed these messages.
Indeed, the simplest solution would be to call the atexit _python_exit hook before joining non-daemonized threads.
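
To make the ordering problem concrete, here is a small self-contained sketch (not code from the PR; note it intentionally hangs on exit): CPython joins non-daemon threads during interpreter shutdown before atexit hooks run, so an atexit hook cannot be what unblocks such a thread.

```python
# Sketch of the shutdown-ordering problem (warning: this script hangs
# on exit). Non-daemon threads are joined *before* atexit hooks fire,
# so the hook below never gets a chance to set the event.
import atexit
import threading

stop = threading.Event()

def worker():
    stop.wait()  # blocks until stop is set

def _python_exit():
    stop.set()  # too late: runs only after non-daemon threads are joined

atexit.register(_python_exit)

threading.Thread(target=worker).start()  # non-daemon by default
# On exit, the interpreter joins the worker thread and hangs, because
# the atexit hook that would release it has not yet run.
```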

@pitrou (Member) commented on Feb 28, 2020:

@aeros You may be interested in this, at least abstractly.

@aeros (Contributor) left a comment:

Thanks for working on this @tomMoral!

After just recently working with the internals for the cancel_futures executor.shutdown() parameter, I can certainly attest to the queue management function becoming a bit convoluted. Splitting up the monolithic function into a class with multiple methods should make it significantly easier to make changes and adjustments in the future.
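
For readers unfamiliar with the parameter aeros mentions, here is a short usage example of cancel_futures (added to Executor.shutdown() in Python 3.9); the sleep-based workload is purely illustrative.

```python
# cancel_futures=True cancels the calls still waiting in the queue
# instead of letting them run before the executor shuts down.
import time
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    executor = ProcessPoolExecutor(max_workers=1)
    futures = [executor.submit(time.sleep, 0.5) for _ in range(5)]
    executor.shutdown(cancel_futures=True)
    # Calls that had not started yet report cancelled() == True.
    print([f.cancelled() for f in futures])
```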

@aeros added the type-feature (A feature request or enhancement) label on Feb 28, 2020
@tomMoral (Contributor, Author) commented on Mar 1, 2020:

Thanks for the reviews! I addressed your comments; let me know if there are other changes to make.

@pitrou (Member) left a comment:

This looks good to me now, thank you. @aeros Do you want to take another look?

@aeros (Contributor) left a comment:

LGTM, the latest commits to the PR addressed all of my concerns.

I think this PR would be a good candidate for the test-with-buildbot label, just to ensure that we don't encounter any odd side effects on other platforms prior to merging (besides the CI checks). I suspect everything will pass, but if anything doesn't, it's much simpler to address it now rather than post-commit (which could require a PR to revert the changes and another to re-apply them later).

@aeros added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label on Mar 1, 2020
@bedevere-bot commented:

🤖 New build scheduled with the buildbot fleet by @aeros for commit b9d5c38 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot removed the 🔨 test-with-buildbots label on Mar 1, 2020
@pitrou (Member) commented on Mar 1, 2020:

Buildbot failures look unrelated, I'm merging.

@pitrou merged commit 0e89076 into python:master on Mar 1, 2020
sthagen added a commit to sthagen/python-cpython that referenced this pull request Mar 1, 2020
bpo-39678: refactor queue manager thread (pythonGH-18551)
@tomMoral (Contributor, Author) commented on Mar 1, 2020:

Thanks for the quick reviews and merge!

@tomMoral deleted the RFC_queue_management_thread branch on March 1, 2020, 21:49
@aeros (Contributor) commented on Mar 2, 2020:

@pitrou In the post-commit, it looks like the failures only occurred for the Windows 7 buildbot (buildbot/AMD64 Windows7 SP1 3.x), which we no longer support since Windows 7 recently reached EoL. That buildbot has also been consistently failing across PRs recently, so it's definitely not related to the PR.

(Last time I checked, the buildbot team was working on removing all of the Windows 7 builders from the fleet; the one above is the last remaining.)

So, in other words, everything looks good! Thanks again for working on the ProcessPoolExecutor refactor @tomMoral.
