bpo-45466: Add download feature to urllib.request module #29217


Closed
wants to merge 7 commits into from

Conversation

pohlt

@pohlt pohlt commented Oct 25, 2021

Similar to http.server, the urllib.request module could offer download functionality:

python -m urllib.request https://python.org/ --output file.html

To keep the code lean, output is the only optional parameter.
A typical use case could be downloading some installation scripts or other data from within a container where curl/wget is not available.

https://bugs.python.org/issue45466

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@pohlt

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

Similar to http.server, the urllib.request offers a download
functionality:
python -m urllib.request https://python.org/ --output file.html
@ericvsmith
Member

Tests are needed.

@pohlt
Author

pohlt commented Oct 26, 2021

Yep, tests are next.
I'm planning to use subprocess.run(sys.executable, ...). Any recommendations on which server to test against? Set up my own HTTP server, or use python.org (the pipeline would fail if it were down for maintenance)?

I just found test_urllib2_localnet.py which offers exactly what is needed for the tests.

out = stdout.buffer if args.output is None else open(args.output, "wb")

with urlopen(args.URL) as response:
while data := response.read(1024 * 1024):
Author

Is 1 MB a reasonable choice?

Member

1 MB is a bit large. curl uses a buffer size of 32768.

The code will also do lots of allocations and deallocations. It's possible to avoid them with memoryview(bytearray(32768)) and readinto().
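The readinto() suggestion can be illustrated with a generic copy loop. This is only a sketch using in-memory streams in place of an HTTP response; `http.client.HTTPResponse` also provides `readinto()`, so the same loop applies to a real download.

```python
import io


def copy_stream(src, dst, bufsize=32768):
    # One buffer is allocated up front; readinto() refills it in place,
    # so no new bytes object is created for every chunk.
    buf = memoryview(bytearray(bufsize))
    while n := src.readinto(buf):
        # Only the filled portion of the buffer is written out.
        dst.write(buf[:n])


src = io.BytesIO(b"x" * 100_000)
dst = io.BytesIO()
copy_stream(src, dst)
```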

Author

done

And some housekeeping.
@pohlt
Author

pohlt commented Oct 27, 2021

I looked into the doc building issue, but couldn't figure out what went wrong. Could someone please give me a hint?

@pohlt
Author

pohlt commented Oct 27, 2021

Fixed some non-critical typos and reformatted the news blurb to remove newlines from within code parts. Maybe this fixes the docs issue.

@ericvsmith
Member

I'm sure the docs failure is an unrelated problem. I think there's an open issue with the Sphinx version being used, but now I can't find the mention of that problem.

@pohlt
Author

pohlt commented Nov 1, 2021

I guess it is unlikely for a core developer to look at the PR as long as there are open basic issues like breaking the docs. Is there anything I can do to re-run the pipeline to see if the docs issue has been fixed?

@ericvsmith ericvsmith closed this Nov 1, 2021
@ericvsmith
Member

Closing and re-opening to trigger the doc build step.

@ericvsmith ericvsmith reopened this Nov 1, 2021
Member

@ericvsmith ericvsmith left a comment

I'd prefer if we use f-strings for new code.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Member

@tiran tiran left a comment

I still don't think it is a good idea to add a download feature at all.

While the PR implements enough functionality to address your use case, other users will open issues to request more features. I expect that users will ask for "take filename from remote", "custom HTTP headers" and "POST requests" next. curl has over 200 command line options for a reason. Soon we'll end up with a poor clone of curl and a new source of security bugs. :)

args = parser.parse_args()
out = stdout.buffer if args.output is None else open(args.output, "wb")

with urlopen(args.URL) as response:
Member

This will print an exception. You should add error checking and a nice error output in case connection or download fails.

Author

I'm catching URLError now.
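Catching URLError might look roughly like this (a hypothetical sketch, not the PR's exact code):

```python
import sys
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url):
    # URLError covers connection failures; its subclass HTTPError
    # additionally covers 4xx/5xx responses, so one except clause
    # handles both cases.
    try:
        with urlopen(url) as response:
            return response.read()
    except URLError as exc:
        print(f"Download failed: {exc.reason}", file=sys.stderr)
        return None
```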

parser.add_argument(
"-o",
"--output",
type=str,
Member

argparse has built-in file handling:

Suggested change
type=str,
type=argparse.FileType('wb'), default=sys.stdout.buffer

Author

TIL, thanks.

Comment on lines 2786 to 2787
if __name__ == "__main__":
from argparse import ArgumentParser
Member

Please turn this into a helper method, e.g. def main().

Author

Done

Author

Is it common practice (or impolite) to resolve a fixed issue?

pohlt added 2 commits November 1, 2021 20:25
* use buffer to avoid buffer reallocations
* catch URLError and print error output
* use argparse file handling
@pohlt
Author

pohlt commented Nov 1, 2021

Sorry if I screwed up the review process. I've never done this on GitHub...

@pohlt
Author

pohlt commented Nov 1, 2021

While the PR implements enough functionality to address your use case, other users will open issues to request more features. [...]

The default answer for those requests could be: If you need more functionality than that, install curl/wget and use it instead.

Anyway, thanks for the reviews.

@ericvsmith
Member

After discussing this among the core devs, we've decided not to accept this patch. Sorry, @pohlt. I hope you at least gained some experience in working with the code and our processes. I'll comment on the issue about why we're not accepting it.

@ericvsmith ericvsmith closed this Nov 1, 2021
@pohlt
Author

pohlt commented Nov 2, 2021

Thanks, @ericvsmith, for your support.
