bpo-45466: Add download feature to urllib.request module #29217
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames. This might simply be due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received.

Thanks again for the contribution; we look forward to reviewing it!
Similar to `http.server`, the `urllib.request` module offers a download functionality: `python -m urllib.request https://python.org/ --output file.html`
Tests are needed.

Yep, tests are next. I just found
Lib/urllib/request.py (Outdated)

```python
out = stdout.buffer if args.output is None else open(args.output, "wb")

with urlopen(args.URL) as response:
    while data := response.read(1024 * 1024):
```
Is 1 MB a reasonable choice?
1 MB is a bit large. curl uses a buffer size of 32768.

The code will also do lots of allocations and deallocations. It's possible to avoid allocations with `memoryview(bytearray(32768))` and `readinto()`.
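The suggested pattern can be sketched as follows; `copy_stream` and the 32768 default are illustrative names for this example, not code from the PR:

```python
import sys
from urllib.request import urlopen

def copy_stream(response, out, bufsize=32768):
    # Reuse one preallocated buffer; readinto() fills it in place,
    # avoiding a fresh bytes allocation for every chunk read.
    buf = memoryview(bytearray(bufsize))
    while n := response.readinto(buf):
        out.write(buf[:n])

# Usage over the network (mirroring the PR's loop) might look like:
# with urlopen("https://python.org/") as response:
#     copy_stream(response, sys.stdout.buffer)
```

This works with any object exposing `readinto()`, including `http.client.HTTPResponse` returned by `urlopen()`.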
Done.
And some housekeeping.
I looked into the doc building issue, but couldn't figure out what went wrong. Could someone please give me a hint?

Fixed some non-critical typos and reformatted the news blurb to remove newlines from within code parts. Maybe this fixes the docs issue.

I'm sure the docs are an unrelated problem. I think there's an open issue with the Sphinx version being used, but now I can't find the mention of that problem.

I guess it is unlikely for a core developer to look at the PR as long as there are open basic issues like breaking the docs. Is there anything I can do to re-run the pipeline to see if the docs issue has been fixed?
Closing and re-opening to trigger the doc build step. |
I'd prefer if we use f-strings for new code.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
I still don't think it is a good idea to add a download feature at all.
While the PR implements enough functionality to address your use case, other users will open issues to request more features. I expect that users will ask for "take filename from remote", "custom HTTP headers" and "POST requests" next. curl has over 200 command line options for a reason. Soon we'll end up with a poor clone of curl and a new source of security bugs. :)
Lib/urllib/request.py (Outdated)

```python
args = parser.parse_args()
out = stdout.buffer if args.output is None else open(args.output, "wb")

with urlopen(args.URL) as response:
```
This will print a raw exception traceback. You should add error checking and a friendly error message in case the connection or download fails.
I'm catching URLError now.
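A minimal sketch of that error handling; the `fetch` helper and the exact message wording are illustrative, not the PR's code. `URLError` is the base class for urllib's network errors (including `HTTPError`), so one handler covers failed connections and downloads:

```python
import sys
from urllib.error import URLError
from urllib.request import urlopen

def fetch(url):
    # Catch URLError to print a readable message instead of a
    # traceback, then exit with a non-zero status code.
    try:
        with urlopen(url) as response:
            return response.read()
    except URLError as exc:
        print(f"Download of {url} failed: {exc.reason}", file=sys.stderr)
        sys.exit(1)
```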
Lib/urllib/request.py (Outdated)

```python
parser.add_argument(
    "-o",
    "--output",
    type=str,
```
argparse has built-in file handling:

```diff
- type=str,
+ type=argparse.FileType('wb'), default=sys.stdout.buffer
```
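In context, the suggestion works roughly like this (the parser shown here is a reconstruction of the PR's argument setup, not the exact code): `argparse.FileType('wb')` opens the named file for binary writing at parse time, and stdout's binary buffer serves as the default when `-o` is omitted.

```python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument("URL")
parser.add_argument(
    "-o", "--output",
    type=argparse.FileType("wb"),   # opens the file for us
    default=sys.stdout.buffer,      # binary stdout if -o is omitted
)

# Without -o, args.output is already a writable binary stream.
args = parser.parse_args(["https://python.org/"])
```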
TIL, thanks.
Lib/urllib/request.py (Outdated)

```python
if __name__ == "__main__":
    from argparse import ArgumentParser
```
Please turn this into a helper method, e.g. `def main()`.
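The requested structure might look like the sketch below (argument names and description are illustrative): the CLI logic lives in `main()`, so importing the module has no side effects and tests can call the entry point with an explicit argument list.

```python
from argparse import ArgumentParser

def main(args=None):
    # Entry-point helper: parse_args(None) falls back to sys.argv,
    # while tests can pass a list of arguments explicitly.
    parser = ArgumentParser(description="Download a URL (sketch)")
    parser.add_argument("URL")
    parser.add_argument("-o", "--output")
    return parser.parse_args(args)

# In the module this would be invoked via:
# if __name__ == "__main__":
#     main()
```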
Done
Is it common practice (or impolite) to resolve a fixed issue?
* use buffer to avoid buffer reallocations
* catch URLError and print error output
* use argparse file handling
Sorry if I screwed up the review process. I've never done this on GitHub...
The default answer for those requests could be: If you need more functionality than that, install curl/wget and use it instead. Anyway, thanks for the reviews.

After discussing this among the core devs, we've decided not to accept this patch. Sorry, @pohlt. I hope you at least gained some experience in working with the code and our processes. I'll comment on the issue about why we're not accepting it.

Thanks, @ericvsmith, for your support.
Similar to `http.server`, the `urllib.request` module could offer a download functionality. To keep the code lean, `output` is the only optional parameter.

A typical use case could be downloading some installation scripts or other data from within a container where curl/wget is not available.

https://bugs.python.org/issue45466