@@ -1208,48 +1208,49 @@ In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
- it. ::
+ it::

   >>> import urllib.request
   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(300))
   ...
-    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
-    xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
-    <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
-    <title>Python Programming '
+    b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">

Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.
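
One common way to "determine or guess" the encoding is to look at the charset declared in the response's ``Content-Type`` header and fall back to a default when none is given. The following is a minimal sketch under that assumption, not part of the change shown here. The helper names ``decode_body`` and ``fetch_text`` are hypothetical. The header lookup uses ``email.message.Message.get_content_charset()``, which the ``headers`` attribute of the response object inherits.

```python
import urllib.request
from email.message import Message


def decode_body(body: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode *body* using the charset named in the Content-Type header.

    Hypothetical helper: falls back to *fallback* when the header does not
    declare a charset. A charset name unknown to Python raises LookupError.
    """
    charset = headers.get_content_charset(failobj=fallback)
    # errors="replace" keeps a truncated read from failing mid-character.
    return body.decode(charset, errors="replace")


def fetch_text(url: str, limit: int = 100) -> str:
    """Fetch up to *limit* bytes from *url* and return them as text."""
    with urllib.request.urlopen(url) as response:
        return decode_body(response.read(limit), response.headers)
```

A page may also declare its encoding inside the document itself (for example in a meta tag), which this sketch does not inspect. Treat the header as a first guess rather than a guarantee.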

- The following W3C document, https://www.w3.org/International/O-charset\ , lists
- the various ways in which an (X)HTML or an XML document could have specified its
+ The following HTML spec document, https://html.spec.whatwg.org/#charset, lists
+ the various ways in which an HTML or an XML document could have specified its
encoding information.

+ For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations.
+
As the python.org website uses *utf-8* encoding as specified in its meta tag, we
- will use the same for decoding the bytes object. ::
+ will use the same for decoding the bytes object::

   >>> with urllib.request.urlopen('http://www.python.org/') as f:
   ...     print(f.read(100).decode('utf-8'))
   ...
-    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtm
+    <!doctype html>
+    <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+    <!-

It is also possible to achieve the same result without using the
- :term:`context manager` approach. ::
+ :term:`context manager` approach::

   >>> import urllib.request
   >>> f = urllib.request.urlopen('http://www.python.org/')
   >>> try:
   ...     print(f.read(100).decode('utf-8'))
   ... finally:
   ...     f.close()
-    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-    "http://www.w3.org/TR/xhtml1/DTD/xhtm
+    ...
+    <!doctype html>
+    <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+    <!--
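
As a side note to the ``try``/``finally`` form above, ``contextlib.closing`` gives the same close-on-exit guarantee for any object that has a ``close()`` method. The sketch below is illustrative only and is not part of this change. ``read_prefix`` is a hypothetical helper, and an in-memory ``io.BytesIO`` stands in for the object returned by ``urlopen`` so the example runs without network access.

```python
import contextlib
import io


def read_prefix(open_stream, size=100, encoding="utf-8"):
    """Read and decode the first *size* bytes, always closing the stream.

    *open_stream* is any zero-argument callable returning an object with
    read() and close() methods (hypothetical interface for this sketch).
    """
    # contextlib.closing calls stream.close() on exit, like the finally clause.
    with contextlib.closing(open_stream()) as stream:
        return stream.read(size).decode(encoding)


# Stand-in for urllib.request.urlopen so the sketch runs offline.
fake = io.BytesIO(b"<!doctype html>\n<html>")
print(read_prefix(lambda: fake, size=15))
```

With a real URL, the same helper could be called as ``read_prefix(lambda: urllib.request.urlopen('http://www.python.org/'))``.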

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work