Commit c85a266

bpo-28749: Fixed the documentation of the mapping codec APIs. (#487)
Added the documentation for PyUnicode_Translate().
1 parent 909a6f6 commit c85a266

2 files changed

+66
-74
lines changed

Doc/c-api/unicode.rst

Lines changed: 48 additions & 47 deletions
@@ -1399,77 +1399,78 @@ Character Map Codecs
13991399
This codec is special in that it can be used to implement many different codecs
14001400
(and this is in fact what was done to obtain most of the standard codecs
14011401
included in the :mod:`encodings` package). The codec uses mapping to encode and
1402-
decode characters.
1403-
1404-
Decoding mappings must map single string characters to single Unicode
1405-
characters, integers (which are then interpreted as Unicode ordinals) or ``None``
1406-
(meaning "undefined mapping" and causing an error).
1407-
1408-
Encoding mappings must map single Unicode characters to single string
1409-
characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
1410-
(meaning "undefined mapping" and causing an error).
1411-
1412-
The mapping objects provided must only support the __getitem__ mapping
1413-
interface.
1414-
1415-
If a character lookup fails with a LookupError, the character is copied as-is
1416-
meaning that its ordinal value will be interpreted as Unicode or Latin-1 ordinal
1417-
resp. Because of this, mappings only need to contain those mappings which map
1418-
characters to different code points.
1402+
decode characters. The mapping objects provided must support the
1403+
:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
14191404
14201405
These are the mapping codec APIs:
14211406
1422-
.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
1407+
.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
14231408
PyObject *mapping, const char *errors)
14241409
1425-
Create a Unicode object by decoding *size* bytes of the encoded string *s* using
1426-
the given *mapping* object. Return *NULL* if an exception was raised by the
1427-
codec. If *mapping* is *NULL* latin-1 decoding will be done. Else it can be a
1428-
dictionary mapping byte or a unicode string, which is treated as a lookup table.
1429-
Byte values greater that the length of the string and U+FFFE "characters" are
1430-
treated as "undefined mapping".
1410+
Create a Unicode object by decoding *size* bytes of the encoded string *s*
1411+
using the given *mapping* object. Return *NULL* if an exception was raised
1412+
by the codec.
1413+
1414+
If *mapping* is *NULL*, Latin-1 decoding will be applied. Else
1415+
*mapping* must map bytes ordinals (integers in the range from 0 to 255)
1416+
to Unicode strings, integers (which are then interpreted as Unicode
1417+
ordinals) or ``None``. Unmapped data bytes -- ones which cause a
1418+
:exc:`LookupError`, as well as ones which get mapped to ``None``,
1419+
``0xFFFE`` or ``'\ufffe'``, are treated as undefined mappings and cause
1420+
an error.
14311421
14321422
14331423
.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
14341424
1435-
Encode a Unicode object using the given *mapping* object and return the result
1436-
as Python string object. Error handling is "strict". Return *NULL* if an
1425+
Encode a Unicode object using the given *mapping* object and return the
1426+
result as a bytes object. Error handling is "strict". Return *NULL* if an
14371427
exception was raised by the codec.
14381428
1439-
The following codec API is special in that maps Unicode to Unicode.
1440-
1429+
The *mapping* object must map Unicode ordinal integers to bytes objects,
1430+
integers in the range from 0 to 255 or ``None``. Unmapped character
1431+
ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
1432+
``None`` are treated as "undefined mapping" and cause an error.
14411433
1442-
.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
1443-
PyObject *table, const char *errors)
1444-
1445-
Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
1446-
character mapping *table* to it and return the resulting Unicode object. Return
1447-
*NULL* when an exception was raised by the codec.
14481434
1449-
The *mapping* table must map Unicode ordinal integers to Unicode ordinal
1450-
integers or ``None`` (causing deletion of the character).
1435+
.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
1436+
PyObject *mapping, const char *errors)
14511437
1452-
Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
1453-
and sequences work well. Unmapped character ordinals (ones which cause a
1454-
:exc:`LookupError`) are left untouched and are copied as-is.
1438+
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
1439+
*mapping* object and return the result as a bytes object. Return *NULL* if
1440+
an exception was raised by the codec.
14551441
14561442
.. deprecated-removed:: 3.3 4.0
14571443
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1458-
:c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
1459-
<codec-registry>`
1444+
:c:func:`PyUnicode_AsCharmapString` or
1445+
:c:func:`PyUnicode_AsEncodedString`.
14601446
14611447
1462-
.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
1448+
The following codec API is special in that maps Unicode to Unicode.
1449+
1450+
.. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
14631451
PyObject *mapping, const char *errors)
14641452
1465-
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
1466-
*mapping* object and return a Python string object. Return *NULL* if an
1467-
exception was raised by the codec.
1453+
Translate a Unicode object using the given *mapping* object and return the
1454+
resulting Unicode object. Return *NULL* if an exception was raised by the
1455+
codec.
1456+
1457+
The *mapping* object must map Unicode ordinal integers to Unicode strings,
1458+
integers (which are then interpreted as Unicode ordinals) or ``None``
1459+
(causing deletion of the character). Unmapped character ordinals (ones
1460+
which cause a :exc:`LookupError`) are left untouched and are copied as-is.
1461+
1462+
1463+
.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
1464+
PyObject *mapping, const char *errors)
1465+
1466+
Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
1467+
character *mapping* table to it and return the resulting Unicode object.
1468+
Return *NULL* when an exception was raised by the codec.
14681469
14691470
.. deprecated-removed:: 3.3 4.0
14701471
Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
1471-
:c:func:`PyUnicode_AsCharmapString` or
1472-
:c:func:`PyUnicode_AsEncodedString`.
1472+
:c:func:`PyUnicode_Translate`. or :ref:`generic codec based API
1473+
<codec-registry>`
14731474
14741475
14751476
MBCS codecs for Windows
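
The decoding and encoding rules documented in this file can be tried from Python without writing C. The sketch below assumes that `codecs.charmap_decode` and `codecs.charmap_encode` (Python-level helpers, not part of this commit) follow the same mapping rules as `PyUnicode_DecodeCharmap` and `PyUnicode_AsCharmapString`. The mapping tables and the `is_undefined` helper are made up for illustration.

```python
import codecs

# Hypothetical decoding table: byte ordinal -> str, int (Unicode ordinal) or None.
decoding_map = {
    0x41: "a",    # byte mapped to a one-character string
    0x42: 0x62,   # byte mapped to a Unicode ordinal (U+0062)
    0x43: "xyz",  # byte mapped to a longer Unicode string
    0x44: None,   # explicitly undefined
}

decoded, consumed = codecs.charmap_decode(b"ABC", "strict", decoding_map)


def is_undefined(data):
    """Return True if strict decoding of *data* reports an undefined mapping."""
    try:
        codecs.charmap_decode(data, "strict", decoding_map)
    except UnicodeDecodeError:
        return True
    return False


none_is_error = is_undefined(b"D")     # mapped to None
missing_is_error = is_undefined(b"E")  # lookup raises LookupError

# Hypothetical encoding table: Unicode ordinal -> bytes, int in 0..255 or None.
encoding_map = {0x61: b"A", 0x62: 0x42}
encoded, _ = codecs.charmap_encode("ab", "strict", encoding_map)

try:
    codecs.charmap_encode("c", "strict", encoding_map)
    unmapped_encode_is_error = False
except UnicodeEncodeError:
    unmapped_encode_is_error = True
```

Unlike translation, neither direction copies an unmapped item through unchanged: a missing key is an error under the "strict" handler.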

Include/unicodeobject.h

Lines changed: 18 additions & 27 deletions
@@ -1613,50 +1613,41 @@ PyAPI_FUNC(PyObject*) PyUnicode_EncodeASCII(
16131613
16141614
This codec uses mappings to encode and decode characters.
16151615
1616-
Decoding mappings must map single string characters to single
1617-
Unicode characters, integers (which are then interpreted as Unicode
1618-
ordinals) or None (meaning "undefined mapping" and causing an
1619-
error).
1620-
1621-
Encoding mappings must map single Unicode characters to single
1622-
string characters, integers (which are then interpreted as Latin-1
1623-
ordinals) or None (meaning "undefined mapping" and causing an
1624-
error).
1625-
1626-
If a character lookup fails with a LookupError, the character is
1627-
copied as-is meaning that its ordinal value will be interpreted as
1628-
Unicode or Latin-1 ordinal resp. Because of this mappings only need
1629-
to contain those mappings which map characters to different code
1630-
points.
1616+
Decoding mappings must map byte ordinals (integers in the range from 0 to
1617+
255) to Unicode strings, integers (which are then interpreted as Unicode
1618+
ordinals) or None. Unmapped data bytes (ones which cause a LookupError)
1619+
as well as mapped to None, 0xFFFE or '\ufffe' are treated as "undefined
1620+
mapping" and cause an error.
1621+
1622+
Encoding mappings must map Unicode ordinal integers to bytes objects,
1623+
integers in the range from 0 to 255 or None. Unmapped character
1624+
ordinals (ones which cause a LookupError) as well as mapped to
1625+
None are treated as "undefined mapping" and cause an error.
16311626
16321627
*/
16331628

16341629
PyAPI_FUNC(PyObject*) PyUnicode_DecodeCharmap(
16351630
const char *string, /* Encoded string */
16361631
Py_ssize_t length, /* size of string */
1637-
PyObject *mapping, /* character mapping
1638-
(char ordinal -> unicode ordinal) */
1632+
PyObject *mapping, /* decoding mapping */
16391633
const char *errors /* error handling */
16401634
);
16411635

16421636
PyAPI_FUNC(PyObject*) PyUnicode_AsCharmapString(
16431637
PyObject *unicode, /* Unicode object */
1644-
PyObject *mapping /* character mapping
1645-
(unicode ordinal -> char ordinal) */
1638+
PyObject *mapping /* encoding mapping */
16461639
);
16471640

16481641
#ifndef Py_LIMITED_API
16491642
PyAPI_FUNC(PyObject*) PyUnicode_EncodeCharmap(
16501643
const Py_UNICODE *data, /* Unicode char buffer */
16511644
Py_ssize_t length, /* Number of Py_UNICODE chars to encode */
1652-
PyObject *mapping, /* character mapping
1653-
(unicode ordinal -> char ordinal) */
1645+
PyObject *mapping, /* encoding mapping */
16541646
const char *errors /* error handling */
16551647
) Py_DEPRECATED(3.3);
16561648
PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
16571649
PyObject *unicode, /* Unicode object */
1658-
PyObject *mapping, /* character mapping
1659-
(unicode ordinal -> char ordinal) */
1650+
PyObject *mapping, /* encoding mapping */
16601651
const char *errors /* error handling */
16611652
);
16621653
#endif
@@ -1665,8 +1656,8 @@ PyAPI_FUNC(PyObject*) _PyUnicode_EncodeCharmap(
16651656
character mapping table to it and return the resulting Unicode
16661657
object.
16671658
1668-
The mapping table must map Unicode ordinal integers to Unicode
1669-
ordinal integers or None (causing deletion of the character).
1659+
The mapping table must map Unicode ordinal integers to Unicode strings,
1660+
Unicode ordinal integers or None (causing deletion of the character).
16701661
16711662
Mapping tables may be dictionaries or sequences. Unmapped character
16721663
ordinals (ones which cause a LookupError) are left untouched and
@@ -1964,8 +1955,8 @@ PyAPI_FUNC(PyObject*) PyUnicode_RSplit(
19641955
/* Translate a string by applying a character mapping table to it and
19651956
return the resulting Unicode object.
19661957
1967-
The mapping table must map Unicode ordinal integers to Unicode
1968-
ordinal integers or None (causing deletion of the character).
1958+
The mapping table must map Unicode ordinal integers to Unicode strings,
1959+
Unicode ordinal integers or None (causing deletion of the character).
19691960
19701961
Mapping tables may be dictionaries or sequences. Unmapped character
19711962
ordinals (ones which cause a LookupError) are left untouched and
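
The translation rules in the comments above (Unicode strings, ordinals, or None as targets, with unmapped ordinals copied as-is) can be observed through `str.translate`. This is a minimal sketch that assumes `str.translate` applies the same table semantics as `PyUnicode_Translate`. The tables are hypothetical.

```python
# Hypothetical table: Unicode ordinal -> str, int (Unicode ordinal) or None.
table = {
    0x61: "XY",  # "a" is replaced by a two-character string
    0x62: 0x7A,  # "b" is replaced by the character with ordinal 0x7A ("z")
    0x63: None,  # "c" is deleted
}

# "d" has no entry, so the lookup fails and the character is copied as-is.
translated = "abcd".translate(table)

# A sequence also works as a table; indexes past its end raise IndexError
# (a LookupError), so those characters are left untouched.
seq_table = ["a", "b"]
seq_translated = "\x00\x01\x05".translate(seq_table)
```

Here a failed lookup leaves the character in place, whereas the charmap encode and decode functions treat it as an undefined mapping.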
