xcp.accessor, xcp.repository: Use binary mode for file I/O #24

Closed

bernhardkaindl
Collaborator

@bernhardkaindl bernhardkaindl commented Apr 24, 2023

xcp.accessor must be able to access binary file content.

One example is the bootloader files. The test at
https://github.com/ydirson/xenserver-python-libs/blob/testsuite-driven-py3/tests/test_accessor.py#L21 (updated by #17) reads boot/isolinux/mboot.c32 to show this.

To support accessing binary files, they must be opened in binary mode, that is, with a "b" in the mode passed to open() (for example "rb" or "wb"). Otherwise (at least when the interpreter's effective LC_CTYPE locale/charset is UTF-8), converting arbitrary binary data to or from Unicode will fail on reading and writing.
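A minimal sketch of this failure mode, not taken from xcp.accessor: the payload bytes and the temporary file are made up for illustration, and encoding="utf-8" is passed explicitly to stand in for a UTF-8 locale.

```python
import os
import tempfile

# Hypothetical payload: bytes that are not valid UTF-8, standing in for a
# binary file such as a bootloader image.
payload = b"\x7fELF\xff\xfe\x00\x80binary"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Binary mode returns the bytes unchanged.
    with open(path, "rb") as f:
        binary_content = f.read()

    # Text mode has to decode the bytes, which fails for this payload.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
finally:
    os.remove(path)
```

Reading in binary mode avoids the decoding step entirely, so the caller decides whether and how to decode.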

Explanation:

Conversion between the Python3 str and bytes types during I/O requires an encoding. While it can, at least in theory, be assumed to be UTF-8, files could use other encodings for historical reasons.

One example is https://raw.githubusercontent.com/ydirson/xenserver-python-libs/testsuite-driven-py3/xcp/cpiofile.py which is still encoded using iso-8859-1 as you can see by the broken display of the name of Lars Gustäbel in Copyright (C) 2002 Lars Gust�bel <[email protected]> when retrieving the raw, unconverted file.

When decoding bytes using the UTF-8 codec to Unicode for the Python3 str type, errors can occur when the input is not 100% well-formed, and there are many valid options to handle such errors.

This shows that the simplification of always decoding bytes to str with the UTF-8 decoder is risky.
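To illustrate the available options, here is a short sketch using the standard error handlers of bytes.decode(). The sample bytes are an assumption chosen to mirror the iso-8859-1 name above; they are not read from cpiofile.py.

```python
# "Gustäbel" encoded as ISO-8859-1: the "ä" becomes the single byte 0xE4,
# which does not start a valid UTF-8 sequence here.
raw = "Gust\u00e4bel".encode("iso-8859-1")

try:
    raw.decode("utf-8")  # errors="strict" is the default
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# U+FFFD marks the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")
# The undecodable byte is silently dropped.
ignored = raw.decode("utf-8", errors="ignore")
# The undecodable byte is kept as a lone surrogate and can be restored later.
escaped = raw.decode("utf-8", errors="surrogateescape")
# Using the codec the data was actually written with is lossless.
latin1 = raw.decode("iso-8859-1")
```

Only the last variant recovers the original text. "surrogateescape" at least allows writing the original bytes back unchanged, while "replace" and "ignore" lose information.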

Most Python3 programs dealing with strings from outside sources will have to handle such errors, and they need manual attention and at least testing when being converted to Python3. The second commit provides the flexibility to pass encoding= and errors= when a conversion to/from str is desired.
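As a rough sketch of that idea (open_file is a hypothetical helper for illustration, not the xcp.accessor API), keyword arguments can simply be forwarded to the built-in open(), which accepts encoding= and errors= in text mode only:

```python
import os
import tempfile


def open_file(path, mode="rb", **kwargs):
    """Hypothetical helper: binary by default, text only on request.

    Any encoding= and errors= arguments are forwarded unchanged to the
    built-in open().
    """
    return open(path, mode, **kwargs)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9\n")  # "café" encoded as ISO-8859-1
    path = tmp.name

try:
    with open_file(path) as f:  # default: raw bytes, no decoding
        as_bytes = f.read()
    with open_file(path, "r", encoding="iso-8859-1") as f:
        as_latin1 = f.read()
    with open_file(path, "r", encoding="utf-8", errors="replace") as f:
        as_replaced = f.read()
finally:
    os.remove(path)
```

With this shape, callers that need bytes get them untouched, and callers that want str state the encoding and error policy explicitly.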

ydirson and others added 24 commits January 20, 2023 17:45
Use of `unicode` needed to be immediately handled, but a few checks
relying on `str` could become insufficient in python2 with the larger
usage of unicode strings.

Signed-off-by: Yann Dirson <[email protected]>
…s to

open() as this is considered best practice.

(cherry picked from cpython commit 6cef076ba5edbfa42239924951d8acbb087b3b19)

Signed-off-by: Yann Dirson <[email protected]>
Running tests on python3 did reveal some of them.

Signed-off-by: Yann Dirson <[email protected]>
There is no guarantee about ordering of dict elements, and tests compare
results derived from enumerating a dict element.  We could have used an
OrderedDict to store the formulae and get a predictable output order, but
just considering the output as a set seems better.

Only applying this to rules expected to hold more than one element.

Signed-off-by: Yann Dirson <[email protected]>
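A small sketch of the set-based comparison described in this commit message. The function and rule names are hypothetical and not taken from the test suite. (Dicts keep insertion order since Python 3.7, but not in Python 2.)

```python
import unittest


def enumerate_rules(rules):
    """Hypothetical stand-in for code that enumerates a dict."""
    return [name for name in rules]


class RulesTest(unittest.TestCase):
    def test_rule_names(self):
        rules = {"b": 2, "a": 1}
        # Compare as sets so the assertion does not depend on dict order.
        self.assertEqual(set(enumerate_rules(rules)), {"a", "b"})
```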
Caught by extended test.

Signed-off-by: Yann Dirson <[email protected]>
FIXME: I'm quite unsure why xcp.xmlunwrap would want to use bytes and not
unicode strings, but the encode/decode calls make it quite clear it wants
to work with bytes.  That makes the API painful to use in python3.
hashlib came with python 2.5, and old md5 module disappears in 3.0

Signed-off-by: Yann Dirson <[email protected]>
This is supposed to be just a module renaming to conform to PEP8, see
https://docs.python.org/3/whatsnew/3.0.html#library-changes

The SafeConfigParser class has been renamed to ConfigParser in Python
3.2, and backported as addon package.  The `readfp` method now
triggers a deprecation warning to replace it with `read_file`.

FIXME: With python3 some Accessor implementations (e.g. FileAccessor)
provide a text stream for repository config (and with python2 all
implementations), while others (e.g. HTTPAccessor) provide a binary
stream.  But on python3 ConfigParser will bomb out if given a binary
stream, so use a TextIOWrapper to access the config.  This is a hack,
which cannot be used when it is binary data which has to be read (see
later commits), so I don't consider this commit to be correct in that
respect.
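The TextIOWrapper approach mentioned in this commit message can be sketched as follows. The BytesIO object stands in for a binary stream such as the one an HTTP-based accessor provides, and the section and option names are invented for the example.

```python
import configparser
import io

# Stand-in for a binary stream; the config content is made up.
binary_stream = io.BytesIO(b"[repo]\nname = example\n")

# ConfigParser.read_file() expects text, so decode on the fly.
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")

parser = configparser.ConfigParser()
parser.read_file(text_stream)
repo_name = parser["repo"]["name"]
```

As the commit message notes, this only helps for text content; it is not a way to read binary data.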
Testing several accessor classes causes code duplication, which can be
avoided with help from the `parameterized` package (unfortunately, `pytest`
support cannot be used together with `unittest`).

Not a big deal right now, but starts becoming painful when adding new tests
or testing other Accessor classes.

Signed-off-by: Yann Dirson <[email protected]>
This test uses the same kind of I/O (file copy) that prepare_host_upgrade.py
does.

FIXME: the copy cannot proceed this way in python3
This works properly for the http case, but FileAccessor provides us with
a text fileobj handle, and `read()` gets a UTF-8 decoding error.

FIXME: Accessor ctor requires a `mode` argument
Reported under python3 for members created on-the-fly in `setUp()`

Signed-off-by: Yann Dirson <[email protected]>
With python3, pylint complains about `else: raise()` constructs.
This rework avoids them and reduces cyclomatic complexity by using
the error-out-first idiom.

Signed-off-by: Yann Dirson <[email protected]>
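For readers unfamiliar with the error-out-first idiom, a generic sketch follows. parse_port is a made-up function and not code from this repository.

```python
def parse_port(value):
    """Hypothetical example of checking for the error case first."""
    # Before the rework, the shape would have been:
    #     if value.isdigit():
    #         return int(value)
    #     else:
    #         raise ValueError(...)
    if not value.isdigit():
        raise ValueError("not a port number: %r" % value)
    return int(value)
```

Handling the error up front removes the else branch and leaves the main path unindented.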
diff-cover defaults to origin/main in new version, it seems.

Signed-off-by: Yann Dirson <[email protected]>
Even though .github/workflows/main.yml does a curl of branding.py
GitHub CI still failed with ImportError for branding.

Signed-off-by: Bernhard Kaindl <[email protected]>
@bernhardkaindl bernhardkaindl force-pushed the testsuite-driven-py3-xcp.accessor-use-binary-mode branch from 28be2cd to b5bd2e2 Compare April 24, 2023 12:48
bernhardkaindl added a commit to xenserver-next/python-libs that referenced this pull request Apr 25, 2023
Fix issue xenserver#19 based on the description and progress from PR xenserver#24.
Allows for opening text and binary files in text and binary modes.

Mode, encoding and error handling can be set by passing the parameters
"encoding" and "errors" using the kwargs parameters from openAddress()
and writeFile() to open(mode, **kwargs) and ftp.makefile(mode, **kwargs).

Signed-off-by: Bernhard Kaindl <[email protected]>
bernhardkaindl added a commit to xenserver-next/python-libs that referenced this pull request Apr 26, 2023
Fix issue xenserver#19 based on the description and progress from PR xenserver#24.
Allows for opening text and binary files in text and binary modes.

Mode, encoding and error handling can be set by passing the parameters
"encoding" and "errors" using the kwargs parameters from openAddress()
and writeFile() to open(mode, **kwargs) and ftp.makefile(mode, **kwargs).

Signed-off-by: Bernhard Kaindl <[email protected]>
bernhardkaindl added a commit to xenserver-next/python-libs that referenced this pull request Apr 28, 2023
Fix issue xenserver#19 based on the description and progress from PR xenserver#24.
Allows for opening text and binary files in text and binary modes.

Mode, encoding and error handling can be set by passing the parameters
"encoding" and "errors" using the kwargs parameters from openAddress()
and writeFile() to open(mode, **kwargs) and ftp.makefile(mode, **kwargs).

Signed-off-by: Bernhard Kaindl <[email protected]>
@bernhardkaindl bernhardkaindl added the "Will be fixed by other PRs" and "bug" labels May 15, 2023
@bernhardkaindl
Collaborator Author

Closing as obsoleted by other PRs that are being worked on or prepared now.

bernhardkaindl added a commit to rosslagerwall/python-libs that referenced this pull request May 8, 2024
…/py2-py3-six.moves-urllib

Update urlopen() and getoutput() to support Python3 as well

3 participants