UTF-8 encoding fixes + tests for xcp.pci, .cmd and .net.biosdevname #22
Closed
Use of `unicode` needed to be immediately handled, but a few checks relying on `str` could become insufficient in python2 with the larger usage of unicode strings. Signed-off-by: Yann Dirson <[email protected]>
…conversion Signed-off-by: Yann Dirson <[email protected]>
…s to open() as this is considered best practice. (cherry picked from cpython commit 6cef076ba5edbfa42239924951d8acbb087b3b19) Signed-off-by: Yann Dirson <[email protected]>
…fication Signed-off-by: Yann Dirson <[email protected]>
…ated Signed-off-by: Yann Dirson <[email protected]>
Running tests on python3 did reveal some of them. Signed-off-by: Yann Dirson <[email protected]>
Signed-off-by: Yann Dirson <[email protected]>
There is no guarantee about the ordering of dict elements, and tests compare results derived from enumerating a dict element. We could have used an OrderedDict to store the formulae and get a predictable output order, but just treating the output as a set seems better. This is only applied to rules expected to hold more than one element. Signed-off-by: Yann Dirson <[email protected]>
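A minimal sketch of the set-based comparison, assuming a hypothetical `render()` helper that formats rules taken from a dict; the rule names and values are made-up sample data, not the real formulae from the test suite.

```python
def render(rules):
    """Format each (name, value) pair of a dict as 'name=value' (hypothetical helper)."""
    # The result follows dict iteration order, which Python 2 does not guarantee.
    return ["%s=%s" % (name, value) for name, value in rules.items()]


rules = {"eth0": "mac", "eth1": "pci"}  # made-up sample data

# Order-sensitive comparison: may fail on Python 2 depending on hash ordering.
# assert render(rules) == ["eth0=mac", "eth1=pci"]

# Order-insensitive comparison: treat the output as a set instead.
assert set(render(rules)) == {"eth0=mac", "eth1=pci"}
```

A set also hides duplicate entries; if duplicates matter, `unittest`'s `assertCountEqual()` (Python 3) or `assertItemsEqual()` (Python 2.7) compares element counts while still ignoring order.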
Caught by extended test. Signed-off-by: Yann Dirson <[email protected]>
This goes away in python3. Signed-off-by: Yann Dirson <[email protected]>
FIXME: I'm quite unsure why xcp.xmlunwrap would want to use bytes and not unicode strings, but the encode/decode calls make it quite clear it wants to work with bytes. That makes the API painful to use in python3.
hashlib came with python 2.5, and old md5 module disappears in 3.0 Signed-off-by: Yann Dirson <[email protected]>
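For reference, the `hashlib` replacement for the removed `md5` module looks like this; `md5_hexdigest` is a hypothetical wrapper name used only for illustration.

```python
import hashlib


def md5_hexdigest(data):
    """Return the hex MD5 digest of a byte string (hypothetical helper)."""
    # hashlib.md5() replaces md5.new() from the md5 module removed in Python 3.0.
    return hashlib.md5(data).hexdigest()


print(md5_hexdigest(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
```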
This is supposed to be just a module renaming to conform to PEP8, see https://docs.python.org/3/whatsnew/3.0.html#library-changes The SafeConfigParser class has been renamed to ConfigParser in Python 3.2, and backported as an add-on package. The `readfp` method now triggers a deprecation warning asking to replace it with `read_file`. FIXME: With python3, some Accessor implementations (e.g. FileAccessor) provide a text stream for the repository config (as do all implementations with python2), while others (e.g. HTTPAccessor) provide a binary stream. But on python3 ConfigParser will bomb out if given a binary stream, so use a TextIOWrapper to access the config. This is a hack, which cannot be used when binary data has to be read (see later commits), so I don't consider this commit to be correct in that respect.
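A minimal sketch of the `TextIOWrapper` workaround described above, assuming the accessor hands back a binary file-like stream (an `io.BytesIO` stands in for it here) and that the config is UTF-8; `read_repo_config` and the `[repository]` section are hypothetical.

```python
import configparser
import io


def read_repo_config(binary_stream):
    """Parse an INI-style config from a binary stream (hypothetical helper)."""
    # ConfigParser on Python 3 expects text, so decode the bytes on the fly.
    text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8")
    parser = configparser.ConfigParser()
    parser.read_file(text_stream)  # read_file() supersedes the deprecated readfp()
    return parser


raw = io.BytesIO(b"[repository]\nname = caf\xc3\xa9\n")
config = read_repo_config(raw)
name = config.get("repository", "name")  # the decoded text 'café'
```

As the commit message says, this only helps when text is wanted. A `TextIOWrapper` also closes the underlying stream when it is closed or garbage-collected, so a caller that still needs the binary stream afterwards would have to `detach()` it first.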
Testing several accessor classes causes code duplication, which can be avoided with help from the `parameterized` package (unfortunately, `pytest` support cannot be used together with `unittest`). Not a big deal right now, but it starts becoming painful when adding new tests or testing other Accessor classes. Signed-off-by: Yann Dirson <[email protected]>
This test uses the same kind of I/O (file copy) that prepare_host_upgrade.py does. FIXME: the copy cannot proceed this way in python3
This works properly for the http case, but FileAccessor provides us with a text fileobj handle, and `read()` gets a UTF-8 decoding error. FIXME: Accessor ctor requires a `mode` argument
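The failure mode can be reproduced without any Accessor. Reading non-UTF-8 bytes through a UTF-8 text handle raises `UnicodeDecodeError`, while a binary handle returns them unchanged. In this sketch, the gzip magic bytes stand in for a binary repository file, and `read_payload` and `demo` are hypothetical helpers.

```python
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # 0x8b can never start a UTF-8 sequence


def read_payload(path, binary):
    """Read a whole file in binary or UTF-8 text mode (hypothetical helper)."""
    if binary:
        with open(path, "rb") as f:
            return f.read()
    with open(path, encoding="utf-8") as f:  # Python 3 signature of open()
        return f.read()


def demo():
    """Write binary data to a temp file and try both read modes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(GZIP_MAGIC)
        data = read_payload(path, binary=True)  # raw bytes, no decoding
        try:
            read_payload(path, binary=False)
            text_ok = True
        except UnicodeDecodeError:
            text_ok = False  # the same kind of error the text handle hits
        return data, text_ok
    finally:
        os.remove(path)
```

Here `demo()` returns `(b"\x1f\x8b", False)`.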
Signed-off-by: Yann Dirson <[email protected]>
Signed-off-by: Yann Dirson <[email protected]>
Reported under python3 for members created on-the-fly in `setUp()` Signed-off-by: Yann Dirson <[email protected]>
With python3, pylint complains about `else: raise()` constructs. This rework avoids them and reduces cyclomatic complexity by using the error-out-first idiom. Signed-off-by: Yann Dirson <[email protected]>
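Here is a hypothetical before/after illustrating the error-out-first idiom; both functions are made up and not taken from xcp. In the second version, each invalid case raises immediately, so no `else: raise` branch is needed and the normal path is not nested.

```python
def parse_entry_nested(line):
    """'key=value' parser written with if/else nesting (the style being replaced)."""
    if "=" in line:
        key, value = line.split("=", 1)
        if key.strip():
            return key.strip(), value.strip()
        else:
            raise ValueError("empty key in %r" % (line,))
    else:
        raise ValueError("missing '=' in %r" % (line,))


def parse_entry(line):
    """Same behaviour, written error-out-first."""
    if "=" not in line:
        raise ValueError("missing '=' in %r" % (line,))
    key, value = line.split("=", 1)
    if not key.strip():
        raise ValueError("empty key in %r" % (line,))
    return key.strip(), value.strip()
```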
diff-cover defaults to origin/main in newer versions, it seems. Signed-off-by: Yann Dirson <[email protected]>
Also use xcp.xcp_popen_text_kwargs for all affected unit tests because they need to handle the encoding decode/encode likewise.
Now that we use encoding="utf-8" to open /usr/share/hwdata/pci.ids, enhance the test case to ensure that xcp.pci does not crash when the UTF-8 characters found in /usr/share/hwdata/pci.ids are included in the unit test data, and that it returns the expected output.
We might be called with the locale not set, in which case python2's default charset is ASCII. This happens when code is running as an xapi-plugin; for example, with the ACK plugin, which uses xcp.pci.PCIIds(). This means we have to test that the code uses encoding="utf-8" correctly in all cases (python2 and python3), and this test does so while processing UTF-8 data in xcp.cmd and xcp.pci.PCIIds().read().
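The following is a minimal sketch of such a check. It assumes a temporary file stands in for /usr/share/hwdata/pci.ids. The vendor line is made-up sample data, and `read_lines_utf8` is a hypothetical helper. `io.open()` accepts `encoding=` on both Python 2.6+ and Python 3, so the decoded result does not depend on `LANG`/`LC_ALL`.

```python
import io
import os
import tempfile


def read_lines_utf8(path):
    """Return the file's lines as unicode text, decoded as UTF-8 (hypothetical helper)."""
    # An explicit encoding bypasses the locale-dependent default,
    # which matters when running without LANG set (e.g. as a xapi plugin).
    with io.open(path, encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f]


SAMPLE = u"1234  Caf\u00e9 Devices\n"  # made-up pci.ids-style vendor line

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(SAMPLE.encode("utf-8"))
try:
    assert read_lines_utf8(path) == [u"1234  Caf\u00e9 Devices"]
finally:
    os.remove(path)
```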
Signed-off-by: Bernhard Kaindl <[email protected]>
The DCO check is failing because the sign-offs are missing from some commits. To easily sign them off, run …
All commits of this PR have now been refactored and merged, or obsoleted by PRs which are now merged.
Commits 1-21 by ydirson are from #17; this PR is just to allow for a separate review of my additional commits.
Foundational information:
For reading UTF-8 characters from files using Python 3, there are, AFAIK, 3 possibilities:
- `open(file, encoding="utf-8")`; subprocess pipes are opened using `Popen(..., encoding="utf-8")`.
- `open(file, "rb")`, and then all data read and written is encoded and decoded using explicit `encode()`/`decode()` calls with the utf-8 codec.
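The two approaches listed above can be sketched for reading a file on Python 3 as follows. The function names are hypothetical. Both functions return the same text for valid UTF-8 input.

```python
def read_text_mode(path):
    """Let the file object decode: text mode with an explicit codec."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_binary_mode(path):
    """Read raw bytes and decode them explicitly."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8")
```

One difference worth knowing: text mode also applies universal-newline translation, so `\r\n` comes back as `\n`. The binary variant returns line endings untouched.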
My observations on these are:
- Although `en_US.UTF-8` is set in /etc/locale, which shell logins get passed as `LANG=`, it may not always be set. For example, when daemons are started, it is good practice to clear the environment; the ACK xapi-plugin appears to be an example of this situation. The locale can be enforced interpreter-wide by setting `LC_CTYPE`/`LC_ALL` in the environment or using `locale.setlocale()`. However, this affects the entire process, including all threads, and calls to `setlocale()` are not thread-safe. It would also not be good practice for a library like python-libs/xcp to change the locale used by the process.
- `encoding='utf-8'` works nicely and is easy to apply to all `open` and `Popen` calls, but it is neither supported nor needed for Python 2. Passing it to all `open` and `Popen` calls can be done through a `**kwargs` keyword parameter, which passes the needed arguments as a dict.
- In Python 2, an `encode(args)` call on bytes internally results in `decode().encode(args)`. This can go wrong and can even raise `UnicodeDecodeError`, because the implicit `decode()` uses the default codec (ASCII) without `encoding=` and `errors=` arguments.

Option 2 is the easiest and safest option that I can imagine. It means passing `encoding="utf-8", errors="replace"` through a `**kwargs` keyword parameter dict, populated on Python 3 and empty on Python 2. With `errors="replace"`, undecodable bytes are replaced with the U+FFFD replacement character instead of raising an exception.
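A minimal sketch of option 2, assuming a module-level dict named `TEXT_KWARGS`. This is a hypothetical name; the PR's own helper, `xcp.xcp_popen_text_kwargs`, is not reproduced here. On Python 3, the dict carries `encoding` and `errors` for `open()` and `subprocess.Popen()`. On Python 2 it is empty, so the same call sites keep returning byte strings. `py2_style_encode` is a hypothetical helper included only to emulate the Python 2 pitfall described above.

```python
import subprocess
import sys

# Populated on Python 3 only: Python 2's built-in open() and Popen() accept neither key.
if sys.version_info >= (3, 0):
    TEXT_KWARGS = {"encoding": "utf-8", "errors": "replace"}
else:
    TEXT_KWARGS = {}


def read_file(path):
    """Read a whole file: decoded text on Python 3, a byte str on Python 2."""
    with open(path, **TEXT_KWARGS) as f:
        return f.read()


def run_command(argv):
    """Return argv's stdout, decoded on Python 3 (Popen's encoding= needs 3.6+)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, **TEXT_KWARGS)
    out, _ = proc.communicate()
    return out


def py2_style_encode(data, codec="utf-8"):
    """Emulate Python 2's str.encode() on a byte string, for contrast.

    Python 2 first decodes with the default codec (ASCII), without any errors=
    argument, which is where the UnicodeDecodeError comes from.
    """
    return data.decode("ascii").encode(codec)
```

With this setup, an invalid byte such as `\xff` in a file or in command output comes back as `\ufffd` instead of aborting the read. By contrast, `py2_style_encode(b"caf\xc3\xa9")` raises `UnicodeDecodeError`, just as `.encode("utf-8")` does on a non-ASCII Python 2 byte string.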