Description
Hello pypdf team,
While trying to get the fields of my PDF with the function PdfReader.get_fields(), my code received an exception from the function create_string_object (in pypdf/generic/_utils.py, line 113) because it received a bytearray instead of a str or bytes.
The traceback shows that the error occurs in decrypt_object(self, obj: PdfObject) -> PdfObject: after it detects that the object to decrypt is of type ByteStringObject or TextStringObject, it calls create_string_object, which raises.
The documentation about the bytearray type states:
bytearray objects are a mutable counterpart to bytes objects.
As bytearray objects are mutable, they support the mutable sequence operations in addition to the common bytes and bytearray operations described in Bytes and Bytearray Operations.
source: https://docs.python.org/3/library/stdtypes.html#bytearray
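To illustrate why a check written only for bytes rejects a bytearray, here is a minimal standalone sketch. It uses only built-in types; the variable names are mine and nothing in it comes from pypdf.

```python
data = bytearray(b"pdf")

# bytearray is a separate built-in type, not a subclass of bytes,
# so a check written only for bytes does not match it.
is_bytes = isinstance(data, bytes)
is_either = isinstance(data, (bytes, bytearray))

# Unlike bytes, a bytearray can be modified in place.
data[0] = ord("P")

# bytes(...) produces an immutable copy with the same content.
as_bytes = bytes(data)

print(is_bytes, is_either, as_bytes)
```

Because the content converts losslessly with bytes(...), treating a bytearray like bytes should be safe for read-only use.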
So it seems that the function create_string_object could accept bytearray objects and treat them as bytes instead of raising an exception.
After applying this fix, I was able to read the fields of my PDF.
diff --git a/pypdf/generic/_utils.py b/pypdf/generic/_utils.py
index e6da5cf..edc9153 100644
--- a/pypdf/generic/_utils.py
+++ b/pypdf/generic/_utils.py
@@ -129,7 +129,7 @@ def create_string_object(
     """
     if isinstance(string, str):
         return TextStringObject(string)
-    elif isinstance(string, bytes):
+    elif isinstance(string, bytes | bytearray):
         if isinstance(forced_encoding, (list, dict)):
             out = ""
             for x in string:
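One caveat about the patch above: isinstance(string, bytes | bytearray) only works on Python 3.10 and later, while a tuple of types works on every Python 3 version. The sketch below shows the tuple form in a hypothetical stand-in function. classify_string_input is my own name for illustration, not part of pypdf, and it only mimics the type dispatch.

```python
from typing import Union


def classify_string_input(value: Union[str, bytes, bytearray]) -> str:
    """Hypothetical stand-in for the type dispatch in create_string_object."""
    if isinstance(value, str):
        return "text"
    # A tuple of types is accepted by isinstance on all Python 3 versions;
    # the `bytes | bytearray` form needs Python 3.10+.
    if isinstance(value, (bytes, bytearray)):
        # Normalise to immutable bytes so later code sees a single type.
        return "bytes:" + bytes(value).hex()
    raise TypeError(f"expected str, bytes or bytearray, got {type(value)!r}")


print(classify_string_input("abc"))
print(classify_string_input(b"ab"))
print(classify_string_input(bytearray(b"ab")))
```

If pypdf still needs to run on interpreters older than 3.10, the tuple form may be the safer variant of the same fix.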
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-6.5.0-15-generic-x86_64-with-glibc2.38
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.4, crypt_provider=('cryptography', '41.0.7'), PIL=none
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader

pdf = PdfReader(file)  # file: path or stream of the affected (encrypted) PDF
fields = pdf.get_fields()
I can't provide my PDF file because it contains personal information.
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
  File "/home/stefan/src/source/pdf.py", line 235, in get_pdf_fields
    fields = self.reader.get_fields()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/_reader.py", line 577, in get_fields
    field = f.get_object()
            ^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/generic/_base.py", line 312, in get_object
    obj = self.pdf.get_object(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/_reader.py", line 1417, in get_object
    retval = self._encryption.decrypt_object(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/_encryption.py", line 850, in decrypt_object
    return cf.decrypt_object(obj)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/_encryption.py", line 104, in decrypt_object
    obj[key] = self.decrypt_object(value)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/_encryption.py", line 97, in decrypt_object
    obj = create_string_object(data)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/.venv/lib/python3.11/site-packages/pypdf/generic/_utils.py", line 163, in create_string_object
    raise TypeError(
TypeError: ('create_string_object should have str or unicode arg: %s', <class 'bytearray'>)