Skip to content

bpo-43785 removed the lock from BZ2File, but on PyPy writing from multiple threads now deadlocks #100996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mattip opened this issue Jan 12, 2023 · 3 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@mattip
Copy link
Contributor

mattip commented Jan 12, 2023

Bug report

A clear and concise description of what the bug is.
Include a minimal, reproducible example (https://stackoverflow.com/help/minimal-reproducible-example), if possible.

We are working on upgrading PyPy to stdlib v3.10.6. As part of that work, we merge the contents of Lib/* to a py3.10 branch of PyPy and run the tests. We are seeing deadlocks in the Lib/test/test_bz2.py tests, specifically in the testThreading test. I can confirm that restoring the lock removed in #25299 (but only the part when writing to self.fp) breaks the deadlock:

--- a/lib-python/3/bz2.py       Thu Jan 12 16:22:35 2023 +0200
+++ b/lib-python/3/bz2.py       Thu Jan 12 20:27:36 2023 +0200
@@ -13,6 +13,7 @@
 import io
 import os
 import _compression
+from threading import RLock
 
 from _bz2 import BZ2Compressor, BZ2Decompressor
 
@@ -52,6 +53,9 @@
         If mode is 'r', the input file may be the concatenation of
         multiple compressed streams.
         """
+        # This lock must be recursive, so that BufferedIOBase's
+        # writelines() does not deadlock.
+        self._lock = RLock()
         self._fp = None
         self._closefp = False
         self._mode = _MODE_CLOSED
@@ -227,9 +231,10 @@
             data = memoryview(data)
             length = data.nbytes
 
-        compressed = self._compressor.compress(data)
-        self._fp.write(compressed)
-        self._pos += length
+        with self._lock:
+            compressed = self._compressor.compress(data)
+            self._fp.write(compressed)
+            self._pos += length
         return length
 
     def writelines(self, seq):

Your environment

PyPy on the py3.10 branch, on Ubuntu 20.04.

Does CPython have a different locking mechanism for writing to a file to prevent this?

xref @methane. I think the intention of the original issue was to improve reading speed, I don't see anything in the comments about writing.

@mattip mattip added the type-bug An unexpected behavior, bug, or error label Jan 12, 2023
@mattip
Copy link
Contributor Author

mattip commented Jan 12, 2023

xref the issue as well #87951

@methane
Copy link
Member

methane commented Jan 13, 2023

ref #51454 too.

I think the intention of the original issue was to improve reading speed, I don't see anything in the comments about writing.

I don't think supporting thread-safe for write but not for read is good solution?
I think multithread support should same for read/write, and bz2/gzip/lzma.

I am trying to reproduce the deadlock but I can not reproduce it yet.
Could you try this script?
https://gist.github.com/methane/92fef8980cb4d11e88959d067904e891

@mattip
Copy link
Contributor Author

mattip commented Jan 13, 2023

hmm. The script runs to completion on CPython and deadlocks only in BZ2file on PyPy (without the lock). I think I can close this, thanks for the reality check. There is something fishy in compressed = self._compressor.compress(data) on PyPy: only that line needs to be within the lock's context manager. Removing the lock seems to have exposed a latent problem with the code that comes from the implementation-specific _bz2.BZCompressor.compress. I do see that PyPy uses a lock there, but maybe there is some other contention causing deadlock.

Thanks for checking again.

@mattip mattip closed this as completed Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants