2.10.1 client deadlocks on Python GIL during logging while nacking and receiving messages #116
From the threads info, the root cause is still related to the …
Nearly all threads except thread 2 are stuck acquiring the GIL in … In conclusion:
Since the GIL won't be released, every thread that tries to acquire it will be stuck. This blocks nearly all threads when a custom Python logger is configured, because all methods of … I'm going to look further into the code and see how we could fix this deadlock. (BTW, this threads info is strange in that I see …)
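The comment's root-cause details are cut off, but the shape it describes (one thread holds the GIL while blocked, and every other thread waits for the GIL) matches a classic lock-order inversion. Below is a minimal model, assuming thread 2 is waiting on a client-side mutex that is held by a thread which itself needs the GIL to call the Python logger. Both locks are plain `threading.Lock` objects, and the names `gil` and `client_mutex` are hypothetical stand-ins, not the client's real code. Timed acquires replace the real, untimed ones so the demo finishes instead of hanging.

```python
import threading

# Plain locks standing in for the real ones (illustrative model only).
gil = threading.Lock()           # plays the Python GIL
client_mutex = threading.Lock()  # plays a hypothetical C++ client mutex

first_locks_held = threading.Barrier(2)  # both threads hold their first lock
attempts_done = threading.Barrier(2)     # both have tried their second lock
stuck = []  # threads that could not get their second lock


def python_caller():
    # Models a Python thread calling into the client (e.g. negative_acknowledge):
    # it holds the GIL, then needs the client's mutex.
    with gil:
        first_locks_held.wait()
        if client_mutex.acquire(timeout=0.5):
            client_mutex.release()
        else:
            stuck.append("python_caller")
        attempts_done.wait()  # keep holding the GIL until both have tried


def client_thread():
    # Models a client thread that logs while holding its mutex: forwarding the
    # record to a Python logger needs the GIL.
    with client_mutex:
        first_locks_held.wait()
        if gil.acquire(timeout=0.5):
            gil.release()
        else:
            stuck.append("client_thread")
        attempts_done.wait()  # keep holding the mutex until both have tried


threads = [threading.Thread(target=f) for f in (python_caller, client_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread waits on a lock the other holds; with real, untimed acquires
# neither would ever proceed.
print(sorted(stuck))  # ['client_thread', 'python_caller']
```

Breaking either edge of the cycle (a consistent lock order, or not calling into Python while holding the client's lock) removes the deadlock.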
I opened a PR in the C++ client repo to avoid the deadlock caused by the user-provided logger: apache/pulsar-client-cpp#266
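One common way to remove this class of deadlock is to never invoke a user-supplied callback, such as a logger, while holding an internal lock: update shared state under the lock, copy out what the log line needs, release the lock, then log. The sketch below is a generic illustration in Python; `PendingNacks` and `RecordingLogger` are hypothetical names, and this is not necessarily how apache/pulsar-client-cpp#266 implements its fix.

```python
import threading


class PendingNacks:
    """Hypothetical client component that logs via a user-provided logger."""

    def __init__(self, logger):
        self._logger = logger
        self._lock = threading.Lock()
        self._pending = {}

    def add(self, msg_id, delay_s):
        # Mutate shared state under the lock and copy out what the log needs.
        with self._lock:
            self._pending[msg_id] = delay_s
            count = len(self._pending)
        # Call user code only after releasing the lock: a logger that blocks
        # (for example, waiting for the GIL) can no longer stall other threads
        # that need self._lock.
        self._logger.debug("%d messages pending redelivery", count)

    def lock_held(self):
        return self._lock.locked()


class RecordingLogger:
    """Hypothetical stand-in logger that records whether the lock was held."""

    def __init__(self):
        self.tracker = None
        self.lock_held_during_call = []

    def debug(self, fmt, *args):
        self.lock_held_during_call.append(self.tracker.lock_held())


log = RecordingLogger()
tracker = PendingNacks(log)
log.tracker = tracker
tracker.add("msg-1", 15.0)
tracker.add("msg-2", 15.0)
print(log.lock_held_during_call)  # [False, False]
```

A standard `logging.Logger` can be passed in place of `RecordingLogger`, since `debug(msg, *args)` is the same call shape.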
* Bump the C++ client to 3.2.0

  Fixes #116

  apache/pulsar-client-cpp#266 is included in the C++ client 3.2.0, so #116 will be fixed.
* Change the download URL since archive.apache.org is not available now
* Revert "Change the download URL since archive.apache.org is not available now"

  This reverts commit 6b92e98.
We've observed full Python interpreter lockups (not just "blocking": the interpreter calling the client halts, and it can't be unblocked, time out, or raise exceptions, even if the blocking operation is moved to a Python thread and waited on with a timeout) in the presence of:

* …
* the `logger=` argument to `Client`. We must pass it; otherwise the logs the client emits to STDOUT fill up our disks.

All of those have to be present to trigger the issue. When multiple Shared consumers repeatedly nack messages with a 15-second delay on a topic with a few hundred messages (100% of them are nacked over and over), all but one of the consumers eventually (within a few minutes) lock up; that is, no Python code in that consumer can run. It's not just that the consumer is blocked in a `negative_acknowledge` call: all threads, signal handlers, coroutines, etc. in that interpreter are stuck. To me, this points to a GIL conflict.

While this program has many hundreds of threads, the stack traces of the most relevant ones are included here:
threads.txt
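For context on the reported setup: a Python logger like the one below is the kind of object passed as `logger=` to keep client output off STDOUT. This is a minimal sketch assuming a size-capped rotating file is acceptable; `make_client_logger`, the logger name, the file path, and the size limits are all hypothetical. On affected client versions, passing a Python logger is exactly the condition that exposes the deadlock; per the comments above, the fix ships in C++ client 3.2.0.

```python
import logging
import logging.handlers
import os
import tempfile


def make_client_logger(path, max_bytes=10 * 1024 * 1024, backups=3):
    # Hypothetical helper: send client logs to a size-capped rotating file
    # instead of STDOUT, bounding disk use to about max_bytes * (backups + 1).
    logger = logging.getLogger("pulsar-client")
    logger.setLevel(logging.WARNING)  # drop INFO/DEBUG chatter
    logger.propagate = False          # don't also reach root/STDOUT handlers
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "pulsar-client.log")
logger = make_client_logger(log_path)
logger.info("suppressed")
logger.warning("kept")
for h in logger.handlers:
    h.flush()
with open(log_path) as f:
    contents = f.read()
print("kept" in contents, "suppressed" in contents)  # True False

# Usage (not run here): pulsar.Client("pulsar://localhost:6650", logger=logger)
```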