[2.7] bpo-34110: Fix module load problem of cPickle in case of multithreading #8276
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately, our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. When your account is ready, please add a comment in this pull request. Thanks again for your contribution; we look forward to reviewing it!
pythonGH-8276 fixed a regression in 3.7. Although the regression is not present in 3.6, it's worth backporting the test cases to the 3.6 branch too.
I think my account is ready now. I see the star after my username.
Hi @sangongs, based on @pablogsal's comment on the bug tracker, would you be able to rebase this onto the master branch instead of just the 2.7 branch? Thanks!
@pitrou, as you made the last change to this code, would you be able to review this change? Thanks!
@pitrou Sure, I will try to rebase it. It may take some time, since I am not familiar with the Python 3 code base. BTW, here are the scripts to reproduce the problem in Python 3:
foo.py
main.py
This looks to be a duplicate of https://bugs.python.org/issue34572, which was already fixed in Python 3 (but not backported to Python 2.7). See PRs there.
Hmm. Yes, I tested with 3.7.3 and it is fixed.
Thanks @pitrou! Apologies for not seeing that other issue. Since
The

As for what a correct solution may look like, perhaps one of the import experts would want to answer. cc @brettcannon @ncoghlan
I'm not sure what a correct Python 2 fix looks like, but I'll note that any Python 2.7 approach based on acquiring the import lock creates the risk of introducing deadlocks where none existed before: https://docs.python.org/2/library/threading.html#importing-in-threaded-code

We didn't have that problem in the Python 3.x fix, as all the versions where the fix was implemented already included the change to switch the import system over to using per-module locks.
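To make the deadlock concern above concrete, here is a toy model (not CPython's actual import machinery) contrasting a single global import lock, as in Python 2.7, with per-module locks, as in Python 3. The class and function names are hypothetical. The helper holds the lock for one module in the calling thread, then checks whether a second thread can take the lock for another module within a timeout. A `False` result stands in for the case where the calling thread would deadlock if it then waited for that second thread.

```python
import threading


class GlobalLockImporter:
    """Toy model (hypothetical): one lock guards every import, 2.7 style."""

    def __init__(self):
        self._lock = threading.RLock()

    def lock_for(self, name):
        # Every module name maps to the same lock.
        return self._lock


class PerModuleLockImporter:
    """Toy model (hypothetical): one lock per module name, Python 3 style."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict below
        self._locks = {}

    def lock_for(self, name):
        with self._guard:
            return self._locks.setdefault(name, threading.RLock())


def other_thread_can_import(importer, held, wanted, timeout=0.2):
    """Hold the lock for `held`, then ask a second thread to take the lock
    for `wanted`. Returns False if the second thread timed out, i.e. it
    would have blocked behind this thread's in-progress import."""
    outcome = []

    def worker():
        lock = importer.lock_for(wanted)
        acquired = lock.acquire(timeout=timeout)
        if acquired:
            lock.release()
        outcome.append(acquired)

    with importer.lock_for(held):
        # Simulates module-level code of `held` that starts a thread and
        # waits for it; with a real lock and no timeout this could hang.
        t = threading.Thread(target=worker)
        t.start()
        t.join()
    return outcome[0]


print(other_thread_can_import(GlobalLockImporter(), "module_a", "module_b"))     # False
print(other_thread_can_import(PerModuleLockImporter(), "module_a", "module_b"))  # True
```

With per-module locks the second thread still waits if it asks for the same module that is being imported (`other_thread_can_import(PerModuleLockImporter(), "m", "m")` returns `False`), so the deadlock window shrinks rather than disappearing entirely.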
Hmm, you're right, the risk of introducing a regression here is too high. And I don't know of a solution that would not require a lock (since we're talking about multi-threading synchronization).
There may be another possible approach, but I don't know if it fixes @sangongs's issue:

```c
module = PySys_GetObject("modules");  /* borrowed ref */
if (module == NULL)
    return NULL;
module = PyDict_GetItem(module, py_module_name);  /* borrowed ref */
if (module != NULL) {
    global = PyObject_GetAttr(module, py_global_name);
    if (global != NULL || !PyErr_ExceptionMatches(PyExc_AttributeError)) {
        return global;
    }
    PyErr_Clear();
}
/* Either module not found in sys.modules, or attribute lookup
 * in module failed. In both cases, we retry after actually
 * importing the module
 */
module = PyImport_Import(py_module_name);  /* new ref */
if (!module)
    return NULL;
global = PyObject_GetAttr(module, py_global_name);
Py_DECREF(module);
return global;
```
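For readers less familiar with the C API, the lookup order in that snippet can be sketched in Python. `find_global` is a hypothetical helper, not part of the `pickle` module: it tries `sys.modules` first without touching the import machinery, and only falls back to a real import when the module is missing or lacks the attribute.

```python
import importlib
import sys


def find_global(module_name, global_name):
    """Python rendering of the proposed lookup order (hypothetical helper)."""
    module = sys.modules.get(module_name)
    if module is not None:
        try:
            # Fast path: no import call, so no import lock is involved.
            return getattr(module, global_name)
        except AttributeError:
            # Possibly a partially initialized module: retry via import.
            # Any other exception propagates, as in the C version.
            pass
    # Slow path: a real import. In Python 3 this waits for another thread's
    # in-progress import of the same module; in Python 2.7 it goes through
    # the global import lock.
    module = importlib.import_module(module_name)
    return getattr(module, global_name)


print(find_global("collections", "OrderedDict"))  # <class 'collections.OrderedDict'>
```

The fast path never blocks, so code that already finds every attribute it needs behaves as before; only lookups that currently fail with `AttributeError` reach the import call.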
@pitrou Doesn't that still implicitly acquire the import lock inside the full import call, thus potentially deadlocking some cases that are currently getting by with a reference to a partially initialized module?
@ncoghlan Indeed, those cases currently raise an
@pitrou I was more concerned about the cases where the module is initialised well enough that all the attributes that the unpickling operation needs are already there, even though the module level code hasn't finished executing. Any such code would be spectacularly fragile in general, but it would still be problematic to break it in a maintenance release.
If the attributes are already there then the first
I now think you're right. I'm posting the argument where I was attempting to convince you I was right, but ended up convincing myself that you're right instead: I had just misunderstood your proposal. So +1 from me.

The import system adds the

In current Python 2.7 code, the pickle module can access

Illustrative example where I believe being unable to access a partially imported module could end up being problematic:

parent_pkg.module_a.py:

module_b.py:

As written, that code would be fine regardless of how the locking works, since the unpickle call happens with the import lock for

In Python 2.7, it also works even if the unpickling happens in another thread, since pickle doesn't try to acquire the import lock, and the required module attributes are available before the potential unpickle is invoked.

In Python 3.7, there's technically a chance to deadlock with another thread, but only if

With a global import lock, though, there would be many more ways to end up in a situation where the

However, the key to your proposal is that cases like my hypothetical one above won't fall through to the import lock, because the attributes they need will already be there. It's only the motivating cases that currently raise AttributeError that will risk deadlocking (and they probably won't deadlock in practice, since they probably won't be implicitly waiting for any other threads during the import).
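The example modules above did not survive, so here is a separate, hypothetical stand-in for the scenario being discussed: a module that unpickles an instance of its own class while its import is still running. The sketch writes a throwaway module (the name `self_unpickling_demo` is made up) to a temporary directory and imports it in a current Python 3 interpreter. Because `Point` is bound before the `pickle.loads` call, the lookup succeeds even though the module body has not finished executing.

```python
import importlib
import pathlib
import sys
import tempfile
import textwrap

# Source of a hypothetical module that unpickles during its own import.
MODULE_SOURCE = textwrap.dedent('''
    import pickle

    class Point(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

    # Module execution is not finished yet, but Point is already bound,
    # so pickle can resolve "self_unpickling_demo.Point" here.
    RESTORED = pickle.loads(pickle.dumps(Point(1, 2)))

    LATE_CONSTANT = "bound after the unpickle call"
''')


def import_from_temp_dir(name, source):
    """Write `source` to <tmpdir>/<name>.py and import it as `name`."""
    with tempfile.TemporaryDirectory() as tmpdir:
        pathlib.Path(tmpdir, name + ".py").write_text(source)
        sys.path.insert(0, tmpdir)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(name)
        finally:
            sys.path.remove(tmpdir)


demo = import_from_temp_dir("self_unpickling_demo", MODULE_SOURCE)
print(demo.RESTORED.x, demo.RESTORED.y)  # 1 2
```

Under the proposed lookup order, code like this keeps taking the fast path, since `Point` is already an attribute of the partially initialized module.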
That said, there's also the slight risk that someone somewhere is fine getting an
I didn't read the whole in-depth discussion, but, a priori, Clearly, we've been living with this
In Python 2.7, `sys.modules` may contain a partially loaded module. This is caused by the following logic in cpython/Python/import.c, lines 721 to 727 in a45fa39:

When executing the code of a module, an empty module is added to the module dict first. Thus, if any code in the module releases the GIL, another thread may get a partially loaded module from `sys.modules`. Unfortunately, `cPickle` checks `sys.modules` before trying to import, so it may get an incomplete module, leading to an `AttributeError` when trying to obtain an object from the module.

This fix acquires the import lock before checking `sys.modules`, to make sure there is no partially loaded module.

To reproduce, first create a `foo.py`:

Then create and run `main.py`:

The following error should be raised:

https://bugs.python.org/issue34110
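The race and the proposed fix can be modeled with a toy registry standing in for `sys.modules` and a single lock standing in for Python 2.7's global import lock. Everything here (`registry`, `toy_import`, `naive_lookup`, `locked_lookup`, `run_demo`) is hypothetical and only mirrors the description above: the importer publishes an empty module before running its body, a lookup that skips the lock sees the partial module and gets `AttributeError`, and a lookup that takes the lock first waits until the body has finished. Events replace real GIL releases so the outcome is deterministic.

```python
import threading
import types

registry = {}                    # stands in for sys.modules
import_lock = threading.RLock()  # stands in for the 2.7 global import lock


def toy_import(name, published, may_finish):
    """Publish an empty module first, then run its 'body' under the lock."""
    with import_lock:
        module = types.ModuleType(name)
        registry[name] = module   # visible to other threads from here on
        published.set()
        may_finish.wait()         # module code that lets other threads run
        module.Thing = "fully initialized"


def naive_lookup(name, attr):
    """cPickle-style lookup: read the registry without taking the lock."""
    return getattr(registry[name], attr)


def locked_lookup(name, attr):
    """Lookup that takes the import lock first, as this PR proposes."""
    with import_lock:
        return getattr(registry[name], attr)


def run_demo():
    registry.clear()
    published, may_finish = threading.Event(), threading.Event()
    importer = threading.Thread(target=toy_import,
                                args=("demo_mod", published, may_finish))
    importer.start()
    published.wait()

    try:
        naive = naive_lookup("demo_mod", "Thing")
    except AttributeError:
        naive = "AttributeError"  # saw the partially loaded module

    locked = []
    reader = threading.Thread(
        target=lambda: locked.append(locked_lookup("demo_mod", "Thing")))
    reader.start()                # blocks until the importer releases the lock
    may_finish.set()
    importer.join()
    reader.join()
    return naive, locked[0]


print(run_demo())  # ('AttributeError', 'fully initialized')
```

This toy lock is exactly the kind of global lock the discussion warns about: if the module body joined a thread that itself called `locked_lookup`, the two threads would deadlock.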