Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290
Comments
Implementation is in pull request #291.
Actually this turned out to be the hard part. At first I thought I could simply use a tp_vectorcall_offset field in slp_methodflags, but there is no such field and, more importantly, the whole approach does not work. Unlike all other slots, the vectorcall function is a per-instance property: each instance of a type may use a different vectorcall function. When I discovered this difficulty, I feared it could be a show stopper for Stackless.

Then I found a way to modify the "Stackless-protocol" that controls whether a C function may return Py_UnwindToken. Traditionally we use the global variable int _PyStackless_TRY_STACKLESS and the local variable int stackless. Both were used as boolean variables with zero or non-zero values. The calling function needs to know whether the called function supports the Stackless-protocol, because the called function must reset the global variable _PyStackless_TRY_STACKLESS.

For the vectorcall protocol, the calling function does not know whether the called function supports the Stackless-protocol. Therefore, for vectorcalls, the Stackless-protocol must work without the strict requirement that the called function resets _PyStackless_TRY_STACKLESS. The solution is as follows:
I implemented this solution in pull request #291. It seems to work well.
Upstream commit be718c3 causes assertion failures in a function in prickelpit.c. It is now required to copy a few fields from the base type: tp_descr_get, tp_vectorcall_offset and tp_call.
Adapt Stackless Python to the new Vectorcall protocol introduced by upstream commit aacc77f. This change modifies the Stackless-protocol to support Vectorcall functions. It also updates code in prickelpit.c to correctly initialize wrapper types. This avoids assertion failures introduced by upstream commit be718c3 (not yet merged).
Before dropping the GIL, assert that non-Vectorcall functions did reset _PyStackless_TRY_STACKLESS.
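The reset requirement that this assertion checks can be illustrated with a minimal sketch of the traditional protocol, as described in the comments above. It is not code from the pull request and does not show the new Vectorcall variant. All names are hypothetical stand-ins: try_stackless models _PyStackless_TRY_STACKLESS, UNWIND_TOKEN models Py_UnwindToken, and slp_demo_call is an invented caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
static int try_stackless = 0;                 /* models _PyStackless_TRY_STACKLESS */
static int unwind_token_storage;
static int ordinary_result_storage;
static void *const UNWIND_TOKEN = &unwind_token_storage;       /* models Py_UnwindToken */
static void *const ORDINARY_RESULT = &ordinary_result_storage; /* any normal return value */

typedef void *(*callee_fn)(void);

/* A callee that supports the protocol: it copies the global flag into
 * the local variable "stackless" and resets the global right away. */
static void *callee_with_support(void)
{
    int stackless = try_stackless;
    try_stackless = 0;
    return stackless ? UNWIND_TOKEN : ORDINARY_RESULT;
}

/* A callee that knows nothing about the protocol never touches the flag. */
static void *callee_without_support(void)
{
    return ORDINARY_RESULT;
}

/* Caller side: the flag may only be set if the callee is known to
 * support the protocol, because only such a callee will reset it. */
static void *slp_demo_call(callee_fn f, int callee_supports_protocol,
                           int want_stackless)
{
    void *result;
    try_stackless = (callee_supports_protocol && want_stackless);
    result = f();
    /* The kind of check the commit message describes: after the call
     * the flag must be clear again. */
    assert(try_stackless == 0);
    return result;
}
```

In this sketch, setting the flag before calling callee_without_support would leave it set and trip the assertion. That is why the traditional protocol requires the caller to know whether the callee supports it.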
Interesting. The thoughts behind this certainly belong in the documentation. Is this perhaps a better protocol than the old one?
I added the information to the comment for the try_stackless field in pycore_slp_pystate.h. It is an implementation detail. This is why I didn't add it to the API documentation.
Not sure. It requires the address of the called function (which would prevent complete inlining of functions) and requires two writes and one read for each function call. With the other protocol, many simple wrapper functions do not touch try_stackless at all.
I completely rewrote this part.
Sure, but a global variable has a known address and is always available. An unconditional write of 0 in … And many thanks for your help. Good documentation is really hard.
On my way to the office this morning, I was thinking about this, and I realized that I never really understood the soft-switch mechanism in Stackless. If there isn't one, maybe a section on how it works would be helpful. Here is a rough outline from what I
This is where it gets fishy for me. How can we now re-enter the C stack of the function that returned the original UnwindToken in 2? Maybe it was a
…tation) Update the API Documentation. There are changes to the Stackless-protocol and a few new macros (STACKLESS_VECTORCALLxxxx).
bpo-36974 commit aacc77f implements PEP 590, the Vectorcall protocol. This commit causes massive merge conflicts.
Plan:
"call"-functions touched by commit aacc77f. We need to (re-)add support for the Stackless protocol.
call.c
  _PyObject_FastCallDict
  _PyObject_MakeTpCall
  PyVectorcall_Call
  PyObject_Call
  _PyFunction_FastCallKeywords
  _PyCFunction_FastCallKeywords
  PyCFunction_Call
ceval.c
  trace_call_function
  call_function
classobject.c
  method_vectorcall