Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290

akruis · 2021-07-17T20:38:05Z

bpo-36974 commit aacc77f implements PEP 590, the Vectorcall protocol. This commit causes massive merge conflicts.
Plan:

Resolve conflicts with minimal changes and drop Stackless support in affected code.
Make the code compilable.
Readd Stackless support to the functions mentioned below.
Define how to indicate, that a Vectorcall-function supports the Stackless protocol.

"call"-functions touched by commit aacc77f. We need to (re-)add support for the Stackless protocol.

call.c

_PyObject_FastCallDict
_PyObject_MakeTpCall
PyVectorcall_Call
PyObject_Call
_PyFunction_FastCallKeywords
_PyCFunction_FastCallKeywords
PyCFunction_Call

ceval.c

trace_call_function
call_function

classobject.c

method_vectorcall

The text was updated successfully, but these errors were encountered:

akruis · 2021-07-27T13:03:01Z

Implementation is in pull request #291.

akruis · 2021-07-30T11:08:34Z

4\. Define how to indicate, that a Vectorcall-function supports the Stackless protocol.

Actually this turned out to be the hard part. First I thought, that I could simply use the tp_vectorcall_offset field in slp_methodflags, but first there is no such field and second the whole approach does not work. Unlike all other slots, the vectorcall function is a per instance property. Each instance of a type may use a different vectorcall function. When I discovered this difficulty I first feared this could be a show stopper for Stackless.

Then I found a way to modify the "Stackless-protocol" that is used to control, if C-function may return Py_UnwindToken. Traditionally we use the global variable int _PyStackless_TRY_STACKLESS and the local variable int stackless. Both were used as boolean variables with zero or non-zero values. The calling function needs to know whether the called function supports the Stackless protocol, because the called function must reset the global variable _PyStackless_TRY_STACKLESS.

For the vectorcall protocol the calling function does not know, if the called function supports the Stackless protocol. Therefore, for vectorcalls, the Stackless-protocol must work without the strict requirement for the called function to reset _PyStackless_TRY_STACKLESS. The solution is as follows:

The type of the global variable _PyStackless_TRY_STACKLESS is changed from int to intptr_t.
Possible values of _PyStackless_TRY_STACKLESS when calling a C-function.
- 0: stackless calls are not possible, the called function MUST NOT return Py_UnwindToken.
- 1: the called function MUST reset _PyStackless_TRY_STACKLESS and MAY return Py_UnwindToken.
- Any other value: if the called function does not support the Stackless protocol, there are no requirements.
  Otherwise, the called function MUST reset _PyStackless_TRY_STACKLESS. If the value of
  _PyStackless_TRY_STACKLESS is the address of the called function, then the called function MAY return
  Py_UnwindToken.
To prevent leakage of a non zero value of _PyStackless_TRY_STACKLESS to other threads, a thread must reset
_PyStackless_TRY_STACKLESS before it drops the GIL. This is done in the C-function drop_gil.

I implemented this solution in pull request #291. Seems to work well.

akruis · 2021-07-31T14:38:36Z

Upstream commit be718c3 causes assertion failures in prickelpit.c function static int init_type(PyTypeObject *t, int (*initchain)(PyObject *), PyObject * mod)

It is now required to copy a few fields from the base type: tp_descr_get, tp_vectorcall_offset and tp_call.

Adapt Stackless Python to the new Vectorcall protocol introduced by upstream commit aacc77f. This changes changes the Stackless protocol to support Vectorcall functions. It also updates code in pricklpit.c to correctly initialize wrapper types. This avoids assertion failures introduced by upstream commit be718c3 (not yet merged).

akruis · 2021-07-31T20:01:57Z

I merged the code changes (pull request #293). The documentation changes still need to be reviewed (pull request #291).

Before dropping the GIL, assert that non-Vectorcall functions did reset _PyStackless_TRY_STACKLESS.

kristjanvalur · 2021-08-03T14:08:06Z

Interesting. The thoughts behind this certainly belong to the documentation. Is this perhaps a better protocol than the old one?
Some points

"MAY return unwind token" should perhaps be "Is allowed to return ... (because the caller is prepared to handle it)"
the issue witth the GIL (3 above), that's because the value is set without knowing if the called function supports it... would it be possible to use the thread state instead of a global value to avoid this?

akruis · 2021-08-04T17:35:28Z

Interesting. The thoughts behind this certainly belong to the documentation.

I added the information to the comment for the try_stackless field in pycore_slp_pystate.h. It is an implementation detail. This is why I didn't add it to the API documentation.

Is this perhaps a better protocol than the old one?

Not sure. It requires the address of a called function (woul prevent complete inlining functions) and requires two writes and one read for each function call. With the other protocol many simple wrapper functions do not touch try_stackless at all.

Some points

* "MAY return unwind token" should perhaps be "Is allowed to return ... (because the caller is prepared to handle it)"

I completely rewrote this part.

* the issue witth the GIL (3 above), that's because the value is set without knowing if the called function supports it... would it be possible to use the thread state instead of a global value to avoid this?

Sure, but a global variable has a known address and is always available. A unconditional write of 0 in drop_gil() is extremely fast.

And many thanks for your help. Good documentation is really hard.

kristjanvalur · 2021-08-05T09:22:45Z

On my way to the office this morning, I was thinking about this, and I realized that I never really understood the soft-switch mechanism in stackless.

If there isn't one, maybe a section on how it works would be helpful. Here is a rough outline from what I think happens, when A switches to B and then back to A: (without consulting code and my memory is rusty), e.g. when calling one of the switching functions, like tasklet_run

A sets ts.stackless.current to B
return the UnwindToken
Unwind token is passed on up to the top where the "Mario code" switches the current PyFrame state to the one of tasklet B
B runs its own frame execution function
B switches to A by same means (setting ts->stackless.next or current...), returns UnwindToken
code returns token back to the mario code...

This is where it gets fishy for me. How can we now, re-enter the C-stack of the function that return the original UnwindToken in 2. Maybe it was a stackless-aware function, and it needs to resume now. Is that even possible?

…tation) Update the API Documentation. There are changes to the Stackless-protocol and a few new macros (STACKLESS_VECTORCALLxxxx).

akruis · 2021-08-05T12:32:23Z

The documentation of vectorcall related changes (pull request #291) is merged. I opened issue #299 for the missing high level implementation documentation.

akruis self-assigned this Jul 17, 2021

akruis added blocker main-slp stackless python task labels Jul 17, 2021

akruis mentioned this issue Jul 31, 2021

Stackless issue #290: bpo-36974, PEP 590 Vectorcall-protocol (code changes) #293

Merged

akruis pushed a commit that referenced this issue Aug 1, 2021

Stackless issue #290: fix two compiler warnings on Windows 64bit

07f90b0

akruis pushed a commit that referenced this issue Aug 1, 2021

Stackless issue #290: add an assert()

b8829fb

Before dropping the GIL, assert that non-Vectorcall functions did reset _PyStackless_TRY_STACKLESS.

akruis added a commit that referenced this issue Aug 5, 2021

Stackless issue #290: bpo-36974, PEP 590 Vectorcall-protocol (documen…

6390138

…tation) Update the API Documentation. There are changes to the Stackless-protocol and a few new macros (STACKLESS_VECTORCALLxxxx).

akruis closed this as completed Aug 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290

Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290

akruis commented Jul 17, 2021

akruis commented Jul 27, 2021

Uh oh!

akruis commented Jul 30, 2021

Uh oh!

akruis commented Jul 31, 2021

Uh oh!

akruis commented Jul 31, 2021

Uh oh!

kristjanvalur commented Aug 3, 2021

Uh oh!

akruis commented Aug 4, 2021

Uh oh!

kristjanvalur commented Aug 5, 2021

Uh oh!

akruis commented Aug 5, 2021

Uh oh!

Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290

Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290

Comments

akruis commented Jul 17, 2021

akruis commented Jul 27, 2021

Uh oh!

akruis commented Jul 30, 2021

Uh oh!

akruis commented Jul 31, 2021

Uh oh!

akruis commented Jul 31, 2021

Uh oh!

kristjanvalur commented Aug 3, 2021

Uh oh!

akruis commented Aug 4, 2021

Uh oh!

kristjanvalur commented Aug 5, 2021

Uh oh!

akruis commented Aug 5, 2021

Uh oh!