Skip to content
This repository was archived by the owner on Feb 13, 2025. It is now read-only.

Stackless support for PEP 590 Vectorcall: a fast calling protocol for CPython #290

Closed
akruis opened this issue Jul 17, 2021 · 8 comments
Closed

Comments

@akruis
Copy link

akruis commented Jul 17, 2021

bpo-36974 commit aacc77f implements PEP 590, the Vectorcall protocol. This commit causes massive merge conflicts.
Plan:

  1. Resolve conflicts with minimal changes and drop Stackless support in affected code.
  2. Make the code compilable.
  3. Readd Stackless support to the functions mentioned below.
  4. Define how to indicate, that a Vectorcall-function supports the Stackless protocol.

"call"-functions touched by commit aacc77f. We need to (re-)add support for the Stackless protocol.

call.c

  • _PyObject_FastCallDict
  • _PyObject_MakeTpCall
  • PyVectorcall_Call
  • PyObject_Call
  • _PyFunction_FastCallKeywords
  • _PyCFunction_FastCallKeywords
  • PyCFunction_Call

ceval.c

  • trace_call_function
  • call_function

classobject.c

  • method_vectorcall
@akruis
Copy link
Author

akruis commented Jul 27, 2021

Implementation is in pull request #291.

@akruis
Copy link
Author

akruis commented Jul 30, 2021

4\. Define how to indicate, that a Vectorcall-function supports the Stackless protocol.

Actually this turned out to be the hard part. First I thought, that I could simply use the tp_vectorcall_offset field in slp_methodflags, but first there is no such field and second the whole approach does not work. Unlike all other slots, the vectorcall function is a per instance property. Each instance of a type may use a different vectorcall function. When I discovered this difficulty I first feared this could be a show stopper for Stackless.

Then I found a way to modify the "Stackless-protocol" that is used to control, if C-function may return Py_UnwindToken. Traditionally we use the global variable int _PyStackless_TRY_STACKLESS and the local variable int stackless. Both were used as boolean variables with zero or non-zero values. The calling function needs to know whether the called function supports the Stackless protocol, because the called function must reset the global variable _PyStackless_TRY_STACKLESS.

For the vectorcall protocol the calling function does not know, if the called function supports the Stackless protocol. Therefore, for vectorcalls, the Stackless-protocol must work without the strict requirement for the called function to reset _PyStackless_TRY_STACKLESS. The solution is as follows:

  1. The type of the global variable _PyStackless_TRY_STACKLESS is changed from int to intptr_t.

  2. Possible values of _PyStackless_TRY_STACKLESS when calling a C-function.

    • 0: stackless calls are not possible, the called function MUST NOT return Py_UnwindToken.
    • 1: the called function MUST reset _PyStackless_TRY_STACKLESS and MAY return Py_UnwindToken.
    • Any other value: if the called function does not support the Stackless protocol, there are no requirements.
      Otherwise, the called function MUST reset _PyStackless_TRY_STACKLESS. If the value of
      _PyStackless_TRY_STACKLESS is the address of the called function, then the called function MAY return
      Py_UnwindToken.
  3. To prevent leakage of a non zero value of _PyStackless_TRY_STACKLESS to other threads, a thread must reset
    _PyStackless_TRY_STACKLESS before it drops the GIL. This is done in the C-function drop_gil.

I implemented this solution in pull request #291. Seems to work well.

@akruis
Copy link
Author

akruis commented Jul 31, 2021

Upstream commit be718c3 causes assertion failures in prickelpit.c function static int init_type(PyTypeObject *t, int (*initchain)(PyObject *), PyObject * mod)

It is now required to copy a few fields from the base type: tp_descr_get, tp_vectorcall_offset and tp_call.

akruis added a commit that referenced this issue Jul 31, 2021
Adapt Stackless Python to the new Vectorcall protocol introduced by upstream commit aacc77f.
This changes changes the Stackless protocol to support Vectorcall functions.
It also updates code in pricklpit.c to correctly initialize wrapper types. This avoids assertion
failures introduced by upstream commit be718c3 (not yet merged).
@akruis
Copy link
Author

akruis commented Jul 31, 2021

I merged the code changes (pull request #293). The documentation changes still need to be reviewed (pull request #291).

akruis pushed a commit that referenced this issue Aug 1, 2021
Before dropping the GIL, assert that non-Vectorcall functions did reset
_PyStackless_TRY_STACKLESS.
@kristjanvalur
Copy link
Collaborator

Interesting. The thoughts behind this certainly belong to the documentation. Is this perhaps a better protocol than the old one?
Some points

  • "MAY return unwind token" should perhaps be "Is allowed to return ... (because the caller is prepared to handle it)"
  • the issue witth the GIL (3 above), that's because the value is set without knowing if the called function supports it... would it be possible to use the thread state instead of a global value to avoid this?

@akruis
Copy link
Author

akruis commented Aug 4, 2021

Interesting. The thoughts behind this certainly belong to the documentation.

I added the information to the comment for the try_stackless field in pycore_slp_pystate.h. It is an implementation detail. This is why I didn't add it to the API documentation.

Is this perhaps a better protocol than the old one?

Not sure. It requires the address of a called function (woul prevent complete inlining functions) and requires two writes and one read for each function call. With the other protocol many simple wrapper functions do not touch try_stackless at all.

Some points

* "MAY return unwind token" should perhaps be "Is allowed to return ... (because the caller is prepared to handle it)"

I completely rewrote this part.

* the issue witth the GIL (3 above), that's because the value is set without knowing if the called function supports it... would it be possible to use the thread state instead of a global value to avoid this?

Sure, but a global variable has a known address and is always available. A unconditional write of 0 in drop_gil() is extremely fast.

And many thanks for your help. Good documentation is really hard.

@kristjanvalur
Copy link
Collaborator

On my way to the office this morning, I was thinking about this, and I realized that I never really understood the soft-switch mechanism in stackless.

If there isn't one, maybe a section on how it works would be helpful. Here is a rough outline from what I think happens, when A switches to B and then back to A: (without consulting code and my memory is rusty), e.g. when calling one of the switching functions, like tasklet_run

  1. A sets ts.stackless.current to B
  2. return the UnwindToken
  3. Unwind token is passed on up to the top where the "Mario code" switches the current PyFrame state to the one of tasklet B
  4. B runs its own frame execution function
  5. B switches to A by same means (setting ts->stackless.next or current...), returns UnwindToken
  6. code returns token back to the mario code...

This is where it gets fishy for me. How can we now, re-enter the C-stack of the function that return the original UnwindToken in 2. Maybe it was a stackless-aware function, and it needs to resume now. Is that even possible?

akruis added a commit that referenced this issue Aug 5, 2021
…tation)

Update the API Documentation. There are changes to the Stackless-protocol and
a few new macros (STACKLESS_VECTORCALLxxxx).
@akruis
Copy link
Author

akruis commented Aug 5, 2021

The documentation of vectorcall related changes (pull request #291) is merged. I opened issue #299 for the missing high level implementation documentation.

@akruis akruis closed this as completed Aug 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants