Description
Python, like almost all other languages, operates on a stack of call frames.
Most languages use a continuous stack for calls because it is much more efficient than allocating each frame separately on the heap.
C and other ahead-of-time compiled languages use the stack provided by the OS and hardware.
But even interpreted languages use a continuous stack. Java and Lua are obvious examples.
Python should do the same. Allocating frames on the heap, keeping "zombie" frames (a dead frame cached on each code object for reuse), and excessively copying arguments are all slow and unnecessary.
Implementation
Each thread needs two stacks: a data stack and a control stack. These can be continuous or chunked. A chunked stack gives most of the performance advantages of a continuous one and is more flexible, but is more complex to implement.
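To illustrate the chunked option, here is a minimal Python sketch, assuming fixed-size chunks that each link back to the previous one. The names ChunkedStack, _Chunk and chunk_size are hypothetical. Within a chunk, push and pop are plain index operations; only crossing a chunk boundary takes the slower path.

```python
class _Chunk:
    """One fixed-size segment of the stack, linked to the previous segment."""

    def __init__(self, size, previous):
        self.slots = [None] * size
        self.previous = previous


class ChunkedStack:
    """Hypothetical chunked stack: fixed-size chunks used as one logical stack."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunk = _Chunk(chunk_size, None)
        self.top = 0  # index of the next free slot in the current chunk

    def push(self, value):
        if self.top == self.chunk_size:
            # Slow path: the current chunk is full, so link in a fresh one.
            self.chunk = _Chunk(self.chunk_size, self.chunk)
            self.top = 0
        self.chunk.slots[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            # Slow path: the current chunk is empty, fall back to the previous one.
            if self.chunk.previous is None:
                raise IndexError("pop from empty stack")
            self.chunk = self.chunk.previous
            self.top = self.chunk_size
        self.top -= 1
        value = self.chunk.slots[self.top]
        self.chunk.slots[self.top] = None  # drop the reference
        return value
```

With a realistic chunk size the slow path is rare, which is why chunking keeps most of the benefit of a single continuous block.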
The data stack is a (conceptually) infinite stack of PyObject * pointers (or of tagged PyValues).
The control stack is a (conceptually) infinite stack of ControlBlocks.
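To make overflow checks cheap, stacks should be power-of-2 sized and aligned on their size: every address inside a stack then shares the same high bits, so checking whether a push fits needs only a mask and a compare.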
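A small Python model of such a check, treating addresses as plain integers. It assumes the stack grows upward and that the frame pointer always points inside the stack; STACK_SIZE, HIGH_BITS and fits are hypothetical names. Comparing the last slot of a reservation against a pointer already known to be inside the stack avoids loading a separate limit.

```python
STACK_SIZE = 1 << 12            # hypothetical: 4096 slots, a power of 2
HIGH_BITS = ~(STACK_SIZE - 1)   # mask selecting the "which aligned block" bits


def fits(frame_pointer: int, stack_pointer: int, n: int) -> bool:
    """Return True if slots [stack_pointer, stack_pointer + n) stay in the
    same STACK_SIZE-aligned block as frame_pointer, i.e. the push cannot overflow.

    Assumes frame_pointer <= stack_pointer, frame_pointer lies inside the
    stack, and n >= 1.
    """
    last = stack_pointer + n - 1
    # Same high bits <=> same aligned block <=> still inside the stack.
    return ((last ^ frame_pointer) & HIGH_BITS) == 0
```

For a stack based at 3 * STACK_SIZE, reserving the final 5 slots fits, while reserving 6 spills into the next aligned block and is reported as an overflow.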
For performance, it is probably a good idea to arrange that the control stack cannot overflow unless the data stack does first; that way there is no need to check the control stack for overflow. This can be done by choosing some ratio R and ensuring both that every code object satisfies (locals + 3) >= (1 + block_stack_size) * R, inflating the number of locals if necessary, and that the data stack has no more than R times as many entries as the control stack.
The 3 counts each frame's globals, builtins and locals slots and the 1 counts its call block, so every frame uses at least R data entries per control block; the control stack therefore cannot fill up before the data stack does. 4 is probably a good value for R.
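The argument can be checked with a small Python simulation. This is a sketch, not an implementation: padded_locals and control_overflows_first are hypothetical helpers, frames are modelled as (nlocals, block_stack_size) pairs, and each frame is charged its worst case of a full try-block nesting and an empty evaluation stack.

```python
R = 4  # the ratio suggested above


def padded_locals(nlocals: int, block_stack_size: int, r: int = R) -> int:
    """Smallest local count >= nlocals with (locals + 3) >= (1 + block_stack_size) * r."""
    return max(nlocals, (1 + block_stack_size) * r - 3)


def control_overflows_first(frames, control_capacity: int, r: int = R) -> bool:
    """Push frames, each a (nlocals, block_stack_size) pair, onto empty stacks and
    report whether the control stack overflows while the data stack still fits.

    Each frame is charged its locals plus the 3 namespace slots (evaluation-stack
    entries would only add to the data side), and its call block plus its
    maximum number of try blocks.
    """
    data_capacity = r * control_capacity  # no more than R times as many entries
    data_used = control_used = 0
    for nlocals, block_stack_size in frames:
        data_used += nlocals + 3
        control_used += 1 + block_stack_size
        if data_used > data_capacity:
            return False  # the data-stack check fires first (or on the same push)
        if control_used > control_capacity:
            return True   # the unchecked control stack would overflow
    return False
```

With padded locals the control stack never overflows first; without padding, a function with no locals and three nested try blocks uses 3 data entries per 4 control blocks, and the control stack overflows long before the data stack.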
ControlBlocks
A control block will mostly be one of:
- Return block: contains the caller's instruction pointer (IP).
- Try block: contains the handler and the stack depth.
Additional types of ControlBlock can be used for transferring control into and out of generators, for exits from the interpreter at the C level, and for other flow-control tasks.
Frame layout on data stack
Each frame consists of local variables (including cells), followed by globals, builtins and locals, followed by the evaluation stack.
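The layout above can be expressed as fixed offsets from the frame pointer. A minimal Python sketch; FrameLayout, its properties and the max_stack_depth parameter are hypothetical names (max_stack_depth plays the role of a code object's maximum evaluation-stack depth).

```python
from dataclasses import dataclass


@dataclass
class FrameLayout:
    """Hypothetical slot offsets of one frame, relative to its frame pointer."""
    nlocals: int  # local variables, including cells, occupy offsets 0..nlocals-1

    @property
    def globals_offset(self) -> int:
        return self.nlocals

    @property
    def builtins_offset(self) -> int:
        return self.nlocals + 1

    @property
    def locals_offset(self) -> int:
        return self.nlocals + 2

    @property
    def stack_base_offset(self) -> int:
        # The evaluation stack starts right after the three namespace slots.
        return self.nlocals + 3

    def frame_size(self, max_stack_depth: int) -> int:
        """Total data-stack slots the frame can occupy."""
        return self.stack_base_offset + max_stack_depth
```

Because every offset is a constant per code object, a load of local i or of the globals is a single indexed access from the frame pointer.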
Python-to-python calls
Assuming that calls are effectively specialized (by #28), making a call will require the following operations:
- Make the frame-pointer point to the first argument.
- Shuffle the arguments into the new locals and fill in the defaults. For many calls this will be a no-op as the arguments will match the parameters.
- Increase the stack-pointer to point just past the new locals.
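The three steps can be sketched in Python on a list-based data stack. This covers only positional arguments, so the "shuffle" reduces to filling in trailing defaults; push_call and its parameters are hypothetical, None stands in for NULL, and setting up the namespace slots and pushing the call block are omitted.

```python
def push_call(stack, sp, nargs, nparams, defaults, nlocals):
    """Hypothetical Python-to-Python call on a preallocated list data stack.

    The caller has already pushed `nargs` positional arguments, so they occupy
    stack[sp - nargs:sp]. `defaults` holds default values for the last
    len(defaults) parameters. Returns the new (frame_pointer, stack_pointer).
    """
    first_default = nparams - len(defaults)
    if not first_default <= nargs <= nparams:
        raise TypeError("wrong number of positional arguments")
    # 1. The frame pointer points at the first argument: no copying needed.
    fp = sp - nargs
    # 2. Fill in defaults for missing trailing parameters; a no-op when
    #    nargs == nparams.
    for i in range(nargs, nparams):
        stack[fp + i] = defaults[i - first_default]
    # Non-parameter locals start out unbound (None stands in for NULL).
    for i in range(nparams, nlocals):
        stack[fp + i] = None
    # 3. The stack pointer points just past the new locals.
    return fp, fp + nlocals
```

The arguments never move: they become the callee's first locals exactly where the caller pushed them.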
Generators and coroutines
Generators will need to contain space for their own local variables and control stack (for exceptions, not calls).
The compiler will need to be modified to:
- Emit special opcodes for local loads and stores.
- Make sure that the evaluation stack is empty across yields, by saving iterators and other temporaries into local variables.
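The second compiler change can be illustrated at the Python level. The class below is a hand-written analogue, not compiler output: the hypothetical DoubledGenerator keeps the loop iterator, which an ordinary for loop holds on the evaluation stack, in a named field, so nothing but its "locals" has to survive between resumptions.

```python
class DoubledGenerator:
    """Hand-written equivalent of:

        def doubled(items):
            for x in items:
                yield x * 2

    Everything that must survive a yield lives in named fields ("locals"),
    including the loop iterator, so no temporaries are pending at a yield.
    """

    def __init__(self, items):
        self.items = items      # parameter
        self.iterator = None    # hidden local holding the loop iterator
        self.x = None           # ordinary local

    def __iter__(self):
        return self

    def __next__(self):
        if self.iterator is None:
            # First resume: create the iterator and spill it into a local.
            self.iterator = iter(self.items)
        # next() raises StopIteration when the loop ends, finishing the generator.
        self.x = next(self.iterator)
        return self.x * 2
```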
Frame objects
Frame objects are widely used for debugging and introspection, so we must support them in a reasonably efficient fashion.
Upon making a call, we push a ControlBlock. This "call" block will contain a pointer, initially NULL, to the frame object. Should we ever need a frame object, say from sys._getframe(), we lazily create one that points back to the control block for that frame. On exiting the function, the frame object can be discarded or, if it is still referenced, the locals can be copied into it.
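A Python sketch of the lazy frame object, assuming frame_pointer_offset is the index of the frame's first local on the data stack and that the local count comes from the code object. CallBlock, LazyFrame, get_frame_object and exit_call are hypothetical names. While the function runs, the frame object reads through to the data stack; on exit it takes a private copy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CallBlock:
    """Hypothetical "call" control block."""
    frame_pointer_offset: int  # assumed: index of the frame's first local
    instruction_offset: int
    frame_object: Optional["LazyFrame"] = None  # NULL until someone asks


class LazyFrame:
    """Reads locals from the data stack while the function runs and owns a
    private copy of them after it exits."""

    def __init__(self, data_stack: List[object], block: CallBlock, nlocals: int):
        self._data_stack = data_stack
        self._block: Optional[CallBlock] = block  # points back to the control block
        self._nlocals = nlocals                   # assumed to come from the code object
        self._saved: List[object] = []

    @property
    def locals(self) -> List[object]:
        if self._block is None:  # the function has exited
            return list(self._saved)
        fp = self._block.frame_pointer_offset
        return self._data_stack[fp:fp + self._nlocals]

    def detach(self) -> None:
        """Copy the locals out before the frame's stack slots are reused."""
        self._saved = self.locals
        self._block = None


def get_frame_object(data_stack, block: CallBlock, nlocals: int) -> LazyFrame:
    """Create the frame object on first request; later requests reuse it."""
    if block.frame_object is None:
        block.frame_object = LazyFrame(data_stack, block, nlocals)
    return block.frame_object


def exit_call(block: CallBlock) -> None:
    """On return: copy locals into the frame object, if one was ever created."""
    # A real implementation could skip the copy when nothing else references it.
    if block.frame_object is not None:
        block.frame_object.detach()
```

In the common case no frame object is ever requested, so exit_call costs a single NULL check.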
Implementing sys._getframe()
To find the nth frame, we walk the control stack from the top until we find the nth "call" block, then read the frame object from it.
If it is NULL, we create a new one and store it in the control block.
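A minimal Python sketch of that walk, numbering frames from 0 at the top as sys._getframe does. The block fields follow the examples that follow; getframe and Frame are hypothetical names, and only call and try blocks are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class CallBlock:
    frame_pointer_offset: int
    instruction_offset: int
    frame_object: Optional["Frame"] = None  # usually NULL


@dataclass
class TryBlock:
    handler_offset: int
    stack_depth: int


class Frame:
    """Hypothetical frame object that points back at its call block."""

    def __init__(self, block: CallBlock):
        self.block = block


def getframe(control_stack: List[Union[CallBlock, TryBlock]], depth: int = 0) -> Frame:
    """Return the frame `depth` calls below the top of the control stack."""
    seen = 0
    for block in reversed(control_stack):  # walk down from the top
        if isinstance(block, CallBlock):
            if seen == depth:
                if block.frame_object is None:  # create lazily, at most once
                    block.frame_object = Frame(block)
                return block.frame_object
            seen += 1
    raise ValueError("call stack is not deep enough")
```

Try blocks are skipped during the walk, so its cost is proportional to the number of control blocks above the requested frame.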
Example control blocks:
"Call" block:
- int frame_pointer_offset
- int instruction_offset
- PyObject *frame_object (usually NULL)
"Generator" block:
- int yield_pointer_offset -- Where to jump to on yield
- int return_pointer_offset -- Where to jump to on return
"Try" block:
- int handler_offset
- int stack_depth
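These block kinds map naturally onto a tagged union. The Python sketch below uses the field names listed above and adds a hypothetical unwind function showing how a try block might be used when an exception is raised. It assumes stack_depth is an absolute data-stack depth, and it simply pops call and generator blocks without restoring any caller state.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class CallBlock:
    frame_pointer_offset: int
    instruction_offset: int
    frame_object: Optional[object] = None  # usually None (NULL)


@dataclass
class GeneratorBlock:
    yield_pointer_offset: int   # where to jump to on yield
    return_pointer_offset: int  # where to jump to on return


@dataclass
class TryBlock:
    handler_offset: int
    stack_depth: int  # assumed: absolute depth of the data stack


ControlBlock = Union[CallBlock, GeneratorBlock, TryBlock]


def unwind(control_stack: List[ControlBlock], data_stack: List[object]) -> Optional[int]:
    """Pop control blocks until a try block is found.

    Returns the handler offset to jump to, or None if the exception escapes
    every block on this control stack.
    """
    while control_stack:
        block = control_stack.pop()
        if isinstance(block, TryBlock):
            # Truncating to the saved depth also discards any frames above it.
            del data_stack[block.stack_depth:]
            return block.handler_offset
        # Call and generator blocks are simply popped here; a real interpreter
        # would also restore the caller's state as it leaves each frame.
    return None
```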