Compact code object marshaled form and pre-quicken bytecode when unmarshaling #462

Closed
Tracked by #465
markshannon opened this issue Sep 19, 2022 · 8 comments

Comments

@markshannon
Member

Currently we go through a two stage process to get to the adaptive form of instructions.
The explanation below uses CALL, but it applies to all specializable instructions.

  1. When unmarshaling the code object we create the bytecode CALL, oparg, 0, 0, 0 ... where the zeros are the cache.
  2. When executing, check a counter and quicken when that counter reaches zero.
  3. When quickening, replace all CALLs with CALL_ADAPTIVEs (see the sketch after this list).
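
A rough sketch of that counter-driven second stage, using toy types and opcode numbers rather than CPython's real internals:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these names are real CPython identifiers. */
enum { TOY_CALL = 1, TOY_CALL_ADAPTIVE = 2 };

typedef struct {
    int warmup;          /* counts down toward quickening */
    uint16_t *code;      /* 16-bit code units, opcode in the low byte */
    size_t n_units;
} toy_code;

/* Steps 2 and 3 above: once the counter reaches zero, rewrite every
 * CALL into CALL_ADAPTIVE in place. (The real loop also has to step
 * over each instruction's inline cache entries.) */
static void
toy_maybe_quicken(toy_code *co)
{
    if (co->warmup == 0) {
        return;                      /* already quickened */
    }
    if (--co->warmup != 0) {
        return;                      /* not warm enough yet */
    }
    for (size_t i = 0; i < co->n_units; i++) {
        if ((co->code[i] & 0xFF) == TOY_CALL) {
            co->code[i] = (uint16_t)((co->code[i] & 0xFF00) | TOY_CALL_ADAPTIVE);
        }
    }
}
```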

Instead we should quicken when unmarshaling, so that we create:

(CALL_ADAPTIVE oparg)
7 # uint16_t not byte pair
...

instead of

(CALL oparg)
(0 0)

The marshaled form only needs one byte for instructions without an oparg and two for other instructions. No space is needed for the cache.
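
As a concrete illustration of the unmarshal-time expansion, here is a minimal sketch. It assumes the per-opcode metadata (adaptive counterpart, cache size in code units, whether an oparg is present) is available as tables; every identifier is made up for the example and is not CPython's:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes and metadata tables (hypothetical, covering only the
 * opcodes used in this example). */
enum { TOY_NOP = 1, TOY_CALL = 2, TOY_CALL_ADAPTIVE = 3 };

static const uint8_t adaptive_form[256] = {
    [TOY_NOP]  = TOY_NOP,
    [TOY_CALL] = TOY_CALL_ADAPTIVE,   /* pre-quicken on expansion */
};
static const uint8_t cache_units[256] = { [TOY_CALL] = 4 };
static const uint8_t has_oparg[256]   = { [TOY_CALL] = 1 };

/* Expand the compact marshaled stream (one byte per opcode, plus one
 * oparg byte where needed, no cache) into 16-bit code units, with the
 * adaptive opcode substituted and the inline cache zero-filled.
 * Returns the number of code units written. */
static size_t
expand_compact(const uint8_t *src, size_t src_len, uint16_t *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < src_len; ) {
        uint8_t opcode = src[i++];
        uint8_t oparg = has_oparg[opcode] ? src[i++] : 0;
        dst[n++] = (uint16_t)(adaptive_form[opcode] | (oparg << 8));
        for (int c = 0; c < cache_units[opcode]; c++) {
            dst[n++] = 0;             /* zeroed cache entry */
        }
    }
    return n;
}
```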

@ericsnowcurrently
Collaborator

Presumably the same idea would apply to deep-frozen code objects?

@gvanrossum
Collaborator

I see loading marshalled code as equivalent to compiling it. So if we want unmarshalling to put the adaptive bytecodes in, we should do the same for the compiler -- IOW we should quicken everything immediately.

Unmarshalling results in exactly the same tree of code objects (module -> classes -> methods etc.) as compiling does.

@markshannon
Member Author

Yes, it is the same, provided that [un]marshaling handles the bytecode as an array of code units which it compresses, not just an array of bytes.

@gvanrossum
Collaborator

Yes, it is the same

Was this in response to Eric's question about deepfreeze, or mine about unmarshal vs. compiler?

@markshannon
Member Author

Yes, it is the same

Was this in response to Eric's question about deepfreeze, or mine about unmarshal vs. compiler?

Yours.
The compiler would emit the adaptive instructions directly (we can drop the non-adaptive forms), then marshal would store them in the compact form, and unmarshal would expand them.

@markshannon
Member Author

But we need to change marshaling first, so that it understands that code is made up of 16-bit code units, not just bytes.
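
Concretely, the writing side would then walk the code as 16-bit units rather than bytes, drop the inline cache, and emit the compact one-or-two-byte form per instruction. A sketch mirroring the expansion example above, again with made-up names:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes and metadata tables (hypothetical). */
enum { TOY_NOP = 1, TOY_CALL = 2, TOY_CALL_ADAPTIVE = 3 };

static const uint8_t base_form[256] = {
    [TOY_NOP]           = TOY_NOP,
    [TOY_CALL]          = TOY_CALL,
    [TOY_CALL_ADAPTIVE] = TOY_CALL,   /* store the canonical form */
};
static const uint8_t cache_units[256] = {
    [TOY_CALL] = 4, [TOY_CALL_ADAPTIVE] = 4,
};
static const uint8_t has_oparg[256] = {
    [TOY_CALL] = 1, [TOY_CALL_ADAPTIVE] = 1,
};

/* Compress 16-bit code units into the compact byte form: canonical
 * opcode, optional oparg byte, nothing for the cache.
 * Returns the number of bytes written. */
static size_t
compress_code(const uint16_t *code, size_t n_units, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < n_units; ) {
        uint8_t opcode = (uint8_t)(code[i] & 0xFF);
        uint8_t oparg  = (uint8_t)(code[i] >> 8);
        out[n++] = base_form[opcode];
        if (has_oparg[opcode]) {
            out[n++] = oparg;
        }
        i += 1 + cache_units[opcode];  /* skip the inline cache */
    }
    return n;
}
```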

@gvanrossum
Collaborator

gvanrossum commented Sep 20, 2022

But we need to change marshaling first, so that it understands that code is made up of 16-bit code units, not just bytes.

To marshal a code object, we get the bytecode as a bytes object by calling _PyCode_GetCode(co). Any compression that requires knowledge of the code format can be placed in that function, as long as we also update the corresponding unmarshalling code, which calls _PyCode_Validate() and _PyCode_New().

(Likely we would design a new, slightly different API, but my point is that we can implement and test the bytecode compression first, and then using it from marshal.c would be straightforward.)
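
A very rough shape of how that could be hooked in on the writing side; _PyCode_GetCode() is the internal entry point mentioned above, while the compression helper and this wrapper are purely hypothetical:

```c
/* Sketch only: assumes a core build, as in marshal.c. */
#include "Python.h"
#include "pycore_code.h"   /* _PyCode_GetCode() */

/* Hypothetical helper implementing the compact encoding sketched
 * earlier (one or two bytes per instruction, no cache). */
extern PyObject *compact_from_code_units(PyObject *bytecode);

/* What marshal's writer might call when emitting the bytecode field:
 * fetch the code as a bytes object, then compress it. */
static PyObject *
write_form_of_bytecode(PyCodeObject *co)
{
    PyObject *bytecode = _PyCode_GetCode(co);   /* new reference */
    if (bytecode == NULL) {
        return NULL;
    }
    PyObject *compact = compact_from_code_units(bytecode);
    Py_DECREF(bytecode);
    return compact;
}
```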

@markshannon
Member Author

Obsolete. See #566

github-project-automation bot moved this from In Progress to Done in Fancy CPython Board on Aug 3, 2023