
Specialized eval loops for categories of functions #17


Closed
ericsnowcurrently opened this issue Mar 16, 2021 · 8 comments

@ericsnowcurrently
Collaborator

ericsnowcurrently commented Mar 16, 2021

Here's an idea I had during discussions with Guido about his "add-opcodes" (super-instructions) branch (#16). The key observation is that many functions (i.e. code objects) could be grouped by different sets of common characteristics. Then, for each of those groups we could derive an eval loop implementation that is optimized for that group of code objects.

(FYI, this relates to @markshannon's idea about generated code for the eval loop.)

The approach would look something like the following.

During core development:

  1. analyze a large corpus of Python code (representative of target workloads)
  2. identify classes of functions with common characteristics ("numeric ops", "string ops", "object manipulation", "only set X of opcodes", etc.)
  3. for each category, generate an eval loop implementation with targeted optimizations (the normal eval loop remains the default); steps 1-2 are sketched below
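
For illustration, here is a minimal Python sketch of the classification in steps 1-2: bucket code objects by the set of opcodes they use. The category name and opcode set are invented for the example, and opcode names vary across CPython versions (e.g., most arithmetic opcodes were folded into BINARY_OP in 3.11).

```python
import dis

# Invented "numeric" category: opcode names associated with arithmetic.
# On 3.11+ most arithmetic is the single BINARY_OP opcode; on older
# versions it is BINARY_ADD, BINARY_MULTIPLY, INPLACE_ADD, etc.
NUMERIC_OPS = {"BINARY_OP", "BINARY_ADD", "BINARY_SUBTRACT",
               "BINARY_MULTIPLY", "BINARY_TRUE_DIVIDE", "INPLACE_ADD"}

def opcode_set(code):
    """Return the set of opcode names used by a code object."""
    return {instr.opname for instr in dis.get_instructions(code)}

def classify(code):
    """Assign a code object to a (made-up) eval-loop category."""
    if opcode_set(code) & NUMERIC_OPS:
        return "numeric"
    return "default"

def dot(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(classify(dot.__code__))  # -> "numeric"
```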

At runtime:

  1. the compiler identifies the execution class for the current code object (and flags it)
  2. in _PyEval_EvalFrameDefault() (or maybe in _PyEval_EvalFrame()) we pick the eval loop that corresponds to the code object's flag (see the sketch after this list)
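
To make the runtime half concrete, here is a toy Python model of the dispatch. In CPython the selection would happen in C inside _PyEval_EvalFrame(); the flag value, the side table, and the specialized loop below are all hypothetical stand-ins.

```python
CO_EVAL_NUMERIC = 1  # hypothetical category flag set by the compiler

def eval_default(code):
    """Stand-in for _PyEval_EvalFrameDefault(): handles every opcode."""
    return f"default loop ran {code.co_name}"

def eval_numeric(code):
    """Imaginary loop with fast paths for int/float arithmetic."""
    return f"numeric loop ran {code.co_name}"

EVAL_LOOPS = {0: eval_default, CO_EVAL_NUMERIC: eval_numeric}

# Side table standing in for a flag stored on the code object itself.
CATEGORY = {}

def eval_frame(code):
    # One dispatch per call; unknown flags fall back to the default loop.
    return EVAL_LOOPS.get(CATEGORY.get(code, 0), eval_default)(code)

def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

CATEGORY[dot.__code__] = CO_EVAL_NUMERIC  # set by the "compiler" in step 1
print(eval_frame(dot.__code__))  # -> "numeric loop ran dot"
```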

There are other factors to consider (e.g., the cache-level impact of switching between multiple eval loop implementations), but let's start with the high-level idea.

@gvanrossum gvanrossum changed the title Specialized eval loops for categories of functions. Specialized eval loops for categories of functions Mar 16, 2021
@markshannon
Member

This seems rather vague.

What exactly are the "targeted optimizations"?
How would the different dispatch loops differ?
Why would having several dispatch loops be faster than having one?

I really don't think we should be doing "blue sky" thinking. There is plenty of existing research to build on.

@gvanrossum
Collaborator

I don't know, and I'm not pushing to pursue this, but one thought would be that some VMs I've read about have a "profiling phase" during which they count things that might be relevant to the optimization. Instead of an "if profiling:" flag check we could have a separate profiling eval loop.
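
As a toy illustration of that approach (a hand-rolled mini loop with invented names, not CPython code): keep two copies of the loop and swap which one is installed, so the steady-state loop carries no per-instruction profiling branch.

```python
from collections import Counter

OPCODE_COUNTS = Counter()

def handle(op, arg):
    pass  # stand-in for the real opcode handlers

def run_plain(instructions):
    # Steady-state loop: no profiling check anywhere in the hot path.
    for op, arg in instructions:
        handle(op, arg)

def run_profiling(instructions):
    # The same loop plus bookkeeping, installed only while profiling.
    for op, arg in instructions:
        OPCODE_COUNTS[op] += 1
        handle(op, arg)

run = run_profiling  # profiling phase
run([("LOAD_CONST", 0), ("RETURN_VALUE", None)])
run = run_plain      # steady state: swap the unprofiled loop back in
print(OPCODE_COUNTS)  # Counter({'LOAD_CONST': 1, 'RETURN_VALUE': 1})
```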

@markshannon
Member

Type profiling is usually continuous at lower tiers.
For example, the specializing interpreter #28 implicitly gathers type feedback, which can be used by higher tiers.
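
For a sense of what "implicitly gathers type feedback" can look like, here is a simplified Python stand-in for a per-instruction inline cache: executing the site records the operand types it saw, and a higher tier can read that record later. The structure is invented for the example, not the actual design from #28.

```python
from collections import Counter

class BinaryOpSite:
    """Simplified stand-in for one instruction's inline cache."""

    def __init__(self):
        self.seen = Counter()  # operand-type pairs observed at this site

    def execute(self, a, b):
        # Recording the types is a side effect of normal execution;
        # no separate profiling pass or flag check is needed.
        self.seen[(type(a), type(b))] += 1
        return a + b

site = BinaryOpSite()
for args in [(1, 2), (3, 4), (1.0, 2.0)]:
    site.execute(*args)

# A higher tier can consult the feedback when deciding how to specialize.
print(site.seen.most_common(1))  # the (int, int) pair, seen twice
```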

@ericsnowcurrently
Collaborator Author

FWIW, the specific idea here would be more appropriate to revisit later, if at all, when/if it becomes more practical to develop multiple eval loop implementations (e.g. using generated code). Extra discussion on this isn't worth the time right now.

> I really don't think we should be doing "blue sky" thinking. There is plenty of existing research to build on.

For the most part I agree. My intent here was to capture some thoughts that came to mind as Guido and I discussed possible improvements to explore. Part of the challenge, at least for me, is an effective lack of familiarity with the "existing research" to use as a guide. I'm definitely in favor of both relying on the efforts of those many smart people and becoming more familiar with that research. At the same time, I still plan on sharing ideas I have. That doesn't happen so frequently that it's a distraction, and at the least the ensuing discussion helps me learn more about this space.

@markshannon
Member

Multiple interpreter loops are going to be very unfriendly to the icache, and almost certainly slower.
There is no mechanism for choosing among the interpreters, nor any specification of how they would differ.
Can we close this?

@gvanrossum
Collaborator

gvanrossum commented Jul 16, 2021 via email

@markshannon
Member

I'm not certain about anything regarding what CPUs do with their caches.

@markshannon
Member

I'm closing this as there is nothing actually to be done here.
