Specialized eval loops for categories of functions #17
Here's an idea I had during discussions with Guido about his "add-opcodes" (super-instructions) branch (#16). The key observation is that many functions (i.e. code objects) could be grouped by different sets of common characteristics. Then, for each of those groups, we could derive an eval loop implementation that is optimized for that group of code objects.
(FYI, this relates to @markshannon's idea about generated code for the eval loop.)
The approach would look something like the following.
During core development:
- identify categories of code objects that share common characteristics
- derive an eval loop implementation optimized for each category
At runtime:
- in _PyEval_EvalFrameDefault() (or maybe in _PyEval_EvalFrame()) we pick the eval loop that corresponds to the code object's flag
There are other factors to consider (e.g. the cache-level impact of switching between multiple eval loop implementations), but let's start with the high-level idea. A rough sketch of the runtime dispatch follows.
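To make the dispatch concrete, here is a minimal sketch in C. Everything in it is a hypothetical stand-in (CodeObject, co_category, eval_frame, the particular categories), not real CPython API; each stub loop represents a generated variant of _PyEval_EvalFrameDefault() specialized for its category.

```c
/* Minimal sketch of per-category eval loop dispatch.  All names are
 * hypothetical stand-ins, not real CPython API. */
#include <stdio.h>

typedef struct {
    unsigned int co_category;   /* set when the code object is created */
    /* ... bytecode, constants, etc. ... */
} CodeObject;

typedef struct {
    CodeObject *f_code;
    /* ... locals, value stack, etc. ... */
} Frame;

typedef int (*EvalLoop)(Frame *);

enum {
    CAT_DEFAULT = 0,    /* generic loop: handles every opcode */
    CAT_NO_EXCEPT,      /* e.g. no exception-handling opcodes compiled in */
    NUM_CATEGORIES
};

/* Stub loops; in the real design each would be a generated variant of
 * _PyEval_EvalFrameDefault() stripped down to its category's needs. */
static int eval_default(Frame *f)   { (void)f; puts("default loop");   return 0; }
static int eval_no_except(Frame *f) { (void)f; puts("no-except loop"); return 0; }

static const EvalLoop eval_loops[NUM_CATEGORIES] = {
    [CAT_DEFAULT]   = eval_default,
    [CAT_NO_EXCEPT] = eval_no_except,
};

/* Analogue of the _PyEval_EvalFrame() step described above: one flag
 * read, one table lookup, one indirect call. */
static int eval_frame(Frame *frame)
{
    unsigned int cat = frame->f_code->co_category;
    if (cat >= NUM_CATEGORIES)   /* fall back if the flag is unknown */
        cat = CAT_DEFAULT;
    return eval_loops[cat](frame);
}

int main(void)
{
    CodeObject code = { CAT_NO_EXCEPT };
    Frame frame = { &code };
    return eval_frame(&frame);
}
```

The per-call cost of this dispatch is one field read and an indirect call; whether the duplicated loop bodies pay for themselves given icache pressure is exactly the concern raised in the comments below.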
Comments
This seems rather vague. What exactly are the "targeted optimizations"? I really don't think we should be doing "blue sky" thinking. There is plenty of existing research to build on.
I don't know, and I'm not pushing to pursue this, but one thought would be that some VMs I've read about have a "profiling phase" during which they count things that might be relevant to the optimization. Instead of an "if profiling:" flag check we could have a separate profiling eval loop.
Type profiling is usually continuous at lower tiers.
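As an illustration of the profiling-loop idea above, here is a minimal sketch using a toy instruction set and entirely hypothetical names (this is not CPython's eval loop). The same loop body is stamped out twice via a macro, once with per-opcode counters and once without, so the non-profiling variant never tests a flag at runtime.

```c
/* Toy sketch: two eval loops generated from one body.  Every name here is
 * hypothetical; the point is only the "separate profiling loop" mechanism. */
#include <stdio.h>

enum { OP_LOAD, OP_ADD, OP_RETURN, NUM_OPCODES };

static long opcode_counts[NUM_OPCODES];  /* filled only by the profiling loop */

#define DEFINE_EVAL_LOOP(name, PROFILE)                          \
    static int name(const unsigned char *code)                   \
    {                                                            \
        int acc = 0;                                             \
        for (const unsigned char *pc = code;; pc++) {            \
            if (PROFILE) /* folds away when PROFILE is 0 */      \
                opcode_counts[*pc]++;                            \
            switch (*pc) {                                       \
            case OP_LOAD:   acc = *++pc;  break;                 \
            case OP_ADD:    acc += *++pc; break;                 \
            case OP_RETURN: return acc;                          \
            default:        return -1;                           \
            }                                                    \
        }                                                        \
    }

DEFINE_EVAL_LOOP(eval_loop_default, 0)    /* hot path: no flag check at all */
DEFINE_EVAL_LOOP(eval_loop_profiling, 1)  /* counts every opcode executed */

int main(void)
{
    const unsigned char code[] = { OP_LOAD, 2, OP_ADD, 3, OP_RETURN };
    printf("default:   %d\n", eval_loop_default(code));
    printf("profiling: %d\n", eval_loop_profiling(code));
    printf("OP_ADD executed %ld time(s)\n", opcode_counts[OP_ADD]);
    return 0;
}
```

In a real tiered setup the counters would be type or call-site profiles rather than a toy opcode histogram, but the mechanism of selecting the loop once, up front, is the same.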
FWIW, the specific idea here would be more appropriate to revisit later, if at all, when/if it becomes more practical to develop multiple eval loop implementations (e.g. using generated code). Extra discussion on this isn't worth the time right now.
For the most part I agree. My intent here was to capture some thoughts that came to mind as Guido and I discussed possible improvements to explore. Part of the challenge, at least for me, is an effective lack of familiarity with "existing research" to use as a guide. I'm definitely in favor of both relying on the efforts of those many smart people and becoming more familiar with that research. At the same time, I still plan on sharing the ideas I have. That isn't so frequent that it's a distraction, and at the least the ensuing discussion helps me learn more about this space.
Multiple interpreter loops are going to be very unfriendly to the icache, and almost certainly slower.
How certain are you that a typical function execution doesn't completely void the icache whenever an excursion in the runtime (maybe as simple as PyObject_GetAttr) is made? IOW, do we care about the icache at the scale of function executions?
I'm not certain about anything regarding what CPUs do with their caches.
I'm closing this as there is nothing actually to be done here. |