-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-121954: Don't mutate tuples in _PyCode_New #122180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Py_ssize_t size = PyTuple_GET_SIZE(orig_tuple); | ||
new_tuple = PyTuple_New(size); | ||
if (!new_tuple) { | ||
Py_DECREF(new_item); | ||
return -1; | ||
} | ||
for (Py_ssize_t i = size; --i >= 0; ) { | ||
_PyTuple_ITEMS(new_tuple)[i] = Py_NewRef( | ||
_PyTuple_ITEMS(orig_tuple)[i]); | ||
} | ||
*new_tuple_p = new_tuple; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying the tuple should be a rare event, mostly for creating code from user data.
Should it use some kind of faster-cpython stat counter?
(Symtable change looks fine to me, would rather have someone from faster-cpython review the rest, once the PR is ready.) |
Thanks! (I'm aware of some of the problems in the PR; since it's not a 3.13 regression I'll get back to this after 3.13rc1 is out.) |
I'm blocked on not understanding some of the internals for free-threaded builds. The free-threaded build tries to “intern” all code constants, but it doesn't -- see for example this, adapted from
Which constants are meant to be interned here? Worse, when a container (i.e. tuple) is interned, it's later cleared in the @colesbury, do you know? |
All types of code constants (frozenset, slices, strings, numbers) are interned, but there are some code paths that skip interning and immortalization:
I think
We are deallocating the immortalized constants near the end of the interpreter shutdown process. Your comment is about The constants form an acyclic graph so we have to be careful about the order in which we free them. The emptying of containers breaks up this graph, which allows us to call
|
I'll take a closer look at the free-threading test failures |
@encukou, sorry I think the reported leaks were all pre-existing. They seem to be fixed by: When I merge that PR into yours, all the tests pass. Would you please review that PR when you get a chance? |
Thank you! That makes things much clearer.
Those are constants used by the compiler and marshal. But, users can put anything in For example: def f():
print(())
code = f.__code__
code = code.replace(co_consts=(None, []))
exec(code) Would it make sense to limit this interning to the marshallable immutable types only?
The frozenset code seems to rely on the order of elements, so it might intern two distinct but equal sets. Would that be a problem? (If so I'll try to find an example.)
The danger I see is that the values interned here are assumed to be “owned” by this machinery, but it seems they could be shared with other code, and perhaps needed in some later stage of interpreter shutdown. |
I'm concerned that allowing non-immortal or mutable types in constants will cause problems downstream in the interpreter. My preference would be to more thoroughly validate the replacement
This is generally our requirement for APIs, but I still thinks it's a good goal -- the more footguns we remove, the better -- but I think we'd be better off if we get there by more thorough validation, rather than expanding scope/support.
I think the behavior matches logic in the bytecode compiler, which also de-dupes constants. We only de-dupe two sets if they contain the same elements and their iteration orders match. The iteration order requirement is not strictly necessary, but it simplifies the comparison logic.
This happens very late in interpreter shutdown for that reason. The thread state and interpreter state are already cleared and static types have been finalized. |
Sounds reasonable. I filed #122850, to not block this PR.
Sure, but people could reasonably expect a “copy-and-patch” operation on
I see. I'll add a comment to make it less likely for someone to add/rearrange the finalization phases. (Leaving now; I'll update the PR next week.) |
Today, I wasn't able to figure out the refleak free-threaded buildbots found in |
@encukou let me know if you'd like me to take a look at the remaining refleak |
Don't spend hours on it, but if there's a quick trick or tool you know of, or if can point to where you'd look next, I'd appreciate it. |
This hides the refleak: diff --git a/Lib/test/test__interpreters.py b/Lib/test/test__interpreters.py
index f493a92e0dd..9ffec59c8bf 100644
--- a/Lib/test/test__interpreters.py
+++ b/Lib/test/test__interpreters.py
@@ -26,7 +26,7 @@ def _captured_script(script):
indented = script.replace('\n', '\n ')
wrapped = dedent(f"""
import contextlib
- with open({w}, 'w', encoding="utf-8") as spipe:
+ with open({w}, 'w', -1, "utf-8") as spipe:
with contextlib.redirect_stdout(spipe):
{indented}
""") |
Change
_PyCode_New
(intern_strings
&intern_constants
) to return new tuplesrather than mutating their arguments.
If possible, return the argument rather than create a copy.
In order to make copying less common, intern strings destined for
_PyCode_New
earlier -- in:_PyCode_ConstantKey
, used in the compiler to de-duplicate constants;symtable_implicit_arg
, which generates the names.0
,.1
etc..0
, is baked in as a_Py_STR
This means all string constants produced by the compiler are interned;
previously it was only names and identifier-like strings.
(We already hash all code constants in order to de-duplicate them,
and we treat deleting code objects as rare event. Also, constants were
interned in the free-threaded build.)
Remove a test that verified a string constant is not interned.
Merge
validate_and_copy_tuple
, the routine for cleaning up user inputfor
code.__new__
, into the internalintern_names
.Note that this changes the exception type for non-string names
in
PyCode_NewWithPosOnlyArgs
and such fromSystemError
toTypeError
;IMO that doesn't need documenting.
Rename
intern_strings
tointern_names
to further distinguish it fromintern_constants
. Unlike constants, names are immortalized.