gh-128118: Improve performance of copy.copy by using a fast lookup for atomic and container types #128119

Merged (4 commits) on Dec 30, 2024

@eendebakpt (Contributor) commented Dec 20, 2024

Similar to the approach used for copy.deepcopy in #114266, we can simplify the implementation of copy.copy and improve performance by checking the type of the argument using a lookup.
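
A minimal sketch of the lookup idea described above, not the code merged in this PR. The names `ATOMIC_TYPES`, `CONTAINER_TYPES`, and `shallow_copy_sketch` are hypothetical, the type tables are deliberately shorter than the real ones, and anything not covered by the tables is delegated to the standard `copy.copy`:

```python
import copy

# Hypothetical, shortened tables for illustration only.
# Atomic (immutable) types can be returned as-is by a shallow copy.
ATOMIC_TYPES = {type(None), int, float, bool, complex, str, tuple}
# Built-in containers that expose their own .copy() method.
CONTAINER_TYPES = {list, dict, set, bytearray}


def shallow_copy_sketch(x):
    """Shallow-copy x, using set lookups on the exact type as a fast path."""
    cls = type(x)
    if cls in ATOMIC_TYPES:
        # Immutable object: sharing it is safe, so no copy is made.
        return x
    if cls in CONTAINER_TYPES:
        # Exact built-in container: call the type's own copy method.
        return cls.copy(x)
    # Everything else (subclasses, dataclasses, ...) takes the general path.
    return copy.copy(x)


original = [1, "s"]
duplicate = shallow_copy_sketch(original)
print(duplicate == original, duplicate is original)
```

Because the lookup uses `type(x)` rather than `isinstance`, subclasses of the listed types miss both tables and fall through to the general path.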

Results:

copy int: Mean +- std dev: [main] 159 ns +- 10 ns -> [pr_v2] 104 ns +- 7 ns: 1.54x faster
copy slice: Mean +- std dev: [main] 184 ns +- 44 ns -> [pr_v2] 109 ns +- 13 ns: 1.69x faster
copy dict: Mean +- std dev: [main] 196 ns +- 18 ns -> [pr_v2] 182 ns +- 16 ns: 1.07x faster
copy dataclass: Mean +- std dev: [main] 1.88 us +- 0.13 us -> [pr_v2] 1.82 us +- 0.11 us: 1.04x faster
copy small list: Mean +- std dev: [main] 179 ns +- 18 ns -> [pr_v2] 134 ns +- 7 ns: 1.33x faster
copy small tuple: Mean +- std dev: [main] 155 ns +- 12 ns -> [pr_v2] 80.3 ns +- 6.1 ns: 1.93x faster
copy list dataclasses: Mean +- std dev: [main] 151 ns +- 11 ns -> [pr_v2] 160 ns +- 25 ns: 1.05x slower

Geometric mean: 1.32x faster
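
As a sanity check, the geometric mean can be approximately reproduced from the rounded per-benchmark ratios listed above. This sketch assumes that a 1.05x slowdown counts as a speed-up factor of 1/1.05; pyperf itself works from the unrounded timings, so the trailing digits may differ:

```python
from statistics import geometric_mean

# Speed-up factors copied from the results above (values > 1 mean faster).
speedups = {
    "copy int": 1.54,
    "copy slice": 1.69,
    "copy dict": 1.07,
    "copy dataclass": 1.04,
    "copy small list": 1.33,
    "copy small tuple": 1.93,
    "copy list dataclasses": 1 / 1.05,  # reported as 1.05x slower
}

gm = geometric_mean(speedups.values())
print(f"Geometric mean: {gm:.2f}x faster")
```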

Benchmark script:

import pyperf

runner = pyperf.Runner()

setup = """
import copy

a={'list': [1,2,3,43], 't': (1,2,3), 'str': 'hello', 'subdict': {'a': True}}

from dataclasses import dataclass

lst = [1, 's']
tpl  =('a', 'b', 3)

i = 123123123
sl = slice(1,2,3)

@dataclass
class A:
    a : int
    
dc = A(123)
list_dc = [A(1), A(2), A(3), A(4)]
"""

runner.timeit(name="copy int", stmt="b=copy.copy(i)", setup=setup)
runner.timeit(name="copy slice", stmt="b=copy.copy(sl)", setup=setup)
runner.timeit(name="copy dict", stmt="b=copy.copy(a)", setup=setup)
runner.timeit(name="copy dataclass", stmt="b=copy.copy(dc)", setup=setup)
runner.timeit(name="copy small list", stmt="b=copy.copy(lst)", setup=setup)
runner.timeit(name="copy small tuple", stmt="b=copy.copy(tpl)", setup=setup)
runner.timeit(name="copy list dataclasses", stmt="b=copy.copy(list_dc)", setup=setup)

@eendebakpt changed the title from "Improve performance of copy.copy by using a fast lookup for atomic and container types" to "gh-128118: Improve performance of copy.copy by using a fast lookup for atomic and container types" on Dec 20, 2024
def _copy_immutable(x):
    return x
for t in (types.NoneType, int, float, bool, complex, str, tuple,
_copy_atomic_types = {types.NoneType, int, float, bool, complex, str, tuple,
Member

Out of curiosity, would performance be better if we use a frozenset instead of a set? (and is it possible?)

Contributor Author

Good question, I'll benchmark a bit later. A frozenset should not require any locking, so perhaps there is a difference.

Contributor Author

At this moment the set and frozenset have the same implementation for __contains__:

cpython/Objects/setobject.c

Lines 2529 to 2531 in 3bd7730

static PyMethodDef frozenset_methods[] = {
    SET___CONTAINS___METHODDEF
    FROZENSET_COPY_METHODDEF

cpython/Objects/setobject.c

Lines 2416 to 2420 in 3bd7730

static PyMethodDef set_methods[] = {
    SET_ADD_METHODDEF
    SET_CLEAR_METHODDEF
    SET___CONTAINS___METHODDEF
    SET_COPY_METHODDEF

so there is no performance difference. In the future however, for the free-threading build one could remove the critical section for the frozenset implementation here:

cpython/Objects/setobject.c

Lines 2198 to 2207 in 3bd7730

static int
set_contains(PyObject *self, PyObject *key)
{
    PySetObject *so = _PySet_CAST(self);
    return _PySet_Contains(so, key);
}
/*[clinic input]
@critical_section
@coexist

Using a frozenset is possible, but this would add a bit of time to the import. On my system, %timeit frozenset(_copy_atomic_types) takes about 300 ns.
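
A rough way to check both points on a given machine, using the standard `timeit` module instead of IPython's `%timeit`. The names `atomic_set` and `atomic_frozen` are hypothetical stand-ins with a shortened type table, and absolute numbers will vary by system and build:

```python
import timeit

# Hypothetical stand-ins for the lookup table, shortened for illustration.
atomic_set = {type(None), int, float, bool, complex, str, tuple}
atomic_frozen = frozenset(atomic_set)

n = 100_000

# Membership test against each container type.
t_set = timeit.timeit("int in s", globals={"s": atomic_set}, number=n)
t_frozen = timeit.timeit("int in s", globals={"s": atomic_frozen}, number=n)

# One-time cost of building the frozenset, which would be paid at import.
t_build = timeit.timeit("frozenset(s)", globals={"s": atomic_set}, number=n)

print(f"set lookup:       {t_set / n * 1e9:.1f} ns")
print(f"frozenset lookup: {t_frozen / n * 1e9:.1f} ns")
print(f"frozenset build:  {t_build / n * 1e9:.1f} ns")
```

For publishable numbers, pyperf (as used in the benchmark script above) is the better tool, since `timeit` does not account for system noise.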

Even faster than a set would be a data structure that looks only at the id of the objects involved (the set will use rich compare if no match is found, but that is not needed as all objects involved are singletons), but that is not available in cpython I believe.
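
To illustrate the difference in semantics, here is a small sketch. The `Probe` class, `ATOMIC_IDS`, and `is_atomic_by_id` are hypothetical names invented for this example. `Probe` forces hash collisions so that the fallback comparison in a set lookup becomes visible, and the id-based table shows what an identity-only lookup would mean. In pure Python the extra `id()` call makes it slower than a plain set, so this demonstrates behaviour only, not a speed-up:

```python
class Probe:
    """Hypothetical helper: all instances share one hash and count __eq__ calls."""

    eq_calls = 0

    def __hash__(self):
        return 1  # force every instance into the same hash slot

    def __eq__(self, other):
        Probe.eq_calls += 1
        return self is other


a, b = Probe(), Probe()
s = {a}

found_a = a in s                # identity match, no __eq__ call needed
calls_after_hit = Probe.eq_calls

found_b = b in s                # same hash, different object: __eq__ is called
calls_after_miss = Probe.eq_calls

print(found_a, calls_after_hit, found_b, calls_after_miss)

# Identity-only lookup: store ids of the types instead of the types themselves.
# This is only safe while the types stay alive, which holds for built-in types.
ATOMIC_TYPES = (type(None), int, float, bool, complex, str, tuple)
ATOMIC_IDS = frozenset(id(t) for t in ATOMIC_TYPES)


def is_atomic_by_id(obj):
    """Return True if the exact type of obj is in the identity table."""
    return id(type(obj)) in ATOMIC_IDS
```

For type objects with distinct hashes the fallback comparison is rarely reached, so the practical gain from an identity-only structure would have to be measured.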

Member

but that is not available in cpython I believe.

That's right, it's not available.


Up to you if you want to make the free-threaded build faster in the future, but we should probably check the performance on that build. For now, let's keep the set (hopefully you'll remember this).

@erlend-aasland erlend-aasland merged commit 34b85ef into python:main Dec 30, 2024
38 checks passed
@erlend-aasland
Contributor

Thanks for the speed-up, Pieter! Thanks for the reviews, Bénédikt and Sergey!

4 participants