Description
Feature or enhancement
The dataclasses library provides an easy way to create classes and automatically generates the relevant methods for the user. Creating a dataclass with the argument frozen=True automatically generates the methods __setattr__ and __delattr__ in _frozen_get_del_attr.
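As a quick illustration of the behavior described above, the following sketch shows the generated __setattr__ and __delattr__ rejecting changes to a frozen instance. The class name Point is hypothetical and used only for this example.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:  # hypothetical example class, not part of the issue
    x: int
    y: int


p = Point(1, 2)

# Assigning to a field goes through the generated __setattr__.
try:
    p.x = 10
except FrozenInstanceError as exc:
    set_error = str(exc)

# Deleting a field goes through the generated __delattr__.
try:
    del p.y
except FrozenInstanceError as exc:
    del_error = str(exc)

print(set_error)
print(del_error)
```

Both operations fail and leave the instance unchanged, which is the check this issue is about.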
This issue proposes changing the tuple-based lookup of field names to a set-based lookup, reducing the time complexity of the check from O(n) to O(1) on average, where n is the number of fields:
In [1]: # tuple-based
In [2]: %timeit 'a' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
9.91 ns ± 0.0982 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [3]: %timeit 'd' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
33.2 ns ± 0.701 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [4]: %timeit 'g' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
56.4 ns ± 0.818 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [5]: # set-based
In [6]: %timeit 'a' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.3 ns ± 0.0723 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [7]: %timeit 'd' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11 ns ± 0.106 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
In [8]: %timeit 'g' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.1 ns ± 0.126 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
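For readers without IPython, a rough equivalent of the measurements above can be written with the standard timeit module. This is only a sketch: the helper name time_lookup is made up for this example, and because the containers are looked up through globals rather than compiled in as constants, the absolute numbers will differ from the %timeit output above.

```python
import timeit

names_tuple = ('a', 'b', 'c', 'd', 'e', 'f', 'g')
names_set = frozenset(names_tuple)


def time_lookup(container, key, number=200_000):
    """Return seconds per `key in container` check (best of 5 repeats)."""
    timer = timeit.Timer(
        "key in container",
        globals={"key": key, "container": container},
    )
    return min(timer.repeat(repeat=5, number=number)) / number


# Time the first, middle, and last element for both container kinds.
results = {
    (kind, key): time_lookup(container, key)
    for kind, container in (("tuple", names_tuple), ("set", names_set))
    for key in ("a", "d", "g")
}

for (kind, key), seconds in results.items():
    print(f"{kind:5s} {key!r}: {seconds * 1e9:.1f} ns")
```

The tuple timings are expected to grow with the position of the key, while the set timings should stay roughly flat.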
A tiny benchmark script:
from contextlib import suppress
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Foo2:
    a: int
    b: int


foo2 = Foo2(1, 2)


def bench2(inst):
    # Every assignment hits the generated __setattr__ and is rejected.
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0


@dataclass(frozen=True)
class Foo7:
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int


foo7 = Foo7(1, 2, 3, 4, 5, 6, 7)


def bench7(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0
    with suppress(FrozenInstanceError):
        inst.c = 0
    with suppress(FrozenInstanceError):
        inst.d = 0
    with suppress(FrozenInstanceError):
        inst.e = 0
    with suppress(FrozenInstanceError):
        inst.f = 0
    with suppress(FrozenInstanceError):
        inst.g = 0


class Bar(Foo7):
    # Plain (non-dataclass) subclass: extra attributes are allowed,
    # but each assignment still runs the field-name check first.
    def __init__(self, a, b, c, d, e, f, g):
        super().__init__(a, b, c, d, e, f, g)
        self.baz = 0


bar = Bar(1, 2, 3, 4, 5, 6, 7)  # instance used by bench(bar) below


def bench(inst):
    inst.baz = 1
Result:
set-based lookup:
In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit bench(bar)
249 ns ± 6.31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
tuple-based lookup (original):
In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [4]: %timeit bench(bar)
269 ns ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
The set-based lookup is consistently faster than the old approach, and its theoretical time complexity is also lower (O(1) on average versus O(n)).
Ref: #102573
Pitch
The autogenerated __setattr__ and __delattr__ methods perform a sanity check at the beginning of the method. For example:
def __setattr__(self, name, value):
    if type(self) is {{UserType}} or name in ({{a tuple of field names}}):
        raise FrozenInstanceError(f"cannot assign to field {name!r}")
    super(cls, self).__setattr__(name, value)
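To make the proposal concrete, here is a hand-written sketch of the same check using a frozenset of field names instead of a tuple. It is not the code that dataclasses generates: the names FrozenPoint, Child, and _FIELD_NAMES are hypothetical and only mimic the shape of the template above.

```python
from dataclasses import FrozenInstanceError


class FrozenPoint:
    """Hand-written stand-in for a frozen dataclass (illustration only)."""

    # Assumption for this sketch: field names stored as a frozenset,
    # so the membership test below is O(1) on average.
    _FIELD_NAMES = frozenset({"x", "y"})

    def __init__(self, x, y):
        # Bypass our own __setattr__ while initializing the fields.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        if type(self) is FrozenPoint or name in self._FIELD_NAMES:
            raise FrozenInstanceError(f"cannot assign to field {name!r}")
        super().__setattr__(name, value)


class Child(FrozenPoint):
    pass


child = Child(1, 2)
child.extra = 3  # not a field and not the base type, so this is allowed

try:
    child.x = 0  # a field name, so this is still rejected
except FrozenInstanceError:
    blocked = True
else:
    blocked = False
```

The observable behavior matches the tuple-based check; only the cost of the membership test changes.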
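If someone subclasses the frozen dataclass, every attribute assignment on the subclass first runs the sanity check, which calls tuple.__contains__(...) and takes time proportional to the number of fields, and only then calls super().__setattr__(...). For example: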
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenBase:
    x: int
    y: int
    ...  # N_FIELDS


class Foo(FrozenBase):
    def __init__(self, x, y, somevalue, someothervalue):
        super().__init__(x, y)
        self.somevalue = somevalue  # takes O(N_FIELDS) time
        self.someothervalue = someothervalue  # takes O(N_FIELDS) time again


foo = Foo(1, 2, 3, 4)
foo.extravalue = 5  # takes O(N_FIELDS) time again
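One way to observe how this cost scales with the number of fields is to build frozen base classes of different sizes with make_dataclass and time an extra-attribute assignment on a subclass instance. This is a rough sketch, not part of the original benchmark: the helper name subclass_setattr_time and the chosen field counts are assumptions, and on an interpreter that already uses a set-based check the timings would be expected to stay roughly flat.

```python
import timeit
from dataclasses import make_dataclass


def subclass_setattr_time(n_fields, number=50_000):
    """Seconds per extra-attribute assignment on a subclass instance."""
    field_names = [f"f{i}" for i in range(n_fields)]
    # Frozen dataclass with n_fields fields (types default to typing.Any).
    base = make_dataclass("FrozenBase", field_names, frozen=True)

    class Child(base):
        pass

    inst = Child(*range(n_fields))
    # 'extra' is not a field, so the check scans all field names
    # before falling through to the normal assignment.
    timer = timeit.Timer("inst.extra = 1", globals={"inst": inst})
    return min(timer.repeat(repeat=5, number=number)) / number


timings = {n: subclass_setattr_time(n) for n in (2, 8, 32, 128)}

for n, seconds in timings.items():
    print(f"{n:4d} fields: {seconds * 1e9:.1f} ns per assignment")
```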
Previous discussion
N/A.