Description
Right now the implementation of Message.__contains__ looks like this:
Lines 450 to 451 in 2f2fa03
There are several problems here:
- We build an intermediate structure (a list in this case)
- We use a list for the `in` operation, which is slow
The fastest way to check whether we actually have this item is simply:
    def __contains__(self, name):
        name_lower = name.lower()
        for k, v in self._headers:
            if name_lower == k.lower():
                return True
        return False
We do not create any intermediate lists or sets, and we don't even iterate longer than needed.
This change makes the `in` check up to twice as fast.
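For anyone who wants to try the comparison outside CPython, here is a minimal standalone sketch. The `headers` list is a made-up stand-in for `Message._headers`, and `contains_with_list` is reconstructed from the description above (a list comprehension followed by `in`), not copied from the source file; both function names are hypothetical.

```python
import timeit

# Made-up stand-in for Message._headers: a list of (name, value) pairs.
headers = [
    ("Return-Path", "<sender@example.com>"),
    ("Delivered-To", "recipient@example.com"),
    ("From", "sender@example.com"),
    ("To", "recipient@example.com"),
    ("Subject", "This is a test message"),
]


def contains_with_list(headers, name):
    # Reconstructed "before" form (an assumption): lowercase every header
    # name into a new list, then scan that list.
    return name.lower() in [k.lower() for k, _ in headers]


def contains_with_loop(headers, name):
    # Proposed form: compare while iterating and stop at the first match.
    name_lower = name.lower()
    for k, _ in headers:
        if name_lower == k.lower():
            return True
    return False


if __name__ == "__main__":
    for probe in ("from", "missing"):
        for func in (contains_with_list, contains_with_loop):
            seconds = timeit.timeit(lambda: func(headers, probe), number=100_000)
            print(f"{func.__name__}({probe!r}): {seconds:.3f} s per 100,000 calls")
```

Both functions should agree on every input; only the amount of work differs. Absolute timings from this sketch will not match the pyperf numbers below, since it uses a toy header list rather than a parsed message.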
Microbenchmark
Before
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.40 us +- 0.14 us
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.42 us +- 0.06 us
After
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 904 ns +- 55 ns
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 715 ns +- 24 ns
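The "from" in m case is now about twice as fast (1.40 us down to 715 ns); the "missing" in m case is about 1.6 times as fast (1.42 us down to 904 ns).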
It probably also consumes less memory now, but I don't think it is very significant.
Importance
Since EmailMessage (a subclass of Message) is quite widely used by users and third-party libraries, I think it is important for this change to be included.
And since the patch is quite simple and pure Python, I think the risks are very low.