Skip to content

gh-117348: restore import time performance of configparser #117703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 14, 2024

Conversation

jaraco
Copy link
Member

@jaraco jaraco commented Apr 10, 2024

As requested in #117372 (comment)

  • Get Iterable from collections.abc.
  • Replace dataclass with SimpleNamespace.
  • Add blurb

@jaraco
Copy link
Member Author

jaraco commented Apr 10, 2024

Analysis:

baseline prior to #117372

 cpython gh-117348/import-perf @ git checkout -q 019143fecbfc26e69800d28d2a9e3392a051780b~1
 cpython 01bd74eadb @ ./python.exe -X importtime -c 'import configparser' 2> /dev/null ; ./python.exe -X importtime -c 'import configparser' 2>1 | grep -E '(cumul|configparser)'
import time: self [us] | cumulative | imported package
import time:       869 |       3914 | configparser

degradation from #117372

 cpython 01bd74eadb @ git checkout -q main
 cpython main @ ./python.exe -X importtime -c 'import configparser' 2> /dev/null ; ./python.exe -X importtime -c 'import configparser' 2>1 | grep -E '(cumul|configparser)'
import time: self [us] | cumulative | imported package
import time:      1553 |      11762 | configparser

restoration with this PR

 cpython main @ git checkout -q gh-117348/import-perf
 cpython gh-117348/import-perf @ ./python.exe -X importtime -c 'import configparser' 2> /dev/null ; ./python.exe -X importtime -c 'import configparser' 2>1 | grep -E '(cumul|configparser)'
import time: self [us] | cumulative | imported package
import time:       995 |       4489 | configparser

jaraco added 2 commits April 9, 2024 20:24
Reduces import time by over 50% (10431µs vs 4350µs on Apple M3 Pro).
@jaraco jaraco force-pushed the gh-117348/import-perf branch from 039ff2d to 11bae33 Compare April 10, 2024 00:24
@jaraco jaraco requested a review from AlexWaygood April 10, 2024 00:25
@AlexWaygood
Copy link
Member

Nice, thank you!

  • Replace dataclass with SimpleNamespace.

I don't oppose this necessarily, but I'm curious what inheriting from SimpleNamespace gets us here. It seems like all tests pass for me locally if I apply this diff to your PR branch:

--- a/Lib/configparser.py
+++ b/Lib/configparser.py
@@ -154,7 +154,6 @@
 import os
 import re
 import sys
-import types
 
 __all__ = ("NoSectionError", "DuplicateOptionError", "DuplicateSectionError",
            "NoOptionError", "InterpolationError", "InterpolationDepthError",
@@ -539,7 +538,7 @@ def _interpolate_some(self, parser, option, accum, rest, section, map,
                     "found: %r" % (rest,))
 
 
-class _ReadState(types.SimpleNamespace):
+class _ReadState:
     elements_added : set[str]
     cursect : dict[str, str] | None = None
     sectname : str | None = None
@@ -553,9 +552,10 @@ def __init__(self):
         self.errors = list()
 
 
-class _Prefixes(types.SimpleNamespace):
-    full : Iterable[str]
-    inline : Iterable[str]
+class _Prefixes:
+    def __init__(self, full: Iterable[str], inline: Iterable[str]):
+        self.full = full
+        self.inline = inline

This also feels like it reduces conceptual complexity for me -- it's just a normal Python class; I don't have to go and look up what exactly the behaviour of SimpleNamespace is ;)

@jaraco
Copy link
Member Author

jaraco commented Apr 14, 2024

what inheriting from SimpleNamespace gets us

My intention is to re-use higher level abstractions that reduce the amount of boilerplate (number of times a member variable must be mentioned to be functional) and imperative code. It provides ~half of what dataclasses provides (namely, a default constructor with useful semantics and constraints).

You're right - using SimpleNamespace for _ReadState is unnecessary and adds little if any value. I'll remove it.

In the case of _Prefixes, it provides a default constructor, so allows for a more declarative description of the object. It follows a pattern that's familiar (construction from keyword arguments) and avoids exposing other less desirable patterns (construction from positional arguments). Reliance on the SimpleNamespace also avoids the temptation or risk of having anything but assignment happen in the constructor, e.g.

class _Prefixes:
    def __init__(self, full: Iterable[str], inline: Iterable[str]):
        self.full = full
        self.inline = inline + ('hot garbage',)
        self.other_state = random.random()

By re-using the constructor from SimpleNamespace, a reader can know at a glance the behavior of the constructor and static analysis tools can make assumptions that it can't about an explicit constructor.

I think it's worth using here for that value alone, but it also adds value in utilizing this pattern that's previously been relegated to third-party packages like munch and thus making it less likely that a reader will find it surprising.

Given that we're dropping dataclasses, however, it may not even be necessary to declare a type for this container at all and instead just construct a SimpleNamespace in situ. Oh, but without a declared _Prefixes, _Line would lose its type annotations for that object. i.e.

diff --git a/Lib/configparser.py b/Lib/configparser.py
index b2b85b7af2..f6438699c9 100644
--- a/Lib/configparser.py
+++ b/Lib/configparser.py
@@ -553,17 +553,12 @@ def __init__(self):
         self.errors = list()
 
 
-class _Prefixes(types.SimpleNamespace):
-    full : Iterable[str]
-    inline : Iterable[str]
-
-
 class _Line(str):
 
     def __new__(cls, val, *args, **kwargs):
         return super().__new__(cls, val)
 
-    def __init__(self, val, prefixes: _Prefixes):
+    def __init__(self, val, prefixes):
         self.prefixes = prefixes
 
     @functools.cached_property
@@ -656,7 +651,7 @@ def __init__(self, defaults=None, dict_type=_default_dict,
             else:
                 self._optcre = re.compile(self._OPT_TMPL.format(delim=d),
                                           re.VERBOSE)
-        self._prefixes = _Prefixes(
+        self._prefixes = types.SimpleNamespace(
             full=tuple(comment_prefixes or ()),
             inline=tuple(inline_comment_prefixes or ()),
         )

I think I prefer the presence of _Prefixes, even though it's more verbose and more like Java than classic Python. I know some users would appreciate having the code completion work for line.prefixes.*.

And until I can have something like dataclasses, I'd like to use SimpleNamespace to keep the implementation declarative and succinct.

@AlexWaygood
Copy link
Member

I know some users would appreciate having the code completion work for line.prefixes.*.

Note that tools that provide code completion generally ignore all type hints in the stdlib, and instead rely entirely on the type annotations in typeshed for this purpose

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still lean towards keeping _Prefixes dead simple, and defining it without SimpleNamespace (or using an inline SimpleNamespace as you suggested enough). If you want to enforce that keyword arguments are passed to the constructor, that's pretty easily accomplished without using SimpleNamespace:

class _Prefixes:
    def __init__(self, *, full: Iterable[str], inline: Iterable[str]):
        self.full = full
        self.inline = inline

But that's just my minor personal preference. What you have is fine. Thanks again for fixing!

@jaraco
Copy link
Member Author

jaraco commented Apr 14, 2024

I've opted to use the SimpleNamespace inline. Thanks for the review and thoughtful suggestions.

@jaraco jaraco enabled auto-merge (squash) April 14, 2024 10:41
@jaraco jaraco merged commit 9c93b74 into python:main Apr 14, 2024
@jaraco jaraco deleted the gh-117348/import-perf branch April 14, 2024 14:11
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…hon#117703)

Reduces import time by over 50% (10431µs vs 4350µs on Apple M3 Pro).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants