Skip to content

Weird n_fields attributes, etc. #6546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AlexWaygood opened this issue Dec 8, 2021 · 12 comments · Fixed by #6560
Closed

Weird n_fields attributes, etc. #6546

AlexWaygood opened this issue Dec 8, 2021 · 12 comments · Fixed by #6560

Comments

@AlexWaygood
Copy link
Member

There are a bunch of NamedTuple-like classes in some of the stranger corners of the stdlib that have the following three attributes:

  • n_fields
  • n_sequence_fields
  • n_unnamed_fields

For example:

(os|posix).stat_result.n_fields
(os|posix).stat_result.n_sequence_fields
(os|posix).stat_result.n_unnamed_fields
(os|posix).terminal_size.n_fields
(os|posix).terminal_size.n_sequence_fields
(os|posix).terminal_size.n_unnamed_fields
(os|posix).times_result.n_fields
(os|posix).times_result.n_sequence_fields
(os|posix).times_result.n_unnamed_fields
(os|posix).uname_result.n_fields
(os|posix).uname_result.n_sequence_fields
(os|posix).uname_result.n_unnamed_fields

@Akuli counts 22 classes that are missing these attributes. Some classes, however, do currently define these attributes in the stub currently. time._struct_time defines them as properties (which seems incorrect -- they seem more like ClassVars to me):

typeshed/stdlib/time.pyi

Lines 35 to 50 in 1fdd7e4

class _struct_time(NamedTuple):
tm_year: int
tm_mon: int
tm_mday: int
tm_hour: int
tm_min: int
tm_sec: int
tm_wday: int
tm_yday: int
tm_isdst: int
@property
def n_fields(self) -> int: ...
@property
def n_sequence_fields(self) -> int: ...
@property
def n_unnamed_fields(self) -> int: ...

As discussed here, these attributes don't seem particularly useful. I'm also not sure there's much harm in having them in the stubs, but it might be weird to have them popping up in autocompletion suggestions.

In any event, they should probably either all be added to the stubs, or all removed from stubs, and added to the "wontfix" section of the allowlists instead of the "missing from stubs" section of the allowlists.

Thoughts?

@srittau
Copy link
Collaborator

srittau commented Dec 8, 2021

If they're there they should probably be added.

@JelleZijlstra
Copy link
Member

These are instances of structseq (https://github.com/python/cpython/blob/main/Objects/structseq.c), an internal CPython type that is almost but not quite like a namedtuple. It may be worth adding a common base class for them in _typeshed, though at runtime they just inherit from tuple.

@AlexWaygood
Copy link
Member Author

Given this comment at the top of the source code for structseq, I'm inclined to say that these attributes should not be included in the stubs, and that they should just be treated like regular NamedTuples:

The structseq helper is considered an internal CPython implementation
detail. Docs for modules using structseqs should call them
"named tuples" (be sure to include a space between the two
words and add a link back to the term in Docs/glossary.rst).

@srittau
Copy link
Collaborator

srittau commented Dec 8, 2021

Treating them as namedtuples is also worth a try.

@JelleZijlstra
Copy link
Member

The constructor is different from namedtuples though. You have to initialize them with a single tuple.

@AlexWaygood
Copy link
Member Author

The constructor is different from namedtuples though. You have to initialize them with a single tuple.

Oh, I see. Yes, os.uname_result only takes a single argument, which must be a sequence of some kind, and must be of precisely the right length. So it's like a NamedTuple, but with a very different constructor:

>>> from os import uname_result
>>> uname_result('a', 'b', 'c', 'd', 'e')
TypeError: structseq() takes at most 2 arguments (5 given)
>>> uname_result(('a', 'b', 'c', 'd', 'e'))
posix.uname_result(sysname='a', nodename='b', release='c', version='d', ma
chine='e')

There appears to be an optional second argument, which has to be a dict of some kind.

So, creating a common base class for them in _typeshed sounds like a good shout.

@Akuli
Copy link
Collaborator

Akuli commented Dec 8, 2021

With a custom base class, it would be hard to support type-safe indexing, as type checkers hard-code this behavior for tuple and NamedTuple.

>>> os.uname_result('abcde')
posix.uname_result(sysname='a', nodename='b', release='c', version='d', machine='e')
>>> os.uname_result('abcde')[0]
'a'

This would be useful for unpacking:

sysname, nodename, release, version, machine = os.uname()

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Dec 8, 2021

We could do something like this:

from typing import ClassVar, Generic, Sequence, SupportsIndex, Type, TypeVar, overload

_T = TypeVar("_T")
_T_co = TypeVar("_T_co", covariant=True)

class structseq(Sequence[_T_co], Generic[_T_co]):
    n_fields: ClassVar[int]
    n_unnamed_fields: ClassVar[int]
    n_sequence_fields: ClassVar[int]
    def __new__(cls: Type[_T], seq: Sequence[_T_co]) -> _T: ...
    def __len__(self) -> int: ...
    @overload
    def __getitem__(self, __i: SupportsIndex) -> _T_co: ...
    @overload
    def __getitem__(self, __i: slice) -> Sequence[_T_co]: ...

class uname_result(structseq[str]):
    sysname: str
    nodename: str
    release: str
    version: str
    machine: str
    
x = uname_result('abcde')
sysname, nodename, release, version, machine = x
y = x.sysname
z = x.n_fields

reveal_type(x)  # Revealed type is "__main__.uname_result"
reveal_type(y)  # Revealed type is "builtins.str"
reveal_type(z)  # Revealed type is "builtins.int"
reveal_type(sysname)  # Revealed type is "builtins.str*"

Advantages:

  • The constructor is more accurate than the current constructor. uname_result('a', 'b', 'c', 'd', 'e') currently passes mypy but fails at runtime.
  • We can share code between these classes by having a common base class.
  • We add the missing attributes n_fields, n_sequence_fields and n_unnamed_fields to these classes.

Disadvantages:

  • There's no way to tell mypy that the seq argument to the constructor has to be exactly 5 items long. uname_result(('a', 'b', 'c', 'd', 'e', 'f')) will pass mypy but fail at runtime.
  • There's no way to tell mypy that uname_result instances will always contain exactly five items. x = uname_result(('a', 'b', 'c', 'd', 'e')); x[5] will pass mypy but fail at runtime.

@Akuli
Copy link
Collaborator

Akuli commented Dec 8, 2021

There's no way to tell mypy that the seq argument to the constructor has to be exactly 5 items long.

I don't think we can do this anyway. It works if we require it to be tuple[str, str, str, str, str], but then passing in a list will fail.

@Akuli
Copy link
Collaborator

Akuli commented Dec 8, 2021

class uname_result(structseq[str]):

This doesn't work so nicely when the struct sequence contains different types. Surprisingly, most struct sequences only have one type: uname_result has strings, terminal_size and stat_result have ints, times_result has floats. But resource.struct_rusage has ints and floats. In practice, structseq[float] might be good enough, because the attributes are more commonly accessed and they can be still int where needed.

@AlexWaygood
Copy link
Member Author

There's no way to tell mypy that the seq argument to the constructor has to be exactly 5 items long.

I don't think we can do this anyway. It works if we require it to be tuple[str, str, str, str, str], but then passing in a list will fail.

Yes, precisely. It's a "disadvantage" in that it is not a perfect solution, but it's not a regression vis-à-vis the current situation.

@AlexWaygood
Copy link
Member Author

We could do something like this:

My above suggestion has the disadvantage that the following would pass mypy, but fail at runtime:

from os import uname_result
x = uname_result('abcde')
x.sysname = 'foo'

A slightly more cumbersome, but more accurate, definition would be something like the following:

from typing import ClassVar, Generic, Sequence, SupportsIndex, Type, TypeVar, overload

_T = TypeVar("_T")
_T_co = TypeVar("_T_co", covariant=True)

class structseq(Sequence[_T_co], Generic[_T_co]):
    n_fields: ClassVar[int]
    n_unnamed_fields: ClassVar[int]
    n_sequence_fields: ClassVar[int]
    def __new__(cls: Type[_T], seq: Sequence[_T_co]) -> _T: ...
    def __len__(self) -> int: ...
    @overload
    def __getitem__(self, __i: SupportsIndex) -> _T_co: ...
    @overload
    def __getitem__(self, __i: slice) -> Sequence[_T_co]: ...

class uname_result(structseq[str]):
    @property
    def sysname(self): -> str: ...
    @property
    def nodename(self): -> str: ...
    @property
    def release(self): -> str: ...
    @property
    def version(self): -> str: ...
    @property
    def machine(self): -> str: ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants