add Hashable type to __get_item__ #592 #596

Merged
merged 5 commits into from
Mar 29, 2023
Changes from 3 commits
2 changes: 1 addition & 1 deletion pandas-stubs/core/frame.pyi
@@ -516,7 +516,7 @@ class DataFrame(NDFrame, OpsMixin):
     def T(self) -> DataFrame: ...
     def __getattr__(self, name: str) -> Series: ...
     @overload
-    def __getitem__(self, idx: Scalar | tuple[Hashable, ...]) -> Series: ...
+    def __getitem__(self, idx: Scalar | Hashable | tuple[Hashable, ...]) -> Series: ...
Member

Could reduce it to def __getitem__(self, idx: Scalar | Hashable) -> Series: ... since tuple is declared as hashable in typeshed.

(Many members of Scalar are probably also hashable - not sure whether all of them are hashable.)
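The hashability point can be checked at runtime with a minimal sketch. `Color` and `is_hashable` are hypothetical names used only for illustration, and the runtime `isinstance` check is only an approximation of what typeshed declares statically:

```python
from collections.abc import Hashable
from enum import Enum


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"


def is_hashable(obj: object) -> bool:
    """Return True if obj's type provides a __hash__ implementation."""
    return isinstance(obj, Hashable)


print(is_hashable(("a", "b")))  # True: tuples define __hash__
print(is_hashable(Color.RED))   # True: Enum members define __hash__
print(is_hashable(["a", "b"]))  # False: lists set __hash__ to None
```

Since a tuple already satisfies `Hashable`, listing `tuple[Hashable, ...]` next to `Hashable` in the union adds nothing for the type checker.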

Contributor Author

Thanks, I just removed tuple[Hashable, ...] from the signature.

Contributor Author

Similarly, should we include a list[Hashable] in the 3rd overload?

The 3rd overload:

    @overload
    def __getitem__(
        self,
        idx: Series[_bool]
        | DataFrame
        | Index
        | np_ndarray_str
        | np_ndarray_bool
        | list[_ScalarOrTupleT],
    ) -> DataFrame: ...

Member

> Similarly, should we include a list[Hashable] in the 3rd overload?

Would need list[HashableT] to simulate the covariant behavior.
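A minimal sketch of why a bound TypeVar is suggested here: `list` is invariant, so a parameter annotated `list[Hashable]` does not accept a `list[str]` under a static checker, while `list[HashableT]` does. The function names and the local `HashableT` definition below are assumptions for illustration, not the definitions used in pandas-stubs:

```python
from __future__ import annotations

from collections.abc import Hashable
from typing import TypeVar

# Hypothetical TypeVar, bound to Hashable for this sketch.
HashableT = TypeVar("HashableT", bound=Hashable)


def select_invariant(cols: list[Hashable]) -> list[Hashable]:
    """Accepts only a list whose declared element type is exactly Hashable."""
    return list(cols)


def select_generic(cols: list[HashableT]) -> list[HashableT]:
    """Accepts a list of any single hashable element type."""
    return list(cols)


names: list[str] = ["col1", "col2"]
selected = select_generic(names)  # accepted: HashableT is solved as str
# select_invariant(names)  # rejected by a type checker: list is invariant
```

Both calls would run fine at runtime; the difference only shows up under a static type checker.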

Do we have tests for __getitem__ and slice? I believe that should be failing now since slice is hashable. Probably would need to make the slice overload the first overload.
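The ordering concern can be sketched with a toy class. Type checkers generally resolve a call to the first overload that matches, so the narrower `slice` overload goes before the broad `Hashable` one. `ToyFrame` is a hypothetical stand-in, not the pandas-stubs code, and whether a checker actually treats `slice` as `Hashable` depends on the typeshed and Python version in use:

```python
from __future__ import annotations

from collections.abc import Hashable
from typing import overload


class ToyFrame:
    """Hypothetical stand-in for DataFrame, for illustration only."""

    def __init__(self, columns: dict[Hashable, list[int]]) -> None:
        self._columns = columns

    # Narrow overload first, so a slice is never claimed by the Hashable one.
    @overload
    def __getitem__(self, idx: slice) -> ToyFrame: ...
    @overload
    def __getitem__(self, idx: Hashable) -> list[int]: ...
    def __getitem__(self, idx: slice | Hashable) -> ToyFrame | list[int]:
        if isinstance(idx, slice):
            # Row selection: apply the slice to every column.
            return ToyFrame({k: v[idx] for k, v in self._columns.items()})
        # Column selection: look up a single hashable key.
        return self._columns[idx]


frame = ToyFrame({"a": [1, 2, 3]})
column = frame["a"]  # matches the Hashable overload
rows = frame[1:]     # matches the slice overload
```

Depending on the checker, the overlap between the two overloads may itself be reported and need to be handled separately.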

Collaborator

I'd prefer a more incremental approach here. I'm concerned that we might be making some of the types too wide. Let's have this PR address the issue as reported.

Contributor Author

This is actually more complicated than I thought.

> Do we have tests for __getitem__ and slice?

Yes, there is a slice test, but it has no assert_type checks.

Contributor Author

I just added more assert_type calls to the test to make sure the other overloads still work.
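As a rough sketch of the pattern these tests rely on: `assert_type` is verified by the static type checker and returns its argument unchanged at runtime, while a `check` helper confirms the runtime type. The `check` below is a simplified, hypothetical stand-in written for this example, not the project's actual helper, and the snippet assumes Python 3.9 or later:

```python
from __future__ import annotations

from typing import TypeVar

try:
    from typing import assert_type  # available in Python 3.11+
except ImportError:
    def assert_type(val, typ):  # minimal runtime fallback for older Pythons
        return val

T = TypeVar("T")


def check(actual: T, klass: type) -> T:
    """Simplified stand-in: verify the runtime type and return the value."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"Expected type '{klass}' but got '{type(actual)}'")
    return actual


# Static side: the checker verifies the inferred type is dict[str, int].
# Runtime side: check() verifies the object really is a dict.
value = check(assert_type({"a": 1}, dict[str, int]), dict)
```

Pairing the two catches both a stub that declares the wrong return type and a stub that disagrees with what the library actually returns.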

     @overload
     def __getitem__(self, rows: slice) -> DataFrame: ...
     @overload
14 changes: 14 additions & 0 deletions tests/test_frame.py
@@ -3,6 +3,7 @@
 from collections import defaultdict
 import csv
 import datetime
+from enum import Enum
 import io
 import itertools
 from pathlib import Path
@@ -160,6 +161,19 @@ def test_types_getitem() -> None:
     df[i]


+def test_types_getitem_with_hashable() -> None:
+    # Testing getitem support for hashable types that are not scalar
+    # Due to the bug in https://github.com/pandas-dev/pandas-stubs/issues/592
+    class MyEnum(Enum):
+        FIRST = "tayyar"
+        SECOND = "haydar"
+
+    df = pd.DataFrame(
+        data=[[12.2, 10], [8.8, 15]], columns=[MyEnum.FIRST, MyEnum.SECOND]
+    )
+    check(assert_type(df[MyEnum.FIRST], pd.Series), pd.Series)
+
+
 def test_slice_setitem() -> None:
     # Due to the bug in pandas 1.2.3(https://github.com/pandas-dev/pandas/issues/40440), this is in separate test case
     df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4], 5: [6, 7]})