error: Item "str" of "Union[str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta]" has no attribute "copy" [union-attr] #453
Comments
This seems to be one of the weak points of the current Python typing system. To properly type hint this, one would need to know if the The only possible approach I see is to do something along the lines of
IndexType = TypeVar("IndexType", bound=Index)
ColumnType = TypeVar("ColumnType", bound=Index)

class DataFrame(NDFrame, OpsMixin, Generic[IndexType, ColumnType]):
    @property
    def loc(self: DataFrame[IndexType, ColumnType]) -> _LocIndexerFrame[IndexType, ColumnType]: ...

class _LocIndexerFrame(_LocIndexer, Generic[IndexType, ColumnType]):
    ...
    @overload
    def __getitem__(self: _LocIndexerFrame[MultiIndex, MultiIndex], key: ...) -> ...: ...
    @overload
    def __getitem__(self: _LocIndexerFrame[MultiIndex, Index], key: ...) -> ...: ...
    @overload
    def __getitem__(self: _LocIndexerFrame[Index, MultiIndex], key: ...) -> ...: ...
    @overload
    def __getitem__(self: _LocIndexerFrame[Index, Index], key: ...) -> ...: ...
But it looks super messy.
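To make the self-type overload idea concrete, here is a minimal, pandas-free sketch. Everything in it (`Frame`, `Loc`, the toy `Index` and `MultiIndex` classes, the hard-coded return values) is hypothetical and only mirrors the shape of the proposal above; it is not how pandas-stubs is written.

```python
from __future__ import annotations

from typing import Generic, List, Tuple, TypeVar, Union, overload


class Index:
    """Toy stand-in for a flat index (not the pandas class)."""


class MultiIndex(Index):
    """Toy stand-in for a hierarchical index (not the pandas class)."""


IndexT = TypeVar("IndexT", bound=Index)


class Frame(Generic[IndexT]):
    """Toy frame whose static type records which kind of index it holds."""

    def __init__(self, index: IndexT) -> None:
        self.index = index

    @property
    def loc(self) -> Loc[IndexT]:
        return Loc(self)


class Loc(Generic[IndexT]):
    def __init__(self, frame: Frame[IndexT]) -> None:
        self.frame = frame

    # Self-type overloads: the checker picks one based on the index parameter.
    @overload
    def __getitem__(self: Loc[MultiIndex], key: Tuple[str, str]) -> List[int]: ...
    @overload
    def __getitem__(self: Loc[Index], key: Tuple[str, str]) -> int: ...
    def __getitem__(self, key: Tuple[str, str]) -> Union[List[int], int]:
        # Runtime dispatch mirrors what the overloads promise statically.
        if isinstance(self.frame.index, MultiIndex):
            return [12, 2]  # the tuple names one row, so a whole row comes back
        return 2  # the tuple is (row, column), so a single cell comes back
```

With this shape, a checker should see `Frame(MultiIndex()).loc[("cobra", "mark i")]` as a list and `Frame(Index()).loc[("cobra", "shield")]` as an `int`. The catch is that the index type has to be known statically at construction time, which is rarely the case for real DataFrames built from lists or files.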
It's actually more of an issue that If you use The issue here is that
Given that the documentation explicitly shows such examples, I think that may be too much to ask. It will certainly make applying pandas-stubs to existing repositories very difficult. I used the examples in the documentation and turned them into a typing unit test. Currently, this test alone raises 13 typing errors.
from pandas import DataFrame, Index, MultiIndex, Series
from typing_extensions import assert_type, reveal_type
# Getting values
df = DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)
assert_type(df, DataFrame)
assert_type(df.loc["viper"], Series)
assert_type(df.loc[["viper", "sidewinder"]], DataFrame)
assert_type(df.loc["cobra", "shield"], int)
assert_type(df.loc["cobra":"viper", "max_speed"], Series)
assert_type(df.loc[[False, False, True]], DataFrame)
assert_type(
    df.loc[Series([False, True, False], index=["viper", "sidewinder", "cobra"])],
    DataFrame,
)
assert_type(df.loc[Index(["cobra", "viper"], name="foo")], DataFrame)
assert_type(df.loc[df["shield"] > 6], DataFrame)
assert_type(df.loc[df["shield"] > 6, ["max_speed"]], DataFrame)  # a list of columns keeps the result a DataFrame
assert_type(df.loc[lambda df: df["shield"] == 8], DataFrame)
# Setting values
df.loc[["viper", "sidewinder"], ["shield"]] = 50
df.loc["cobra"] = 10
df.loc[:, "max_speed"] = 30
df.loc[df["shield"] > 35] = 0
# Getting values on a DataFrame with an index that has integer labels
df = DataFrame(
    [[1, 2], [4, 5], [7, 8]], index=[7, 8, 9], columns=["max_speed", "shield"]
)
assert_type(df, DataFrame)
assert_type(df.loc[7:9], DataFrame)
# Getting values with a MultiIndex
tuples = [
    ("cobra", "mark i"),
    ("cobra", "mark ii"),
    ("sidewinder", "mark i"),
    ("sidewinder", "mark ii"),
    ("viper", "mark ii"),
    ("viper", "mark iii"),
]
index = MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = DataFrame(values, columns=["max_speed", "shield"], index=index)
assert_type(df, DataFrame)
assert_type(df.loc["cobra"], DataFrame)
assert_type(df.loc[("cobra", "mark ii")], Series)
assert_type(df.loc["cobra", "mark i"], Series)
assert_type(df.loc[[("cobra", "mark ii")]], DataFrame)
assert_type(df.loc[("cobra", "mark ii"), "shield"], int)
assert_type(df.loc[("cobra", "mark i"):"viper"], DataFrame)
assert_type(df.loc[("cobra", "mark i"):("viper", "mark ii")], DataFrame)
Thanks for doing this. We've been taking a whack-a-mole approach to improve the typing on I should note that the current approach has evolved over time due to reports like these, and has been tested on code bases that I have.
I looked into your test code, and I don't think there is anything we can do in the stubs to support all of these cases, because some of these ways of using pandas are ambiguous.
Note: mypy doesn't support slices with non-integer bounds. See python/mypy#2410
Here is an analysis of the failures I get and what the resolution is:
def bool_mask(df: pd.DataFrame) -> pd.Series[bool]: # better
    return df["shield"] == 8
assert_type(df.loc[bool_mask(df)], pd.DataFrame)
then the stubs correctly type check that line.
df = pd.DataFrame([[1,2], [3,4]], index=["cobra", "viper"], columns=["mark i", "mark ii"])
then the expression
assert_type(df.loc[("cobra", "mark i"):"viper"], pd.DataFrame)
assert_type(df.loc[("cobra", "mark i"):("viper", "mark ii")], pd.DataFrame)
Based on this analysis, I'm going to close this issue. With static typing, we can't support pandas expressions that are ambiguous. We also can't do anything about mypy bugs!
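For the slice limitation referenced in python/mypy#2410, one option is to pass an explicit `slice(...)` object instead of the `a:b` syntax. At runtime the two spellings produce the same key; whether a given checker version accepts the explicit form is an assumption to verify locally. The sketch below uses a hypothetical `LabelRows` class rather than pandas, and copies the inclusive endpoints of label slicing in `.loc`.

```python
from typing import List


class LabelRows:
    """Hypothetical label-indexed container; not part of pandas."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = list(labels)

    def __getitem__(self, key: slice) -> List[str]:
        # Label slices include both endpoints, as pandas .loc does.
        start = self.labels.index(key.start)
        stop = self.labels.index(key.stop)
        return self.labels[start : stop + 1]


rows = LabelRows(["cobra", "viper", "sidewinder"])

# Bracket syntax with string bounds: fine at runtime, but the checker
# behaviour discussed in the linked mypy issue may flag it.
with_syntax = rows["cobra":"viper"]

# Explicit slice object: the same key, spelled as an ordinary call.
with_object = rows[slice("cobra", "viper")]
```

Both `with_syntax` and `with_object` hold the rows from `"cobra"` through `"viper"`, so switching spellings changes only what the type checker sees.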
Using a tuple as the key for DataFrame.loc returns ScalarType. I think the issue is these lines, which seem to flat-out make wrong assumptions when the DataFrame is equipped with a MultiIndex:
pandas-stubs/pandas-stubs/core/frame.pyi
Lines 159 to 170 in 68780c7
Please complete the following information:
pandas-stubs 1.5.2.221124
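When a call is declared to return a union of scalar types, as in the error in the title, a caller can narrow the result with `isinstance` before using container-only methods such as `.copy()`. The sketch below is pandas-free: `lookup`, `copy_row`, `TABLE`, and the `Row` alias are hypothetical stand-ins for a `.loc` call and a Series, chosen only to show the narrowing pattern.

```python
from typing import Dict, List, Tuple, Union

Scalar = Union[str, int, float]
Row = List[int]  # hypothetical stand-in for a pandas Series

TABLE: Dict[Tuple[str, str], Row] = {
    ("cobra", "mark i"): [12, 2],
    ("cobra", "mark ii"): [0, 4],
}


def lookup(key: Tuple[str, str]) -> Union[Scalar, Row]:
    """Hypothetical stand-in for an indexer whose declared type is a union."""
    return TABLE[key]


def copy_row(key: Tuple[str, str]) -> Row:
    result = lookup(key)
    # Calling result.copy() here would be a union-attr error, because
    # str, int and float have no .copy() method.
    if not isinstance(result, list):
        raise TypeError(f"expected a row for {key!r}, got {type(result).__name__}")
    # After the isinstance check the checker treats result as a list.
    return result.copy()
```

For example, `copy_row(("cobra", "mark i"))` returns a fresh copy of `[12, 2]`. `typing.cast` would also silence the checker, but the `isinstance` check additionally guards against a wrong assumption at runtime.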