Closed
Description
Modin version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest released version of Modin.
- I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
import pandas
import modin.pandas as pd
import modin.config as cfg
cfg.MinPartitionSize.put(2)
data = {"name": ["abc", "def", "ghi", "jkl"]}
pandas_df = pandas.DataFrame(data)
pandas_df = pandas_df.astype("category")
modin_df = pd.DataFrame(data)
modin_df = modin_df.astype("category")
print(pandas_df["name"].cat.codes)
# 0 0
# 1 1
# 2 2
# 3 3
# dtype: int8
print(modin_df["name"].cat.codes)
# AttributeError: Can only use .cat accessor with a 'category' dtype
Issue Description
We use pandas.concat to concatenate the partition DataFrames before applying a function to them along a full axis. If those DataFrames have "category" columns whose categories differ, the concatenated column no longer has "category" dtype. See details in pandas-dev/pandas#51362. In the example above the result therefore loses its categorical dtype, which is why calling .cat raises the AttributeError.
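The dtype loss can be reproduced with pandas alone. The sketch below is only an illustration under assumptions: `part1` and `part2` are hypothetical stand-ins for two row partitions that were each cast to "category" independently, and `union_categoricals` is shown as one possible way to keep the dtype, not necessarily the approach Modin takes.

```python
import pandas
from pandas.api.types import union_categoricals

# Hypothetical stand-ins for two row partitions. Each is cast on its own,
# so each column ends up with a different set of categories.
part1 = pandas.DataFrame({"name": ["abc", "def"]}).astype("category")
part2 = pandas.DataFrame({"name": ["ghi", "jkl"]}).astype("category")

# Concatenating categoricals with different categories drops the dtype.
combined = pandas.concat([part1, part2], ignore_index=True)
print(isinstance(combined["name"].dtype, pandas.CategoricalDtype))  # False

# One possible workaround: merge the categories explicitly.
merged = union_categoricals([part1["name"], part2["name"]])
restored = pandas.Series(merged, name="name")
print(restored.cat.codes.tolist())  # [0, 1, 2, 3]
```

With the categories merged explicitly, the codes match the pandas output shown in the reproducible example.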
Expected Behavior
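The Modin result should match pandas: modin_df["name"].cat.codes should return the same codes as pandas_df["name"].cat.codes.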