BUG: AttributeError: Can only use .cat accessor with a 'category' dtype #5650

Closed
@YarShev

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import pandas
import modin.pandas as pd
import modin.config as cfg

cfg.MinPartitionSize.put(2)

data = {"name": ["abc", "def", "ghi", "jkl"]}
pandas_df = pandas.DataFrame(data)
pandas_df = pandas_df.astype("category")
modin_df = pd.DataFrame(data)
modin_df = modin_df.astype("category")
print(pandas_df["name"].cat.codes)
# 0    0
# 1    1
# 2    2
# 3    3
# dtype: int8
print(modin_df["name"].cat.codes)
# AttributeError: Can only use .cat accessor with a 'category' dtype

Issue Description

We use pandas.concat to concatenate the partition DataFrames and then apply a function to them along a full axis. If those DataFrames have columns of "category" dtype, the concatenated result does not necessarily keep the "category" dtype: pandas only preserves it when the categories of all pieces are identical. See details in pandas-dev/pandas#51362. That's why calling .cat on the Modin result raises the error.
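The dtype loss can be reproduced with pandas alone. The sketch below is only an illustration: the two frames `left` and `right` are a hypothetical stand-in for row partitions that were each cast to "category" separately, not Modin's actual internals. It also shows `pandas.api.types.union_categoricals` as one possible way to keep the dtype, which is not necessarily the fix adopted in Modin.

```python
import pandas
from pandas.api.types import union_categoricals

# Two row chunks standing in for partitions (hypothetical split of the data above).
left = pandas.DataFrame({"name": ["abc", "def"]}, index=[0, 1]).astype("category")
right = pandas.DataFrame({"name": ["ghi", "jkl"]}, index=[2, 3]).astype("category")

# The categories differ between the chunks, so concat drops the "category" dtype.
combined = pandas.concat([left, right])
print(combined["name"].dtype)  # no longer "category"

# One possible workaround: merge the categoricals explicitly before rebuilding the column.
merged = union_categoricals([left["name"], right["name"]])
restored = pandas.Series(merged, index=combined.index, name="name")
print(restored.cat.codes.tolist())
```

With the dtype restored this way, the `.cat` accessor works again on the combined column.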

Expected Behavior

modin_df["name"].cat.codes should match the pandas output shown above instead of raising AttributeError.

Labels

bug 🦗 (Something isn't working)
