Serialize/deserialize a Categorical whose values are taken from an enum #25448

Open
@teto

Description

Code Sample, a copy-pastable example if possible

should run as standalone

import pandas as pd
from enum import Enum, auto

class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()

csv_filename = "test.csv"

# Ordered categorical whose categories are the enum members themselves
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)

df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server]}, dtype=dtype_role)
print(df.info())
print(df)
df.to_csv(csv_filename)

# Read the file back, requesting the same enum-backed categorical dtype
loaded = pd.read_csv(csv_filename, dtype={"tcpdest": dtype_role})
print(loaded.info())
print(loaded)

which outputs

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
tcpdest    1 non-null category
dtypes: category(1)
memory usage: 177.0 bytes
None
                  tcpdest
0  ConnectionRoles.Server
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
Unnamed: 0    1 non-null int64
tcpdest       0 non-null category
dtypes: category(1), int64(1)
memory usage: 185.0 bytes
None
   Unnamed: 0 tcpdest
0           0     NaN

The value ConnectionRoles.Server became NaN through the serialization/deserialization round trip.

Problem description

I want to be able to serialize (to_csv) and then read back (read_csv) a column with a CategoricalDtype whose categories are taken from a Python Enum (or IntEnum).
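Until something like this is supported, one possible workaround is sketched below. It is not existing pandas behaviour for enums, only an assumption about how the round trip could be done by hand: the member names are written to the CSV, and a `converters` function maps each name back to its enum member before the column is cast to the categorical dtype. The in-memory buffer and the choice to store `.name` rather than `.value` are illustrative choices, not part of the original report.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)

df = pd.DataFrame(
    {"tcpdest": pd.Categorical([ConnectionRoles.Server], dtype=dtype_role)}
)

# Write the member *names* ("Server") instead of the default string form
# ("ConnectionRoles.Server"), so the file holds plain, stable labels.
buf = io.StringIO()  # stands in for a file on disk
df.assign(tcpdest=[role.name for role in df["tcpdest"]]).to_csv(buf, index=False)
buf.seek(0)

# Map each name back to its enum member while parsing, then restore the dtype.
loaded = pd.read_csv(buf, converters={"tcpdest": lambda s: ConnectionRoles[s]})
loaded["tcpdest"] = loaded["tcpdest"].astype(dtype_role)

print(loaded)
```

The sketch assumes the column has no missing values; a NaN entry has no `.name`, so it would need to be handled separately before writing.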

Actually, the dtype and enum I use in my project (unlike the toy example) are:

class ConnectionRoles(Enum):
    """
    Used to filter datasets and keep packets flowing in only one direction.
    The parser should accept --destination Client --destination Server if you want both.
    """
    Client = auto()
    Server = auto()

    def __str__(self):
        # Defining __str__ is required for ArgumentParser's help output to show
        # the human-readable names of ConnectionRoles
        return self.name

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            raise ValueError()

    def __next__(self):
        # auto() numbers members from 1, so compare members instead of testing value == 0
        if self is ConnectionRoles.Client:
            return ConnectionRoles.Server
        else:
            return ConnectionRoles.Client


# Defined after the enum so that ConnectionRoles exists when its members are listed
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
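
For context, the docstring's `--destination Client --destination Server` usage could be wired up roughly as follows. This is only a sketch: the `action="append"` and `choices` arguments are assumptions about how the parser is configured in the real project, and the enum is trimmed to the parts argparse needs.

```python
import argparse
from enum import Enum, auto


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()

    def __str__(self):
        # argparse uses str() on the choices when it builds the help text
        return self.name

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            # argparse reports a ValueError from a type callable as a usage error
            raise ValueError(f"unknown role: {s!r}") from None


parser = argparse.ArgumentParser()
parser.add_argument(
    "--destination",
    action="append",                   # repeat the flag to select both roles
    type=ConnectionRoles.from_string,  # convert "Client" / "Server" to members
    choices=list(ConnectionRoles),
)

args = parser.parse_args(["--destination", "Client", "--destination", "Server"])
print(args.destination)
```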

I've searched the tracker, and the most relevant issues (related, but different) might be:

Expected Output

Output of pd.show_versions()

I am using v0.23.4 with a patch from master to fix some bug.

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.0
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0+unknown
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.2.6
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
