Description
Code Sample, a copy-pastable example if possible
It should run standalone.
import pandas as pd
from enum import Enum, IntEnum, auto
import argparse
class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()
csv_filename = "test.csv"
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
df = pd.DataFrame({ "tcpdest": [ConnectionRoles.Server] }, dtype=dtype_role)
print(df.info())
print(df)
df.to_csv(csv_filename)
loaded = pd.read_csv(csv_filename, dtype={"tcpdest": dtype_role})
print(loaded.info())
print(loaded)
which outputs:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
tcpdest 1 non-null category
dtypes: category(1)
memory usage: 177.0 bytes
None
                  tcpdest
0  ConnectionRoles.Server
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
Unnamed: 0 1 non-null int64
tcpdest 0 non-null category
dtypes: category(1), int64(1)
memory usage: 185.0 bytes
None
   Unnamed: 0 tcpdest
0           0     NaN
The value ConnectionRoles.Server became NaN during the serialization/deserialization (to_csv/read_csv) round trip.
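To see where the value is lost, here is a minimal sketch that writes the same frame to an in-memory buffer (io.StringIO stands in for test.csv, an assumption made only so the text can be inspected). The CSV cell ends up holding the plain string str(ConnectionRoles.Server). When read_csv is given a CategoricalDtype, pandas documents that values outside dtype.categories are treated as missing; since the categories here are Enum members rather than strings, the parsed text matches nothing and becomes NaN.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server]}, dtype=dtype_role)

# Write to memory instead of test.csv so the serialized text can be inspected.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)  # the cell holds the text "ConnectionRoles.Server"

# A string never compares equal to an Enum member, so the parsed cell
# cannot be matched against dtype_role.categories.
print("ConnectionRoles.Server" == ConnectionRoles.Server)  # False

loaded = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype={"tcpdest": dtype_role})
print(loaded["tcpdest"].isna().all())  # True: the unmatched value became NaN
```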
Problem description
I want to be able to serialize (to_csv) and then read back (read_csv) a column with a CategoricalDtype whose categories are the members of a Python Enum (or IntEnum).
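One possible workaround, sketched under the assumption that the CSV holds the default str() of each member (as in the toy example): read the column as text through a read_csv converter that maps it back to the Enum member, then re-apply dtype_role with astype. parse_role is a hypothetical helper written for this sketch, not a pandas API.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()


dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)


def parse_role(text):
    # Hypothetical helper: turns "ConnectionRoles.Server" back into the member.
    # Empty cells are returned as None so they end up as NaN after astype.
    if text == "":
        return None
    return ConnectionRoles[text.split(".")[-1]]


df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server, ConnectionRoles.Client]}, dtype=dtype_role)
buf = io.StringIO()
df.to_csv(buf)

# Convert the raw text to Enum members first, then restore the categorical dtype.
loaded = pd.read_csv(io.StringIO(buf.getvalue()), index_col=0, converters={"tcpdest": parse_role})
loaded["tcpdest"] = loaded["tcpdest"].astype(dtype_role)
print(loaded)
print(loaded["tcpdest"].dtype == dtype_role)  # True
```

An alternative design would be to keep the member names as string categories, so read_csv can match them directly with dtype=, and to convert to Enum members only at the application boundary.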
In my project the dtype is built the same way as in the toy example, but from a richer enum:
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    """
    Used to filter datasets and keep packets flowing in only one direction!
    Parser should accept --destination Client --destination Server if you want both.
    """
    Client = auto()
    Server = auto()

    def __str__(self):
        # Note that defining __str__ is required to get ArgumentParser's help output to include
        # the human-readable names of ConnectionRoles
        return self.name

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            raise ValueError()

    def __next__(self):
        # auto() numbers members from 1, so "self.value == 0" was never true;
        # compare members instead to return the opposite role
        if self is ConnectionRoles.Client:
            return ConnectionRoles.Server
        else:
            return ConnectionRoles.Client


# The class must be defined before it is used to build the dtype
dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
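With the __str__ override above, to_csv should write only the member name (e.g. Server), because the csv writer stringifies non-string objects with str(). from_string then already inverts the serialization and could be passed to read_csv as a converter. A minimal sketch, trimmed to the relevant methods; note that from_string raises ValueError on an empty string, so a column with missing values may need extra handling.

```python
import io
from enum import Enum, auto

import pandas as pd


class ConnectionRoles(Enum):
    Client = auto()
    Server = auto()

    def __str__(self):
        # str(), and therefore to_csv, emits just the member name, e.g. "Server"
        return self.name

    @staticmethod
    def from_string(s):
        try:
            return ConnectionRoles[s]
        except KeyError:
            raise ValueError()


dtype_role = pd.api.types.CategoricalDtype(categories=list(ConnectionRoles), ordered=True)
df = pd.DataFrame({"tcpdest": [ConnectionRoles.Server]}, dtype=dtype_role)

buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)  # the cell now holds "Server"

# from_string inverts __str__, so it can act as the converter;
# the categorical dtype is re-applied afterwards.
loaded = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    converters={"tcpdest": ConnectionRoles.from_string},
)
loaded["tcpdest"] = loaded["tcpdest"].astype(dtype_role)
print(loaded)
```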
I've searched the tracker, and the most relevant (though different) issues might be:
- Force boolean column to category while reading a csv #20498
- my earlier issue: object of type 'CategoricalDtype' has no len() #22262
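Expected Output
After the round trip, the tcpdest column of loaded should still contain ConnectionRoles.Server instead of NaN.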
Output of pd.show_versions()
I am using v0.23.4 with a patch from master applied to fix a bug.
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.0
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
pandas: 0+unknown
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: None
numpy: 1.16.0
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.2.6
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None