Skip to content

Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work #29928

Open
@larrymouse

Description

@larrymouse

Code Sample, a copy-pastable example if possible

import pandas

#Scenario1
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)]))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table') # this runs but creates NaN indexes

#Scenario2
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)], names=['ind1']))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
try:
    new_df==pandas.read_json(js, orient='table')  # this fails with KeyError: 'None of [None] are in the columns'
except KeyError as e:
    pass

#Scenario3
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,1),(1,2),(2,1),(2,2)], names=['ind1', 'ind2']))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table')  # this works as is is multi level multiindex

#Scenario4
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)]))
df.index=df.index.to_flat_index() # a workaround?
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table') # This is a workaround that does produce sensible results

Problem description

I have a DataFrame that has a MultiIndex, even though that index only has one level and could easily be a normal index, I am receiving this table from ipyaggrid, so am not in control of that.

Saving this table to json with df.to_json(orient='table) and then loading from pandas.read_json(js, orient='table) leads to either an exception or the index being changed to NaN values.

Neither of these outcomes seems desirable or correct.

I have a workaround to intercept DataFrames with this feature and convert their index so a flat Index, but it would be good if the table serialisation just worked.

I was drawn to orient='Table' by the comments on this similar issue, my main requirement here is to save/load the DataFrame preserving its row and column order. You don't get this with the standard to_json/from_json modes
#4889

Expected Output

new_df for scenario1 and scenario2 to should have a MultiIndex, in the first scenario with a level that has no name and in the second scenario with level called 'ind1'

Output of pd.show_versions()

pandas.show_versions()
INSTALLED VERSIONS

commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 10.0.1
setuptools : 39.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions