How should I store frames with multiindex columns in CSV? #21976
Comments
I tried to experiment more with JSON, but found the only way to produce a valid JSON output is to specify
So far I don't know any way to save a dataframe with multiindexed columns to either CSV or JSON and load it back properly...
I suspect you'll have trouble with this in most storage formats, since hierarchical columns are somewhat unique to pandas. You may be best off manually flattening your columns before and after IO.
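For reference, a minimal sketch of that flattening approach (the "|" separator and all variable names here are just illustrative, not anything pandas provides):
import pandas as pd
from io import StringIO

# A small frame with two-level columns, mirroring the example in this thread.
cols = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["OPEN", "CLOSE"]])
frame = pd.DataFrame([[i] * 4 for i in range(1, 5)], columns=cols)

# Flatten the column tuples into plain strings before writing...
flat = frame.copy()
flat.columns = ["|".join(col) for col in frame.columns]
buf = StringIO()
flat.to_csv(buf)
buf.seek(0)

# ...and rebuild the MultiIndex after reading back.
loaded = pd.read_csv(buf, index_col=0)
loaded.columns = pd.MultiIndex.from_tuples([tuple(c.split("|")) for c in loaded.columns])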
The best bet here may be orient="table" though I'm not sure what the JSON Table Schema specification says about hierarchical columns (indices are fine). Any investigation or PRs there are certainly welcome.
Here's a workaround to maintain those via transposition. You need a non-numeric index to account for the fact that numeric column labels are not valid in the JSON Table schema:
>>> frame.index = list('abcd')
>>> pd.read_json(frame.T.to_json(orient="table"), orient="table").T
  AAPL       MSFT
  OPEN CLOSE OPEN CLOSE
a    1     1    1     1
b    2     2    2     2
c    3     3    3     3
d    4     4    4     4
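If that trick holds up, it could be wrapped in a pair of small helpers; a sketch with made-up names, assuming the same example frame:
import pandas as pd

def to_json_mi(df):
    # Transpose so the column MultiIndex becomes the (schema-supported) index,
    # then serialize with the table orient.
    return df.T.to_json(orient="table")

def from_json_mi(text):
    # Read the table-oriented JSON back and transpose to restore the columns.
    return pd.read_json(text, orient="table").T

# Usage, as above: the frame still needs non-numeric index labels first.
frame.index = list("abcd")
restored = from_json_mi(to_json_mi(frame))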
IIRC JSON Table Schema requires string column labels, so hierarchical columns are probably out of scope for orient='table'.
from io import StringIO
buf = StringIO()
frame.to_csv(buf)
buf.seek(0)
In [109]: pd.read_csv(buf, header=[0,1], index_col=0)
Out[109]:
   AAPL       MSFT
  CLOSE OPEN CLOSE OPEN
0     1    1     1    1
1     2    2     2    2
2     3    3     3    3
3     4    4     4    4
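To double-check the round trip, something like this should raise if anything was lost (just a sketch: frame is the example frame from this thread, and sorting the columns only guards against a different column order after reading back):
import pandas as pd
from io import StringIO

buf = StringIO()
frame.to_csv(buf)
buf.seek(0)
loaded = pd.read_csv(buf, header=[0, 1], index_col=0)

# Compares values, index and the column MultiIndex.
pd.testing.assert_frame_equal(frame.sort_index(axis=1), loaded.sort_index(axis=1))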
Could we use some trickery like using a \xa0 to denote what “columns
Thank you, the CSV version works fine, while the JSON version still requires the frame index to be a list of strings and crashes if timestamps or integers are used as the index. Additionally, it looks like even frames with multi-level row indexes are stored in JSON a bit incorrectly if
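One possible way around the string-index requirement is to stringify the index before the transposed table round trip and parse it back afterwards; a sketch with an assumed DatetimeIndex (nothing in orient="table" does this conversion for you):
import pandas as pd

cols = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["OPEN", "CLOSE"]])
frame = pd.DataFrame([[i] * 4 for i in range(1, 5)],
                     index=pd.date_range("2018-01-01", periods=4),
                     columns=cols)

# Stringify the index so the table schema accepts the labels...
stringified = frame.copy()
stringified.index = frame.index.map(str)
restored = pd.read_json(stringified.T.to_json(orient="table"), orient="table").T

# ...then parse the labels back into timestamps after loading.
restored.index = pd.to_datetime(restored.index)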
Can you open a separate bug for the JSON issue? As a side note on
Hello.
I tried to save a dataframe that uses a MultiIndex for its columns to a CSV file and load it back, but I had no luck.
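A minimal sketch of the kind of round trip I mean (the values are just placeholders):
import pandas as pd
from io import StringIO

cols = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["OPEN", "CLOSE"]])
frame = pd.DataFrame([[i] * 4 for i in range(1, 5)], columns=cols)

buf = StringIO()
frame.to_csv(buf)
buf.seek(0)

# Two naive attempts to read it back: in both cases the columns come back flat
# (e.g. 'AAPL', 'AAPL.1') and the second header row is read as a data row.
loaded_plain = pd.read_csv(buf)
buf.seek(0)
loaded_indexed = pd.read_csv(buf, index_col=0)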
As you can see, neither of the loaded frames has the multiindexed columns of the original one. So, how should I save a DataFrame with multiindexed columns to a CSV file and load it back to get a frame identical to the original?
I also tried to save it as JSON, but encountered problems there too. Here is what the frame shown above is converted to.
So, the tupleized multiindexed column names are obviously incorrectly quoted.
With best regards,
Alex.
INSTALLED VERSIONS
commit: None
python: 3.4.2.final.0
python-bits: 32
OS: Linux
OS-release: 3.16.0-6-686-pae
machine: i686
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.0.dev0+318.g272bbdc
pytest: 3.6.3
pip: 1.5.6
setuptools: 5.5.1
Cython: 0.28.4
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None