Description
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
iterables = [['bar', 'baz'], ['one', 'two'], ['dog', 'cat']]
df = pd.DataFrame(np.random.randn(2, 8), columns=pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third']))
df.sum(axis=1, level=['first', 'third'], skipna=False)
>>> df
first        bar                                     baz
second       one                 two                 one                 two
third        dog       cat       dog       cat       dog       cat       dog       cat
0      -1.093149  1.087846 -0.067161  1.129021 -1.024021  1.519433 -0.349039  1.215196
1       0.622675 -1.479139 -0.057143 -0.377272 -1.061766  1.606416 -0.172099 -0.564745
Problem description
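DataFrame.sum raises a ValueError when the columns are a MultiIndex, several levels are passed to level, and skipna=False. The same call works on the transposed DataFrame (with axis=0), with a single level, or with skipna=True. The traceback shows the per-group call being forwarded to a Series with axis=1, an axis a Series does not have.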
>>> df.sum(axis=1, level=['first', 'third'], skipna=False)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 368, in _get_axis_number
    return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 11419, in stat_func
    return self._agg_by_level(
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 10258, in _agg_by_level
    return grouped.aggregate(applyf)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/generic.py", line 959, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/groupby.py", line 1083, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/ops.py", line 641, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/ops.py", line 701, in _aggregate_series_pure_python
    res = func(group, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/groupby.py", line 1060, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 10257, in <lambda>
    applyf = lambda x: method(x, axis=axis, skipna=skipna, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 11422, in stat_func
    return self._reduce(
  File "/usr/lib/python3/dist-packages/pandas/core/series.py", line 4223, in _reduce
    self._get_axis_number(axis)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 370, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series
Expected Output
I would expect a DataFrame with the columns summed within each ('first', 'third') group. Note that this works fine when the MultiIndex is on the index (i.e. with the transposed DataFrame and axis=0):
>>> df.transpose().sum(level=['first', 'third'], skipna=False)
                    0         1
first third
bar   dog   -1.160310  0.565531
      cat    2.216866 -1.856411
baz   dog   -1.373060 -1.233865
      cat    2.734629  1.041672
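In the meantime, the transposed route can be wrapped into a small helper. The following is only a sketch of one possible workaround, not the intended fix: sum_columns_by_level is a hypothetical name, the data is a deterministic stand-in for the random frame above (with one NaN added so the effect of skipna=False is visible), and it goes through groupby on the transposed frame rather than the level argument of sum.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz'], ['one', 'two'], ['dog', 'cat']]
columns = pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third'])

# Deterministic stand-in data (assumption: not the random values from the report),
# with one NaN so that skipna=False has a visible effect.
values = np.arange(16, dtype=float).reshape(2, 8)
values[0, 0] = np.nan
df = pd.DataFrame(values, columns=columns)


def sum_columns_by_level(frame, levels, skipna=True):
    """Hypothetical helper: sum columns grouped by the given column levels.

    Transposes so the MultiIndex sits on the index, groups the rows by
    `levels`, sums each group, then transposes back.
    """
    grouped = frame.T.groupby(level=levels, sort=False)
    # Each `s` is one original row restricted to one group of columns.
    return grouped.agg(lambda s: s.sum(skipna=skipna)).T


result = sum_columns_by_level(df, ['first', 'third'], skipna=False)
print(result)
```

With skipna=False, the ('bar', 'dog') entry of row 0 stays NaN because one of its inputs is NaN. sort=False keeps the groups in their original column order.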
Output of pd.show_versions()
INSTALLED VERSIONS
commit : db08276
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.9.0-1-amd64
Version : #1 SMP Debian 5.9.1-1 (2020-10-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8
pandas : 1.1.3
numpy : 1.19.3
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 50.3.0
Cython : None
pytest : 4.6.11
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2