Dataframe.sum() returns an error with MultiIndex columns and skipna=False #37622

Closed
@bertrandmarc

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np

# Three column levels: first x second x third = 8 columns
iterables = [['bar', 'baz'], ['one', 'two'], ['dog', 'cat']]
columns = pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third'])

df = pd.DataFrame(np.random.randn(2, 8), columns=columns)

# Raises ValueError (traceback below)
df.sum(axis=1, level=['first', 'third'], skipna=False)

>>> df
first        bar                                     baz                              
second       one                 two                 one                 two          
third        dog       cat       dog       cat       dog       cat       dog       cat
0      -1.093149  1.087846 -0.067161  1.129021 -1.024021  1.519433 -0.349039  1.215196
1       0.622675 -1.479139 -0.057143 -0.377272 -1.061766  1.606416 -0.172099 -0.564745

Problem description

DataFrame.sum raises an error when the columns are a MultiIndex, several levels are passed to level, and skipna=False. However, it works with the transposed DataFrame (and axis=0), with a single level, or with skipna=True.

>>> df.sum(axis=1, level=['first', 'third'], skipna=False)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 368, in _get_axis_number
    return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 11419, in stat_func
    return self._agg_by_level(
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 10258, in _agg_by_level
    return grouped.aggregate(applyf)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/generic.py", line 959, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/groupby.py", line 1083, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/ops.py", line 641, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/ops.py", line 701, in _aggregate_series_pure_python
    res = func(group, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/groupby/groupby.py", line 1060, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 10257, in <lambda>
    applyf = lambda x: method(x, axis=axis, skipna=skipna, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 11422, in stat_func
    return self._reduce(
  File "/usr/lib/python3/dist-packages/pandas/core/series.py", line 4223, in _reduce
    self._get_axis_number(axis)
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 370, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named 1 for object type Series
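
Reading the second traceback, the per-group function appears to forward axis=1 to a Series reduction, and a Series only has axis 0. The following is a minimal sketch that reproduces just that final error in isolation, without level=. The Series values are arbitrary and only for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])  # arbitrary illustrative values

# A Series has a single axis (0 / "index"), so axis=1 is rejected.
try:
    s.sum(axis=1, skipna=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the failure comes from the axis being passed through to each group rather than from skipna itself.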

Expected Output

I would expect a DataFrame with the sum of the columns. Note that this works fine with a MultiIndex on the index (i.e. with the transposed DataFrame and axis=0):

>>> df.transpose().sum(level=['first', 'third'], skipna=False)
                    0         1
first third                    
bar   dog   -1.160310  0.565531
      cat    2.216866 -1.856411
baz   dog   -1.373060 -1.233865
      cat    2.734629  1.041672

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.9.0-1-amd64
Version : #1 SMP Debian 5.9.1-1 (2020-10-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.1.3
numpy : 1.19.3
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 50.3.0
Cython : None
pytest : 4.6.11
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2
