BUG: groupby.apply respects as_index=False if and only if group_keys=True #57656

Open
@mvashishtha

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame(
    {'A': [7, -1, 4, 5], 'B': [10, 4, 2, 8]},
    index=pd.Index(['i3', 'i2', 'i1', 'i0'], name='i0'),
)

################################
# For transforms, like lambda x: x
################################

# When group_keys=True, apply() respects as_index=False. The same is true when grouping by 'i0' or by ['i0', 'A'].
print(df.groupby('A', as_index=True, group_keys=True).apply(lambda x: x, include_groups=False))
print(df.groupby('A', as_index=False, group_keys=True).apply(lambda x: x, include_groups=False))

# When group_keys=False, apply() does not respect as_index=False. The same is true when grouping by 'i0' or by ['i0', 'A'].
print(df.groupby('A', as_index=True, group_keys=False).apply(lambda x: x, include_groups=False))
print(df.groupby('A', as_index=False, group_keys=False).apply(lambda x: x, include_groups=False))

################################
# For non-transforms, like lambda x: pd.DataFrame([x.iloc[0].sum()])
################################

# When group_keys=True, grouping by a data column respects as_index=False. The same is true when grouping by 'i0' or by ['i0', 'A'].
print(df.groupby('A', as_index=True, group_keys=True).apply(lambda x: pd.DataFrame([x.iloc[0].sum()]), include_groups=False))
print(df.groupby('A', as_index=False, group_keys=True).apply(lambda x: pd.DataFrame([x.iloc[0].sum()]), include_groups=False))

# When group_keys=False, grouping by a data column does not respect as_index=False. The same is true when grouping by 'i0' or by ['i0', 'A'].
print(df.groupby('A', as_index=True, group_keys=False).apply(lambda x: pd.DataFrame([x.iloc[0].sum()]), include_groups=False))
print(df.groupby('A', as_index=False, group_keys=False).apply(lambda x: pd.DataFrame([x.iloc[0].sum()]), include_groups=False))

Issue Description

groupby.apply respects as_index=False if and only if group_keys=True, but the documentation suggests that it should only respect as_index if group_keys=False.

My apologies in advance if I'm duplicating an issue or misunderstanding the intended behavior here. I know there has been some relevant discussion in #49543.

Expected Behavior

I don't know what the correct behavior is here. A simple and easily explainable behavior would be to always respect as_index=False. However, to be consistent with the documentation here, transform-like applies should never respect as_index=False, and I suppose that non-transform-like applies should respect it:

Since transformations do not include the groupings that are used to split the result, the arguments as_index and sort in DataFrame.groupby() and Series.groupby() have no effect.

When group_keys=True, the result does include the "groupings that are used to split the result", so for the same reason that this note gives, as_index should have no effect. The current behavior is the opposite, though: as_index has an effect only when group_keys=True. (Despite the description of group_keys, it appears that apply includes the group keys in the index if and only if group_keys=True, regardless of whether func is a transform.)

Installed Versions

INSTALLED VERSIONS
------------------
commit                : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python                : 3.9.18.final.0
python-bits           : 64
OS                    : Darwin
OS-release            : 23.3.0
Version               : Darwin Kernel Version 23.3.0: Wed Dec 20 21:31:00 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6020
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.1
numpy                 : 1.26.3
pytz                  : 2023.3.post1
dateutil              : 2.8.2
setuptools            : 68.2.2
pip                   : 23.3.1
Cython                : None
pytest                : None
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : 8.18.1
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2023.4
qtpy                  : None
pyqt5                 : None

Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)
