Group by using time grouper and 1 other column does not work #14929

Closed
namank opened this issue Dec 20, 2016 · 1 comment

namank commented Dec 20, 2016

Code Sample, a copy-pastable example if possible

import pandas

list_of_dicts = [{
    'time': pandas.to_datetime('2016-12-03 18:00:00'),
    'building': 'tall',
    'type': 'steel'
}, {
    'time': pandas.to_datetime('2016-12-03 18:00:00'),
    'building': 'tall',
    'type': 'brick'
}]

df = pandas.DataFrame(list_of_dicts)

  building                time   type
0     tall 2016-12-03 18:00:00  steel
1     tall 2016-12-03 18:00:00  brick
    
    
# Iterate the groupby without rebinding df, so the later snippets can still use the DataFrame
list(df.groupby(['building', pandas.Grouper(key = 'time', freq = '1D')]))

[(<pandas.tseries.resample.TimeGrouper at 0x2b1a5978aed0>,
    building                time   type
1     tall 2016-12-03 18:00:00  brick),
 ('building',
    building                time   type
0     tall 2016-12-03 18:00:00  steel)]

Problem description

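Apparently this was fixed in #3794, but I am still seeing the issue: the two list elements themselves show up as the group keys instead of the values of 'building' and the daily time bin. Strangely, the problem goes away if I add one more grouping column.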

list(df.groupby(['building', pandas.Grouper(key = 'time', freq = '1D'), 'type']))

[(('tall', Timestamp('2016-12-03 00:00:00', offset='D'), 'brick'),
    building                time   type
1     tall 2016-12-03 18:00:00  brick),
 (('tall', Timestamp('2016-12-03 00:00:00', offset='D'), 'steel'),
    building                time   type
0     tall 2016-12-03 18:00:00  steel)]

Expected Output

If I pass the column explicitly as a Series instead of its name, it works correctly.

list(df.groupby([df['building'], pandas.Grouper(key = 'time', freq = '1D')]))

[(('tall', Timestamp('2016-12-03 00:00:00', offset='D')),
    building                time   type
0     tall 2016-12-03 18:00:00  steel
1     tall 2016-12-03 18:00:00  brick)]

This should also be the output when passing just 'building', or even pandas.Grouper(key='building').
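
A minimal sketch of the expectation described above, assuming a recent pandas release (the thread does not say which version fixed the behaviour). The helper `group_keys` is a hypothetical name introduced only for this illustration. It groups on three spellings of the first key (column name, `pandas.Grouper(key='building')`, and the Series itself) and collects the resulting group keys so they can be compared.

```python
import pandas as pd

# Same two rows as in the report: identical building and timestamp, different type.
df = pd.DataFrame({
    "time": pd.to_datetime(["2016-12-03 18:00:00", "2016-12-03 18:00:00"]),
    "building": ["tall", "tall"],
    "type": ["steel", "brick"],
})


def group_keys(first_key):
    """Hypothetical helper: group on first_key plus a daily time bin, return the group keys."""
    grouped = df.groupby([first_key, pd.Grouper(key="time", freq="1D")])
    return [key for key, _ in grouped]


keys_by_name = group_keys("building")                      # column name
keys_by_grouper = group_keys(pd.Grouper(key="building"))   # Grouper on the column
keys_by_series = group_keys(df["building"])                # explicit Series

print(keys_by_name)
print(keys_by_grouper)
print(keys_by_series)
```

If the three printed lists differ, or the keys are the grouping objects themselves rather than a (building, day) pair, the installed version likely still shows the behaviour reported here.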

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-164.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: None
Cython: None
numpy: 1.9.0.dev-Unknown
scipy: 0.15.1
statsmodels: None
IPython: 0.13.2
sphinx: 1.3.1
patsy: None
dateutil: 2.4.2
pytz: 2013d
bottleneck: 0.8.0
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

jreback commented Dec 20, 2016

You are using a pretty old version. IIRC there were some additional fixes. Try upgrading.
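
A quick way to act on this advice is to check which pandas is installed before re-running the example. The sketch below only compares the major and minor version against the 0.15 series from the report. The name `reported` and the parsing approach are assumptions made for illustration, and the thread does not state which release contains the fix.

```python
import pandas as pd

# Major/minor version from the report above (pandas 0.15.2).
reported = (0, 15)

# Parse only the first two components, e.g. "2.2.3" -> (2, 2).
# Assumes both components are plain integers, which holds for normal releases.
installed = tuple(int(part) for part in pd.__version__.split(".")[:2])

print("installed pandas:", pd.__version__)
if installed <= reported:
    print("Same series as in the report or older; upgrade before retesting.")
else:
    print("Newer than the reported version; re-run the groupby example to compare.")
```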

jreback closed this as completed on Dec 20, 2016
jreback added this to the No action milestone on Dec 20, 2016