
Commit 89502fc

Merge pull request #6913 from sinhrks/pivotg
ENH: pivot_table can now accept Grouper
2 parents 0d2966f + 5fa6a39 commit 89502fc


5 files changed: 169 additions, 13 deletions


doc/source/release.rst

Lines changed: 1 addition & 0 deletions
@@ -286,6 +286,7 @@ Improvements to existing features
   :func:`read_csv`/:func:`read_table` if no other C-unsupported options
   specified (:issue:`6607`)
 - ``read_excel`` can now read milliseconds in Excel dates and times with xlrd >= 0.9.3. (:issue:`5945`)
+- ``pivot_table`` can now accept ``Grouper`` by ``index`` and ``columns`` keywords (:issue:`6913`)
 
 .. _release.bug_fixes-0.14.0:
 
doc/source/reshaping.rst

Lines changed: 14 additions & 3 deletions
@@ -264,19 +264,24 @@ It takes a number of arguments
 
 - ``data``: A DataFrame object
 - ``values``: a column or a list of columns to aggregate
-- ``rows``: list of columns to group by on the table rows
-- ``cols``: list of columns to group by on the table columns
+- ``index``: a column, Grouper, array which has the same length as data, or list of them.
+  Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.
+- ``columns``: a column, Grouper, array which has the same length as data, or list of them.
+  Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.
 - ``aggfunc``: function to use for aggregation, defaulting to ``numpy.mean``
 
 Consider a data set like this:
 
 .. ipython:: python
 
+   import datetime
    df = DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                    'B' : ['A', 'B', 'C'] * 8,
                    'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                    'D' : np.random.randn(24),
-                   'E' : np.random.randn(24)})
+                   'E' : np.random.randn(24),
+                   'F' : [datetime.datetime(2013, i, 1) for i in range(1, 13)] +
+                         [datetime.datetime(2013, i, 15) for i in range(1, 13)]})
    df
 
 We can produce pivot tables from this data very easily:
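The new ``index``/``columns`` entries say that an array with the same length as the data is used in the same way as the values of a column. Below is a minimal sketch of that claim. It assumes a current pandas install, uses the top-level ``pd.pivot_table``, and passes ``aggfunc="sum"`` as a string rather than ``np.sum``. The frame and the ``labels`` array are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "two"],
    "C": ["foo", "bar", "foo", "bar"],
    "D": [1.0, 2.0, 3.0, 4.0],
})

# Row keys given as a column name.
by_name = pd.pivot_table(df, values="D", index="A", columns="C", aggfunc="sum")

# The same grouping given as an array of len(df) labels that is NOT a column of df:
# element i labels row i, exactly as the value of a column in that row would.
labels = np.array(["ONE", "ONE", "TWO", "TWO"])
by_array = pd.pivot_table(df, values="D", index=labels, columns="C", aggfunc="sum")

print(by_name)
print(by_array)
```

Both tables hold the same numbers. Only the row labels differ, because the array's own values become the index.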
@@ -296,6 +301,12 @@ hierarchy in the columns:
 
    pivot_table(df, index=['A', 'B'], columns=['C'])
 
+Also, you can use ``Grouper`` for ``index`` and ``columns`` keywords. For detail of ``Grouper``, see :ref:`Grouping with a Grouper specification <groupby.specify>`.
+
+.. ipython:: python
+
+   pivot_table(df, values='D', index=Grouper(freq='M', key='F'), columns='C')
+
 You can render a nice output of the table omitting the missing values by
 calling ``to_string`` if you wish:
 
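The ``Grouper(freq='M', key='F')`` call bins the datetime column ``F`` by month before pivoting. The sketch below rebuilds that frame with a deterministic ``D`` column so that the result can be checked. It makes these assumptions:

* It targets current pandas and drops the unused ``E`` column.
* It swaps ``freq='M'`` for ``'MS'``. ``'M'`` labels each bin with the last day of the month and is spelled ``'ME'`` in recent pandas. ``'MS'`` labels each bin with the first day.

```python
import datetime

import numpy as np
import pandas as pd

# Same layout as the documentation frame, but "D" is 0.0 .. 23.0 instead of random.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
    "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 4,
    "D": np.arange(24, dtype=float),
    "F": [datetime.datetime(2013, i, 1) for i in range(1, 13)]
         + [datetime.datetime(2013, i, 15) for i in range(1, 13)],
})

# Bin "F" into calendar months (labelled by month start); the default aggfunc is the mean.
table = pd.pivot_table(df, values="D", index=pd.Grouper(freq="MS", key="F"), columns="C")
print(table)
```

``C`` repeats every six rows, and the rows for the 1st and the 15th of a month are twelve rows apart. Both rows of each month therefore share one ``C`` value, so every month has exactly one non-missing cell.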
doc/source/v0.14.0.txt

Lines changed: 20 additions & 0 deletions
@@ -484,6 +484,26 @@ Enhancements
 - ``CustomBuisnessMonthBegin`` and ``CustomBusinessMonthEnd`` are now available (:issue:`6866`)
 - :meth:`Series.quantile` and :meth:`DataFrame.quantile` now accept an array of
   quantiles.
+- ``pivot_table`` can now accept ``Grouper`` by ``index`` and ``columns`` keywords (:issue:`6913`)
+
+  .. ipython:: python
+
+     import datetime
+     df = DataFrame({
+         'Branch' : 'A A A A A B'.split(),
+         'Buyer': 'Carl Mark Carl Carl Joe Joe'.split(),
+         'Quantity': [1, 3, 5, 1, 8, 1],
+         'Date' : [datetime.datetime(2013,11,1,13,0), datetime.datetime(2013,9,1,13,5),
+                   datetime.datetime(2013,10,1,20,0), datetime.datetime(2013,10,2,10,0),
+                   datetime.datetime(2013,11,1,20,0), datetime.datetime(2013,10,2,10,0)],
+         'PayDay' : [datetime.datetime(2013,10,4,0,0), datetime.datetime(2013,10,15,13,5),
+                     datetime.datetime(2013,9,5,20,0), datetime.datetime(2013,11,2,10,0),
+                     datetime.datetime(2013,10,7,20,0), datetime.datetime(2013,9,5,10,0)]})
+     df
+
+     pivot_table(df, index=Grouper(freq='M', key='Date'),
+                 columns=Grouper(freq='M', key='PayDay'),
+                 values='Quantity', aggfunc=np.sum)
 
 Performance
 ~~~~~~~~~~~
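The release-note example groups both axes by month: rows by order ``Date`` and columns by ``PayDay``. Below is a reproducible sketch of the same pivot. It makes these assumptions:

* It targets current pandas, with ``pd.Grouper`` and ``pd.pivot_table`` taken from the top-level namespace.
* It uses ``freq="MS"`` instead of ``'M'``, so the labels are month starts rather than month ends.
* It uses ``aggfunc="sum"`` instead of ``np.sum``.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "Branch": "A A A A A B".split(),
    "Buyer": "Carl Mark Carl Carl Joe Joe".split(),
    "Quantity": [1, 3, 5, 1, 8, 1],
    "Date": [datetime.datetime(2013, 11, 1, 13, 0), datetime.datetime(2013, 9, 1, 13, 5),
             datetime.datetime(2013, 10, 1, 20, 0), datetime.datetime(2013, 10, 2, 10, 0),
             datetime.datetime(2013, 11, 1, 20, 0), datetime.datetime(2013, 10, 2, 10, 0)],
    "PayDay": [datetime.datetime(2013, 10, 4, 0, 0), datetime.datetime(2013, 10, 15, 13, 5),
               datetime.datetime(2013, 9, 5, 20, 0), datetime.datetime(2013, 11, 2, 10, 0),
               datetime.datetime(2013, 10, 7, 20, 0), datetime.datetime(2013, 9, 5, 10, 0)],
})

# Rows: month of "Date"; columns: month of "PayDay"; cells: total Quantity.
table = pd.pivot_table(df,
                       index=pd.Grouper(freq="MS", key="Date"),
                       columns=pd.Grouper(freq="MS", key="PayDay"),
                       values="Quantity", aggfunc="sum")
print(table)
```

Two orders are dated November and paid in October, with quantities 1 and 8. They collapse into one cell worth 9. Month pairs with no orders show NaN.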

pandas/tools/pivot.py

Lines changed: 10 additions & 5 deletions
@@ -4,6 +4,7 @@
 
 from pandas import Series, DataFrame
 from pandas.core.index import MultiIndex
+from pandas.core.groupby import Grouper
 from pandas.tools.merge import concat
 from pandas.tools.util import cartesian_product
 from pandas.compat import range, lrange, zip
@@ -25,10 +26,12 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
     ----------
     data : DataFrame
     values : column to aggregate, optional
-    index : list of column names or arrays to group on
-        Keys to group on the x-axis of the pivot table
-    columns : list of column names or arrays to group on
-        Keys to group on the y-axis of the pivot table
+    index : a column, Grouper, array which has the same length as data, or list of them.
+        Keys to group by on the pivot table index.
+        If an array is passed, it is being used as the same manner as column values.
+    columns : a column, Grouper, array which has the same length as data, or list of them.
+        Keys to group by on the pivot table column.
+        If an array is passed, it is being used as the same manner as column values.
     aggfunc : function, default numpy.mean, or list of functions
         If list of functions passed, the resulting pivot table will have
         hierarchical columns whose top level are the function names (inferred
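The updated docstring allows ``index`` to be a list of keys, and those keys may themselves be Groupers. The sketch below passes two Groupers as ``index``, so the rows become a two-level MultiIndex of (order month, pay month) with one column per ``Branch``. It makes these assumptions:

* It targets current pandas and uses ``pd.Grouper`` and ``pd.pivot_table``.
* It uses ``freq="MS"``, which gives month-start labels.
* It uses ``aggfunc="sum"``.
* The data is the frame from the v0.14.0 release note.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "Branch": "A A A A A B".split(),
    "Quantity": [1, 3, 5, 1, 8, 1],
    "Date": [datetime.datetime(2013, 11, 1, 13, 0), datetime.datetime(2013, 9, 1, 13, 5),
             datetime.datetime(2013, 10, 1, 20, 0), datetime.datetime(2013, 10, 2, 10, 0),
             datetime.datetime(2013, 11, 1, 20, 0), datetime.datetime(2013, 10, 2, 10, 0)],
    "PayDay": [datetime.datetime(2013, 10, 4, 0, 0), datetime.datetime(2013, 10, 15, 13, 5),
               datetime.datetime(2013, 9, 5, 20, 0), datetime.datetime(2013, 11, 2, 10, 0),
               datetime.datetime(2013, 10, 7, 20, 0), datetime.datetime(2013, 9, 5, 10, 0)],
})

# A list of keys for the rows: each Grouper contributes one index level.
table = pd.pivot_table(
    df,
    index=[pd.Grouper(freq="MS", key="Date"), pd.Grouper(freq="MS", key="PayDay")],
    columns="Branch",
    values="Quantity",
    aggfunc="sum",
)
print(table)
```

Only (Date month, PayDay month) pairs that occur in the data appear as rows. This matches the observed-only MultiIndex the new test expects for the same kind of call.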
@@ -98,6 +101,8 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
     if values_passed:
         to_filter = []
         for x in keys + values:
+            if isinstance(x, Grouper):
+                x = x.key
             try:
                 if x in data:
                     to_filter.append(x)
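The two added lines matter because a ``Grouper`` object is never itself a column name. Without the unwrap, ``x in data`` is False for it, so its key column would presumably be filtered out of ``data`` before grouping. The pure-Python sketch below mirrors the loop under these assumptions:

* ``FakeGrouper`` and ``columns_to_keep`` are hypothetical names.
* A set of column names stands in for the DataFrame.
* The ``except TypeError`` branch is assumed, because the end of the ``try`` block lies outside this hunk.

```python
class FakeGrouper:
    """Hypothetical stand-in for pandas' Grouper; only ``key`` is modelled."""

    def __init__(self, key=None, freq=None):
        self.key = key
        self.freq = freq


def columns_to_keep(keys, values, columns):
    """Mirror of the ``to_filter`` loop: names of columns referenced by keys/values."""
    to_filter = []
    for x in keys + values:
        if isinstance(x, FakeGrouper):
            # A Grouper refers to its column through ``key`` (None for level-based groupers).
            x = x.key
        try:
            if x in columns:
                to_filter.append(x)
        except TypeError:
            # Unhashable keys, e.g. a list of labels, cannot be column names.
            pass
    return to_filter


cols = {"Branch", "Buyer", "Quantity", "Date"}
print(columns_to_keep([FakeGrouper(key="Date", freq="M"), "Buyer"], ["Quantity"], cols))
# ['Date', 'Buyer', 'Quantity']

# Without the unwrap step, the Grouper object itself never matches a column name:
print(FakeGrouper(key="Date") in cols)  # False
```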
@@ -297,7 +302,7 @@ def _all_key():
 def _convert_by(by):
     if by is None:
         by = []
-    elif (np.isscalar(by) or isinstance(by, (np.ndarray, Series))
+    elif (np.isscalar(by) or isinstance(by, (np.ndarray, Series, Grouper))
           or hasattr(by, '__call__')):
         by = [by]
     else:
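``_convert_by`` normalises whatever was passed as ``index`` or ``columns`` into a list of keys. Adding ``Grouper`` to the ``isinstance`` tuple means a single Grouper is wrapped as one key rather than falling through to the ``else`` branch, where it would presumably be treated as a sequence of keys. The sketch below reproduces the updated logic under these assumptions:

* ``convert_by`` is a hypothetical name.
* It uses the public ``pd.Series`` and ``pd.Grouper`` types.
* ``callable()`` stands in for ``hasattr(by, '__call__')``.
* ``return list(by)`` is assumed for the ``else`` branch, which lies outside the hunk.

```python
import numpy as np
import pandas as pd


def convert_by(by):
    """Sketch of ``_convert_by`` after this commit: normalise a key spec to a list."""
    if by is None:
        return []
    if (np.isscalar(by)
            or isinstance(by, (np.ndarray, pd.Series, pd.Grouper))
            or callable(by)):
        # A single key (column name, array, Series, Grouper or function) is wrapped.
        return [by]
    # Anything else is assumed to already be a sequence of keys.
    return list(by)


g = pd.Grouper(key="Date", freq="MS")
print(convert_by("Buyer"))     # ['Buyer']
print(convert_by(g)[0] is g)   # True
print(convert_by(["A", "B"]))  # ['A', 'B']
```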

pandas/tools/tests/test_pivot.py

Lines changed: 124 additions & 5 deletions
@@ -1,12 +1,10 @@
 import datetime
-import unittest
-import warnings
 
 import numpy as np
 from numpy.testing import assert_equal
 
 import pandas
-from pandas import DataFrame, Series, Index, MultiIndex
+from pandas import DataFrame, Series, Index, MultiIndex, Grouper
 from pandas.tools.merge import concat
 from pandas.tools.pivot import pivot_table, crosstab
 from pandas.compat import range, u, product
@@ -288,8 +286,7 @@ def test_pivot_columns_lexsorted(self):
         iproduct = np.random.randint(0, len(products), n)
         items['Index'] = products['Index'][iproduct]
         items['Symbol'] = products['Symbol'][iproduct]
-        dr = pandas.date_range(datetime.date(2000, 1, 1),
-                               datetime.date(2010, 12, 31))
+        dr = pandas.date_range(datetime.date(2000, 1, 1), datetime.date(2010, 12, 31))
         dates = dr[np.random.randint(0, len(dr), n)]
         items['Year'] = dates.year
         items['Month'] = dates.month
@@ -333,6 +330,128 @@ def test_margins_no_values_two_row_two_cols(self):
         result = self.data[['A', 'B', 'C', 'D']].pivot_table(index=['A', 'B'], columns=['C', 'D'], aggfunc=len, margins=True)
         self.assertEqual(result.All.tolist(), [3.0, 1.0, 4.0, 3.0, 11.0])
 
+    def test_pivot_timegrouper(self):
+        df = DataFrame({
+            'Branch' : 'A A A A A A A B'.split(),
+            'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
+            'Quantity': [1, 3, 5, 1, 8, 1, 9, 3],
+            'Date' : [datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 1),
+                      datetime.datetime(2013, 10, 1), datetime.datetime(2013, 10, 2),
+                      datetime.datetime(2013, 10, 1), datetime.datetime(2013, 10, 2),
+                      datetime.datetime(2013, 12, 2), datetime.datetime(2013, 12, 2),]}).set_index('Date')
+
+        expected = DataFrame(np.array([10, 18, 3]).reshape(1, 3),
+                             index=[datetime.datetime(2013, 12, 31)],
+                             columns='Carl Joe Mark'.split())
+        expected.index.name = 'Date'
+        expected.columns.name = 'Buyer'
+
+        result = pivot_table(df, index=Grouper(freq='A'), columns='Buyer',
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result,expected)
+
+        result = pivot_table(df, index='Buyer', columns=Grouper(freq='A'),
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result,expected.T)
+
+        expected = DataFrame(np.array([1, np.nan, 3, 9, 18, np.nan]).reshape(2, 3),
+                             index=[datetime.datetime(2013, 1, 1), datetime.datetime(2013, 7, 1)],
+                             columns='Carl Joe Mark'.split())
+        expected.index.name = 'Date'
+        expected.columns.name = 'Buyer'
+
+        result = pivot_table(df, index=Grouper(freq='6MS'), columns='Buyer',
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected)
+
+        result = pivot_table(df, index='Buyer', columns=Grouper(freq='6MS'),
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected.T)
+
+        # passing the name
+        df = df.reset_index()
+        result = pivot_table(df, index=Grouper(freq='6MS', key='Date'), columns='Buyer',
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected)
+
+        result = pivot_table(df, index='Buyer', columns=Grouper(freq='6MS', key='Date'),
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected.T)
+
+        self.assertRaises(KeyError, lambda : pivot_table(df, index=Grouper(freq='6MS', key='foo'),
+                          columns='Buyer', values='Quantity', aggfunc=np.sum))
+        self.assertRaises(KeyError, lambda : pivot_table(df, index='Buyer',
+                          columns=Grouper(freq='6MS', key='foo'), values='Quantity', aggfunc=np.sum))
+
+        # passing the level
+        df = df.set_index('Date')
+        result = pivot_table(df, index=Grouper(freq='6MS', level='Date'), columns='Buyer',
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected)
+
+        result = pivot_table(df, index='Buyer', columns=Grouper(freq='6MS', level='Date'),
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected.T)
+
+        self.assertRaises(ValueError, lambda : pivot_table(df, index=Grouper(freq='6MS', level='foo'),
+                          columns='Buyer', values='Quantity', aggfunc=np.sum))
+        self.assertRaises(ValueError, lambda : pivot_table(df, index='Buyer',
+                          columns=Grouper(freq='6MS', level='foo'), values='Quantity', aggfunc=np.sum))
+
+        # double grouper
+        df = DataFrame({
+            'Branch' : 'A A A A A A A B'.split(),
+            'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
+            'Quantity': [1,3,5,1,8,1,9,3],
+            'Date' : [datetime.datetime(2013,11,1,13,0), datetime.datetime(2013,9,1,13,5),
+                      datetime.datetime(2013,10,1,20,0), datetime.datetime(2013,10,2,10,0),
+                      datetime.datetime(2013,11,1,20,0), datetime.datetime(2013,10,2,10,0),
+                      datetime.datetime(2013,10,2,12,0), datetime.datetime(2013,12,5,14,0)],
+            'PayDay' : [datetime.datetime(2013,10,4,0,0), datetime.datetime(2013,10,15,13,5),
+                        datetime.datetime(2013,9,5,20,0), datetime.datetime(2013,11,2,10,0),
+                        datetime.datetime(2013,10,7,20,0), datetime.datetime(2013,9,5,10,0),
+                        datetime.datetime(2013,12,30,12,0), datetime.datetime(2013,11,20,14,0),]})
+
+        result = pivot_table(df, index=Grouper(freq='M', key='Date'),
+                             columns=Grouper(freq='M', key='PayDay'),
+                             values='Quantity', aggfunc=np.sum)
+        expected = DataFrame(np.array([np.nan, 3, np.nan, np.nan, 6, np.nan, 1, 9,
+                                       np.nan, 9, np.nan, np.nan, np.nan, np.nan, 3, np.nan]).reshape(4, 4),
+                             index=[datetime.datetime(2013, 9, 30), datetime.datetime(2013, 10, 31),
+                                    datetime.datetime(2013, 11, 30), datetime.datetime(2013, 12, 31)],
+                             columns=[datetime.datetime(2013, 9, 30), datetime.datetime(2013, 10, 31),
+                                      datetime.datetime(2013, 11, 30), datetime.datetime(2013, 12, 31)])
+        expected.index.name = 'Date'
+        expected.columns.name = 'PayDay'
+
+        tm.assert_frame_equal(result, expected)
+
+        result = pivot_table(df, index=Grouper(freq='M', key='PayDay'),
+                             columns=Grouper(freq='M', key='Date'),
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected.T)
+
+        tuples = [(datetime.datetime(2013, 9, 30), datetime.datetime(2013, 10, 31)),
+                  (datetime.datetime(2013, 10, 31), datetime.datetime(2013, 9, 30)),
+                  (datetime.datetime(2013, 10, 31), datetime.datetime(2013, 11, 30)),
+                  (datetime.datetime(2013, 10, 31), datetime.datetime(2013, 12, 31)),
+                  (datetime.datetime(2013, 11, 30), datetime.datetime(2013, 10, 31)),
+                  (datetime.datetime(2013, 12, 31), datetime.datetime(2013, 11, 30)),]
+        idx = MultiIndex.from_tuples(tuples, names=['Date', 'PayDay'])
+        expected = DataFrame(np.array([3, np.nan, 6, np.nan, 1, np.nan,
+                                       9, np.nan, 9, np.nan, np.nan, 3]).reshape(6, 2),
+                             index=idx, columns=['A', 'B'])
+        expected.columns.name = 'Branch'
+
+        result = pivot_table(df, index=[Grouper(freq='M', key='Date'),
+                                        Grouper(freq='M', key='PayDay')], columns=['Branch'],
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected)
+
+        result = pivot_table(df, index=['Branch'], columns=[Grouper(freq='M', key='Date'),
+                                                            Grouper(freq='M', key='PayDay')],
+                             values='Quantity', aggfunc=np.sum)
+        tm.assert_frame_equal(result, expected.T)
 
 class TestCrosstab(tm.TestCase):
 
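``test_pivot_timegrouper`` checks that a Grouper can reach the dates in two ways and still produce the same table:

* through ``key``, when ``Date`` is a column;
* through ``level``, when ``Date`` is the index.

The sketch below shows that equivalence with the test's first data set. It assumes current pandas, ``pd.Grouper`` and ``pd.pivot_table`` from the top-level namespace, and ``aggfunc="sum"`` instead of ``np.sum``:

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "Branch": "A A A A A A A B".split(),
    "Buyer": "Carl Mark Carl Carl Joe Joe Joe Carl".split(),
    "Quantity": [1, 3, 5, 1, 8, 1, 9, 3],
    "Date": [datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 1),
             datetime.datetime(2013, 10, 1), datetime.datetime(2013, 10, 2),
             datetime.datetime(2013, 10, 1), datetime.datetime(2013, 10, 2),
             datetime.datetime(2013, 12, 2), datetime.datetime(2013, 12, 2)],
})

# "Date" as an ordinary column: point the Grouper at it with ``key``.
by_key = pd.pivot_table(df, index=pd.Grouper(freq="6MS", key="Date"),
                        columns="Buyer", values="Quantity", aggfunc="sum")

# "Date" as the (named) index: point the Grouper at it with ``level``.
by_level = pd.pivot_table(df.set_index("Date"),
                          index=pd.Grouper(freq="6MS", level="Date"),
                          columns="Buyer", values="Quantity", aggfunc="sum")

print(by_key)
print(by_key.equals(by_level))
```

The half-year bins are labelled 2013-01-01 and 2013-07-01, and the cell values match the test's second ``expected`` frame.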