Commit 5ee3a4f

Merge pull request #11079 from cpcloud/fix-series-nunique-groupby-with-object
BUG: Fix Series nunique groupby with object dtype
2 parents 9be2180 + f9e6c3d commit 5ee3a4f

3 files changed, 28 additions and 2 deletions

doc/source/whatsnew/v0.17.0.txt

Lines changed: 1 addition & 1 deletion
@@ -1014,7 +1014,7 @@ Performance Improvements
 - Development support for benchmarking with the `Air Speed Velocity library <https://github.com/spacetelescope/asv/>`_ (:issue:`8316`)
 - Added vbench benchmarks for alternative ExcelWriter engines and reading Excel files (:issue:`7171`)
 - Performance improvements in ``Categorical.value_counts`` (:issue:`10804`)
-- Performance improvements in ``SeriesGroupBy.nunique`` and ``SeriesGroupBy.value_counts`` (:issue:`10820`)
+- Performance improvements in ``SeriesGroupBy.nunique`` and ``SeriesGroupBy.value_counts`` (:issue:`10820`, :issue:`11077`)
 - Performance improvements in ``DataFrame.drop_duplicates`` with integer dtypes (:issue:`10917`)
 - 4x improvement in ``timedelta`` string parsing (:issue:`6755`, :issue:`10426`)
 - 8x improvement in ``timedelta64`` and ``datetime64`` ops (:issue:`6755`)

pandas/core/groupby.py

Lines changed: 11 additions & 1 deletion
@@ -2565,7 +2565,17 @@ def nunique(self, dropna=True):
         ids, _, _ = self.grouper.group_info
         val = self.obj.get_values()
 
-        sorter = np.lexsort((val, ids))
+        try:
+            sorter = np.lexsort((val, ids))
+        except TypeError:  # catches object dtypes
+            assert val.dtype == object, \
+                'val.dtype must be object, got %s' % val.dtype
+            val, _ = algos.factorize(val, sort=False)
+            sorter = np.lexsort((val, ids))
+            isnull = lambda a: a == -1
+        else:
+            isnull = com.isnull
+
         ids, val = ids[sorter], val[sorter]
 
         # group boundries are where group ids change
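
The `except TypeError` branch in the hunk above replaces object values with integer codes before sorting, and then treats the code `-1` as the null marker. A minimal standalone sketch of that idea follows. It uses the public `pd.factorize` rather than the internal `algos.factorize`, and the helper name `nunique_per_group` and its dict return value are assumptions made for illustration, not part of this commit.

```python
import numpy as np
import pandas as pd


def nunique_per_group(ids, values):
    """Count distinct non-null values per group id (hypothetical helper)."""
    ids = np.asarray(ids)
    values = np.asarray(values, dtype=object)

    # Map each distinct value to an integer code; nulls get the code -1.
    codes, _ = pd.factorize(values, sort=False)

    # lexsort uses the LAST key as the primary key: sort by ids, then codes.
    sorter = np.lexsort((codes, ids))
    ids, codes = ids[sorter], codes[sorter]

    # A new (group, value) run starts where the id or the code changes.
    new_run = np.r_[True, (ids[1:] != ids[:-1]) | (codes[1:] != codes[:-1])]

    # Drop runs made of nulls, mirroring `isnull = lambda a: a == -1`.
    keep = new_run & (codes != -1)

    # Start every group at zero so all-null groups are still reported.
    counts = {g: 0 for g in np.unique(ids).tolist()}
    for g in ids[keep].tolist():
        counts[g] += 1
    return counts


print(nunique_per_group([1, 1, 2, 2, 2], ['a', 'a', 'b', None, 'c']))
```

Because the codes are plain integers, the sort no longer depends on whether the original objects can be compared with each other. The sketch always drops nulls, which corresponds to the `dropna=True` default of `nunique`.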

pandas/tests/test_groupby.py

Lines changed: 16 additions & 0 deletions
@@ -5511,6 +5511,22 @@ def test_sort(x):
 
         g.apply(test_sort)
 
+    def test_nunique_with_object(self):
+        # GH 11077
+        data = pd.DataFrame(
+            [[100, 1, 'Alice'],
+             [200, 2, 'Bob'],
+             [300, 3, 'Charlie'],
+             [-400, 4, 'Dan'],
+             [500, 5, 'Edith']],
+            columns=['amount', 'id', 'name']
+        )
+
+        result = data.groupby(['id', 'amount'])['name'].nunique()
+        index = MultiIndex.from_arrays([data.id, data.amount])
+        expected = pd.Series([1] * 5, name='name', index=index)
+        tm.assert_series_equal(result, expected)
+
 
 def assert_fp_equal(a, b):
     assert (np.abs(a - b) < 1e-12).all()
