Fixed Inconsistent GroupBy Output Shape with Duplicate Column Labels #29124
Merged
Commits (61)
fd53827
Added any_all test
WillAyd 6f60cd0
Added complete test for output shape of elements
WillAyd 0aa1813
Fixed shape issues with bool aggs
WillAyd 4af22f6
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 9756e74
Added test for transformation shape
WillAyd b675963
Changed transform output
WillAyd 444d542
Updated test with required args
WillAyd 98a9901
Fixed cummin issue
WillAyd c8648b1
Fixed ohlc one-off handling
WillAyd 1626de1
Fixed output shape of nunique
WillAyd 2a6b8d7
Fixed tshift
WillAyd 12d1ca0
lint and black
WillAyd 5a3fcd7
Added whatsnew
WillAyd dee597a
Quantile special case hack
WillAyd a2f1b64
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd fdb36f6
Test fix for np dev
WillAyd 8975009
Used position as labels
WillAyd 9adde1b
Code cleanup
WillAyd 63b35f9
docstrings
WillAyd 9eb7c73
style and more typing
WillAyd 0e49bdb
Fixed parallel test collection
WillAyd 2ad7632
numpy dev warning fix
WillAyd 11fda39
More generic ohlc handling
WillAyd caf8f11
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 7c4bad9
Converted Dict -> Mapping
WillAyd b9dca96
doc fixups
WillAyd a878e67
numpy dev compat
WillAyd 037f9af
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd dd3b1dc
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd a66d37f
Renamed index -> idx
WillAyd 6d50448
Removed rogue TODO
WillAyd 4dd8f5b
Added failing test for multiindex
WillAyd 9d39862
MultiIndex support
WillAyd d6b197b
More exact typing
WillAyd 3cfd1a2
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 16e9512
jbrockmendel feedback
WillAyd d297684
Removed failing type
WillAyd a234bed
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd e3959b0
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 391d106
Aligned annotations and comments
WillAyd c30ca82
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 3a78051
mypy fix
WillAyd f4f9e61
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 936591e
asserts and mypy fixes
WillAyd 5dd131e
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd 6cc1607
Fix issue in merge resolution
WillAyd d5ce753
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd ce97eff
Correct merge with 29629
WillAyd d1a92b4
Fixed issue with dict assignment
WillAyd 23eb803
modernized annotations
WillAyd 4aa9f4c
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd faa08c9
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd b07335b
Changed to assert, removed collections
WillAyd fb71185
Removed breakpoint and whatsnew space
WillAyd d7b84a2
Fixed issue with DTA
WillAyd 7934422
typing fixup
WillAyd c8f0b19
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd acf22d3
packed key into namedtuple
WillAyd a0aae64
mypy fixes
WillAyd a9b411a
docstring cleanup
WillAyd 51b8050
Merge remote-tracking branch 'upstream/master' into grpby-dup-cols
WillAyd
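The core of the fix is easiest to see outside of pandas: a plain dict keyed by column label collapses duplicate labels, while a key that also carries the column's position does not. The sketch below is a simplified, pandas-free illustration of the `OutputKey` idea introduced in this PR; the names and data here are illustrative, not the actual pandas internals.

```python
from collections import namedtuple

# Hypothetical stand-in for the OutputKey introduced in this PR: keying
# results by (label, position) instead of label alone means duplicate
# column labels no longer collapse into a single dict entry.
OutputKey = namedtuple("OutputKey", ["label", "position"])

# Three aggregated columns, two of which share the label "b"
labels = ["a", "b", "b"]
sums = [[6], [15], [24]]

# Keyed by label alone, the duplicate "b" columns collide:
by_label = {label: vals for label, vals in zip(labels, sums)}
assert len(by_label) == 2  # one "b" column was silently lost

# Keyed by (label, position), every column survives:
by_key = {
    OutputKey(label=lbl, position=i): vals
    for i, (lbl, vals) in enumerate(zip(labels, sums))
}
assert len(by_key) == 3

# Positions index the data; labels are reattached afterwards as columns,
# mirroring the indexed_output / columns pattern in the diff below.
indexed_output = {key.position: val for key, val in by_key.items()}
columns = [key.label for key in by_key]
assert columns == ["a", "b", "b"]
```

The same two-step pattern (build by position, then relabel) appears in each of the `_wrap_*_output` methods changed by this PR.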
@@ -10,7 +10,17 @@
from functools import partial
from textwrap import dedent
import typing
from typing import Any, Callable, FrozenSet, Iterable, Sequence, Type, Union, cast
from typing import (
    Any,
    Callable,
    FrozenSet,
    Iterable,
    Mapping,
    Sequence,
    Type,
    Union,
    cast,
)

import numpy as np
@@ -309,28 +319,91 @@ def _aggregate_multiple_funcs(self, arg):

        return DataFrame(results, columns=columns)

    def _wrap_series_output(self, output, index, names=None):
        """ common agg/transform wrapping logic """
        output = output[self._selection_name]
    def _wrap_series_output(
        self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]], index: Index,
    ) -> Union[Series, DataFrame]:
        """
        Wraps the output of a SeriesGroupBy operation into the expected result.

        Parameters
        ----------
        output : Mapping[base.OutputKey, Union[Series, np.ndarray]]
            Data to wrap.
        index : pd.Index
            Index to apply to the output.

        if names is not None:
            return DataFrame(output, index=index, columns=names)
        Returns
        -------
        Series or DataFrame

        Notes
        -----
        In the vast majority of cases output and columns will only contain one
        element. The exception is operations that expand dimensions, like ohlc.
        """
        indexed_output = {key.position: val for key, val in output.items()}
        columns = Index(key.label for key in output)

        result: Union[Series, DataFrame]
        if len(output) > 1:
            result = DataFrame(indexed_output, index=index)
            result.columns = columns
        else:
            name = self._selection_name
            if name is None:
                name = self._selected_obj.name
            return Series(output, index=index, name=name)
            result = Series(indexed_output[0], index=index, name=columns[0])

        return result

    def _wrap_aggregated_output(
        self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]]
    ) -> Union[Series, DataFrame]:
        """
        Wraps the output of a SeriesGroupBy aggregation into the expected result.

    def _wrap_aggregated_output(self, output, names=None):
        Parameters
        ----------
        output : Mapping[base.OutputKey, Union[Series, np.ndarray]]
            Data to wrap.

        Returns
        -------
        Series or DataFrame

        Notes
        -----
        In the vast majority of cases output will only contain one element.
        The exception is operations that expand dimensions, like ohlc.
        """
        result = self._wrap_series_output(
            output=output, index=self.grouper.result_index, names=names
            output=output, index=self.grouper.result_index
        )
        return self._reindex_output(result)._convert(datetime=True)

    def _wrap_transformed_output(self, output, names=None):
        return self._wrap_series_output(
            output=output, index=self.obj.index, names=names
        )
    def _wrap_transformed_output(
        self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]]
    ) -> Series:
        """
        Wraps the output of a SeriesGroupBy aggregation into the expected result.

        Parameters
        ----------
        output : dict[base.OutputKey, Union[Series, np.ndarray]]
            Dict with a sole key of 0 and a value of the result values.

        Returns
        -------
        Series

        Notes
        -----
        output should always contain one element. It is specified as a dict
        for consistency with DataFrame methods and _wrap_aggregated_output.
        """
        assert len(output) == 1
        result = self._wrap_series_output(output=output, index=self.obj.index)

        # No transformations increase the ndim of the result
        assert isinstance(result, Series)
        return result

    def _wrap_applied_output(self, keys, values, not_indexed_same=False):
        if len(keys) == 0:
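The branching in the new `_wrap_series_output` can be sketched without pandas: one output column becomes a Series named after its label, while dimension-expanding operations such as `ohlc` produce several columns and hence a DataFrame. Below is a minimal, hypothetical stand-in, with plain `(label, position)` tuples playing the role of `base.OutputKey` and tagged tuples playing the role of the pandas objects.

```python
# Pandas-free sketch of the _wrap_series_output branching above.
# The tag strings stand in for the real Series / DataFrame objects.
def wrap_series_output(output, index):
    """Return ('Series', ...) for one column, ('DataFrame', ...) for several."""
    # key = (label, position); data is keyed positionally so duplicate
    # labels cannot collide, then labels are reattached as columns
    indexed_output = {key[1]: val for key, val in output.items()}
    columns = [key[0] for key in output]
    if len(output) > 1:
        # dimension-expanding ops like ohlc produce several columns
        return ("DataFrame", indexed_output, columns, index)
    return ("Series", indexed_output[0], columns[0], index)

idx = [0, 1]
single = {("mean", 0): [1.5, 2.5]}
assert wrap_series_output(single, idx)[0] == "Series"

ohlc = {(c, i): [0.0, 0.0] for i, c in enumerate(["open", "high", "low", "close"])}
assert wrap_series_output(ohlc, idx)[0] == "DataFrame"
```

Note how the single-column branch names the Series after `columns[0]`, matching the diff's replacement of the old `self._selection_name` lookup.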
@@ -1082,17 +1155,6 @@ def _aggregate_item_by_item(self, func, *args, **kwargs) -> DataFrame:

        return DataFrame(result, columns=result_columns)

    def _decide_output_index(self, output, labels):
        if len(output) == len(labels):
            output_keys = labels
        else:
            output_keys = sorted(output)

        if isinstance(labels, MultiIndex):
            output_keys = MultiIndex.from_tuples(output_keys, names=labels.names)

        return output_keys

    def _wrap_applied_output(self, keys, values, not_indexed_same=False):
        if len(keys) == 0:
            return DataFrame(index=keys)
@@ -1559,27 +1621,62 @@ def _insert_inaxis_grouper_inplace(self, result):
        if in_axis:
            result.insert(0, name, lev)

    def _wrap_aggregated_output(self, output, names=None):
        agg_axis = 0 if self.axis == 1 else 1
        agg_labels = self._obj_with_exclusions._get_axis(agg_axis)
    def _wrap_aggregated_output(
        self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]]
    ) -> DataFrame:
        """
        Wraps the output of DataFrameGroupBy aggregations into the expected result.

        output_keys = self._decide_output_index(output, agg_labels)
        Parameters
        ----------
        output : Mapping[base.OutputKey, Union[Series, np.ndarray]]
            Data to wrap.

        Returns
        -------
        DataFrame
        """
        indexed_output = {key.position: val for key, val in output.items()}
        columns = Index(key.label for key in output)

        result = DataFrame(indexed_output)
        result.columns = columns

        if not self.as_index:
            result = DataFrame(output, columns=output_keys)
            self._insert_inaxis_grouper_inplace(result)
            result = result._consolidate()
        else:
            index = self.grouper.result_index
            result = DataFrame(output, index=index, columns=output_keys)
            result.index = index

        if self.axis == 1:
            result = result.T

        return self._reindex_output(result)._convert(datetime=True)

    def _wrap_transformed_output(self, output, names=None) -> DataFrame:
        return DataFrame(output, index=self.obj.index)
    def _wrap_transformed_output(
        self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]]
    ) -> DataFrame:
        """
        Wraps the output of DataFrameGroupBy transformations into the expected result.

        Parameters
        ----------
        output : Mapping[base.OutputKey, Union[Series, np.ndarray]]
            Data to wrap.

        Returns
        -------
        DataFrame
        """
        indexed_output = {key.position: val for key, val in output.items()}
        columns = Index(key.label for key in output)

[Review comment from jreback: same comment as above, I think you need to sort the columns (the index is ok).]

        result = DataFrame(indexed_output)
        result.columns = columns
        result.index = self.obj.index

        return result

    def _wrap_agged_blocks(self, items, blocks):
        if not self.as_index:
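The DataFrameGroupBy transform wrapper above uses the same build-by-position, relabel-afterwards pattern, which is what keeps duplicate column labels and their order intact. A pandas-free sketch (the function name and tuple keys here are illustrative stand-ins, not the real pandas objects):

```python
# Illustrative sketch of _wrap_transformed_output: build the frame
# positionally, then overwrite the column labels so duplicates survive.
def wrap_transformed_output(output, row_index):
    # position -> column values, mimicking DataFrame(indexed_output)
    indexed = {key[1]: vals for key, vals in output.items()}
    # labels reattached afterwards; duplicates are kept as-is
    columns = [key[0] for key in output]
    rows = [
        tuple(indexed[pos][i] for pos in sorted(indexed))
        for i in range(len(row_index))
    ]
    return columns, rows

# Two transformed columns that share the label "x"
cols, rows = wrap_transformed_output(
    {("x", 0): [1, 2], ("x", 1): [3, 4]}, row_index=[10, 11]
)
assert cols == ["x", "x"]  # duplicate labels preserved
assert rows == [(1, 3), (2, 4)]
```

Had the output been keyed by label alone, the second `"x"` column would have overwritten the first before the frame was ever built.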
@@ -1699,9 +1796,11 @@ def groupby_series(obj, col=None):
        if isinstance(obj, Series):
            results = groupby_series(obj)
        else:
            # TODO: this is duplicative of how GroupBy naturally works
            # Try to consolidate with normal wrapping functions
            from pandas.core.reshape.concat import concat

            results = [groupby_series(obj[col], col) for col in obj.columns]
            results = [groupby_series(content, label) for label, content in obj.items()]
            results = concat(results, axis=1)

            results.columns.names = obj.columns.names
@@ -1743,7 +1842,7 @@ def _normalize_keyword_aggregation(kwargs):
    """
    Normalize user-provided "named aggregation" kwargs.

    Transforms from the new ``Dict[str, NamedAgg]`` style kwargs
    Transforms from the new ``Mapping[str, NamedAgg]`` style kwargs
    to the old OrderedDict[str, List[scalar]]].

    Parameters

@@ -1764,7 +1863,7 @@ def _normalize_keyword_aggregation(kwargs):
    >>> _normalize_keyword_aggregation({'output': ('input', 'sum')})
    (OrderedDict([('input', ['sum'])]), ('output',), [('input', 'sum')])
    """
    # Normalize the aggregation functions as Dict[column, List[func]],
    # Normalize the aggregation functions as Mapping[column, List[func]],
    # process normally, then fixup the names.
    # TODO(Py35): When we drop python 3.5, change this to
    # defaultdict(list)