"Reverse" groupby method for split/apply/combine #830

Closed
hottwaj opened this issue Apr 18, 2016 · 5 comments

@hottwaj

hottwaj commented Apr 18, 2016

When dealing with high-dimensional data, algorithms often involve operations or aggregation on a particular dimension only, whilst keeping all other dimensions in the dataset.

For example, I might know that I want to average all data along the time axis, and I'm indifferent to the other dimensions present, i.e. I want my algorithm to work whenever there is a time axis, and to be indifferent to the presence/lack of any other dimensions.

Mapping this kind of implementation to xarray is awkward, though, because groupby() is the only tool available for the split/apply/combine operation.

For example, in xarray I have to do this:

averages = dataarray.groupby([dimensions excluding time dimension]).apply(my_method_that_works_on_time_dimension)

instead of this (where aggregate_over() is my "reverse" groupby method):

averages = dataarray.aggregate_over([time_dimension]).apply(my_method_that_works_on_time_dimension)

For the first example I have to do some extra work: I have to write additional code to fetch all the dimensions in the array, remove the time dimension from that list, and then use that list with groupby, in order to make my code depend on the time dimension only.

It would be really helpful to add an aggregate_over() method (name TBD of course!) as an alternative to groupby() that automates this extra work.
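For illustration only, here is a minimal sketch of the extra work described above. `aggregate_over` is written as a hypothetical free function, not an existing xarray method, and the `_group` dimension name is made up for this example. It also assumes that groupby() takes a single dimension, so the remaining dimensions are stacked into one before grouping; that is a workaround chosen for this sketch, not necessarily how a built-in method would be implemented.

```python
import numpy as np
import xarray as xr


def aggregate_over(da, dims, func):
    """Hypothetical helper (not an xarray API): apply func once per
    combination of every dimension that is NOT listed in dims."""
    dims = [dims] if isinstance(dims, str) else list(dims)
    # The "extra work": collect every dimension except the ones func uses.
    other_dims = [d for d in da.dims if d not in dims]
    if not other_dims:
        # Nothing to split over, so func sees the whole array.
        return func(da)
    # Collapse the remaining dimensions into one so a single groupby suffices.
    stacked = da.stack(_group=other_dims)
    result = stacked.groupby("_group").map(func)
    # Restore the original dimensions from the stacked one.
    return result.unstack("_group")


# Usage: the applied function only needs to know about "time".
da = xr.DataArray(np.arange(24.0).reshape(2, 3, 4), dims=("x", "y", "time"))
averages = aggregate_over(da, "time", lambda g: g.mean("time"))
```

The caller names only the dimension the function works on, so the same call keeps working if the array gains or loses other dimensions. Grouping over every stacked element is likely to be slow on large arrays, so this is a sketch of the interface rather than a performant implementation.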

hottwaj added a commit to hottwaj/xarray that referenced this issue Apr 18, 2016
@hottwaj
Author

hottwaj commented Apr 18, 2016

Note that this new function cannot support passing of coordinates.

In fact I feel that the current groupby() implementation should not accept coordinates either - that should be up to the user to do in a separate step using .sel() or equivalent methods.

@hottwaj
Author

hottwaj commented Apr 18, 2016

Wooah, I'm so sorry, I didn't realise that groupby() cannot be applied to multiple dimensions yet!

So none of this works. Please ignore this; I'll revisit when #818 is resolved.

@shoyer
Member

shoyer commented Apr 18, 2016

I agree, this would be great! See #324 for more discussion -- I proposed calling this group_over but it's essentially the same idea.

It should be relatively straightforward to implement once we finish #818, but it will also require support for multiple groupby arguments, beyond just support for multi-dimensional arguments.

@stale

stale bot commented Oct 4, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Oct 4, 2020
@dcherian
Contributor

dcherian commented Oct 4, 2020

Closing as dupe of #324 (group_over)

@dcherian dcherian closed this as completed Oct 4, 2020