rolling_exp (nee ewm) #2650

Merged
merged 51 commits into from
Jun 24, 2019
327bc54
WIP on ewm using numbagg
max-sixty Dec 28, 2018
80019b6
Merge branch 'master' into ewm
max-sixty Dec 28, 2018
1a3ae0b
basic functionality, no dims working yet
max-sixty Jan 4, 2019
af1477b
Merge branch 'master' into ewm
max-sixty Jan 4, 2019
6b6b755
Merge branch 'master' into ewm
max-sixty Jan 7, 2019
5b0b6ff
Merge branch 'master' into ewm
max-sixty Jan 12, 2019
d119905
rename to `rolling_exp`
max-sixty Jan 12, 2019
1ef07f0
ensure works on either dimensions
max-sixty Jan 12, 2019
b30a096
window_type working
max-sixty Jan 13, 2019
89541b5
add numbagg to travis install
max-sixty Jan 13, 2019
9fe91d2
naming
max-sixty Jan 13, 2019
c71c949
Merge remote-tracking branch 'upstream/master' into ewm
max-sixty Jan 14, 2019
072b7f8
formatting
max-sixty Jan 14, 2019
2e4dab2
Merge branch 'master' into ewm
max-sixty Jan 15, 2019
bf7ac74
@shoyer's function to abstract the type of self.obj
max-sixty Jan 15, 2019
061ef87
initial docstring
max-sixty Jan 15, 2019
fe3c9e1
add docstrings to docs
max-sixty Jan 15, 2019
e45629e
example
max-sixty Jan 15, 2019
a9809f4
correct location for docs
max-sixty Jan 15, 2019
c6b939b
add numbagg to print_versions
max-sixty Jan 15, 2019
9cdfcb5
Merge branch 'master' into ewm
max-sixty Jan 16, 2019
eb38403
whatsnew
max-sixty Jan 16, 2019
86b4112
updating my GH username
max-sixty Jan 16, 2019
8c88f4f
Merge branch 'master' into ewm
max-sixty Jan 18, 2019
818c342
Merge branch 'master' into ewm
max-sixty Jan 23, 2019
d65cbf2
Merge branch 'master' into ewm
max-sixty Jan 23, 2019
6df2578
Merge branch 'master' into ewm
max-sixty Jan 24, 2019
c3481d4
Merge branch 'master' into ewm
max-sixty Jan 29, 2019
b28fe10
Merge branch 'master' into ewm
max-sixty Jan 30, 2019
42ccd0c
pin to numbagg release
max-sixty Jan 30, 2019
61da1d7
rename inner func to move_exp_nanmean
max-sixty Jan 30, 2019
3996f90
Merge branch 'master' into ewm
max-sixty Jan 30, 2019
55ad124
merge
max-sixty Jan 30, 2019
b6228d3
typo
max-sixty Jan 30, 2019
9cdd140
Merge branch 'master' into ewm
max-sixty Jun 10, 2019
e856162
comments from PR
max-sixty Jun 10, 2019
102a473
window -> alpha in numbagg
max-sixty Jun 10, 2019
aac0ce9
add docs
max-sixty Jun 10, 2019
d313821
Merge branch 'master' into ewm
max-sixty Jun 11, 2019
b6eebb4
doc fix
max-sixty Jun 11, 2019
7e5f9d8
whatsnew update
max-sixty Jun 11, 2019
f64a24f
revert formatting changes to unchanged file
max-sixty Jun 11, 2019
800f096
Merge branch 'master' into ewm
max-sixty Jun 12, 2019
3dd86fd
Merge branch 'master' into ewm
max-sixty Jun 20, 2019
431a10a
Merge branch 'master' into ewm
max-sixty Jun 20, 2019
ac299c2
update docstrings, adjust kwarg names
max-sixty Jun 21, 2019
8b95388
mypy
max-sixty Jun 21, 2019
f3fc3f7
flake
max-sixty Jun 21, 2019
efdbd1f
pytest config tiny tweak while I'm here
max-sixty Jun 21, 2019
7b094ba
Rolling exp doc updates
shoyer Jun 23, 2019
dd2a791
remove _attributes from RollingExp class
max-sixty Jun 24, 2019
1 change: 1 addition & 0 deletions ci/requirements-py37.yml
Expand Up @@ -30,3 +30,4 @@ dependencies:
- pydap
- pip:
- mypy==0.650
- numbagg
3 changes: 3 additions & 0 deletions doc/api.rst
Expand Up @@ -148,6 +148,7 @@ Computation
Dataset.groupby
Dataset.groupby_bins
Dataset.rolling
Dataset.rolling_exp
Dataset.coarsen
Dataset.resample
Dataset.diff
Expand Down Expand Up @@ -315,6 +316,7 @@ Computation
DataArray.groupby
DataArray.groupby_bins
DataArray.rolling
DataArray.rolling_exp
DataArray.coarsen
DataArray.dt
DataArray.resample
Expand Down Expand Up @@ -535,6 +537,7 @@ Rolling objects
core.rolling.DatasetRolling
core.rolling.DatasetRolling.construct
core.rolling.DatasetRolling.reduce
core.rolling_exp.RollingExp

Resample objects
================
Expand Down
16 changes: 16 additions & 0 deletions doc/computation.rst
Expand Up @@ -190,6 +190,22 @@ We can also manually iterate through ``Rolling`` objects:
for label, arr_window in r:
# arr_window is a view of x

.. _comput.rolling_exp:

While ``rolling`` provides a simple moving average, ``DataArray`` also supports
an exponential moving average with :py:meth:`~xarray.DataArray.rolling_exp`.
This is similar to pandas' ``ewm`` method. numbagg_ is required.

.. _numbagg: https://github.com/shoyer/numbagg

.. code:: python

arr.rolling_exp(y=3).mean()

The ``rolling_exp`` method takes a ``window_type`` kwarg, which can be ``'alpha'``,
``'com'`` (for ``center-of-mass``), ``'span'``, or ``'halflife'``. The default is
``'span'``.

Finally, the rolling object has a ``construct`` method which returns a
view of the original ``DataArray`` with the windowed dimension in
the last position.
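
The four ``window_type`` options in the ``rolling_exp`` docs above are numerical transformations of one another. The following is a minimal sketch of those conversions, following the formulas in `_get_center_of_mass` from this PR; the helper name `to_alpha` is hypothetical and is not part of xarray or numbagg.

```python
import math


def to_alpha(window, window_type="span"):
    """Convert a window value of the given type to the smoothing factor alpha.

    Hypothetical helper for illustration only. Every type is first mapped to
    a center of mass (com), and then alpha = 1 / (1 + com).
    """
    if window_type == "alpha":
        if not 0 < window <= 1:
            raise ValueError("alpha must satisfy: 0 < alpha <= 1")
        return float(window)
    if window_type == "com":
        if window < 0:
            raise ValueError("com must satisfy: com >= 0")
        com = window
    elif window_type == "span":
        if window < 1:
            raise ValueError("span must satisfy: span >= 1")
        com = (window - 1) / 2
    elif window_type == "halflife":
        if window <= 0:
            raise ValueError("halflife must satisfy: halflife > 0")
        decay = 1 - math.exp(math.log(0.5) / window)
        com = 1 / decay - 1
    else:
        raise ValueError(f"unknown window_type: {window_type!r}")
    return 1 / (1 + com)
```

Under these formulas, ``span=3``, ``com=1``, ``halflife=1`` and ``alpha=0.5`` all describe the same weighting, so the choice of ``window_type`` is a matter of convenience rather than behaviour.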
Expand Down
2 changes: 2 additions & 0 deletions doc/installing.rst
Expand Up @@ -45,6 +45,8 @@ For accelerating xarray
- `bottleneck <https://github.com/kwgoodman/bottleneck>`__: speeds up
NaN-skipping and rolling window aggregations by a large factor
(1.1 or later)
- `numbagg <https://github.com/shoyer/numbagg>`_: for exponential rolling
window operations

For parallel computing
~~~~~~~~~~~~~~~~~~~~~~
Expand Down
20 changes: 13 additions & 7 deletions doc/whats-new.rst
Expand Up @@ -30,6 +30,11 @@ Enhancements
- Add ``fill_value`` argument for reindex, align, and merge operations
to enable custom fill values. (:issue:`2876`)
By `Zach Griffith <https://github.com/zdgriffith>`_.
- :py:meth:`~xarray.DataArray.rolling_exp` and
:py:meth:`~xarray.Dataset.rolling_exp` added, similar to pandas'
``pd.DataFrame.ewm`` method. Calling ``.mean`` on the resulting object
will return an exponentially weighted moving average.
By `Maximilian Roos <https://github.com/max-sixty>`_.
- Character arrays' character dimension name decoding and encoding handled by
``var.encoding['char_dim_name']`` (:issue:`2895`)
By `James McCreight <https://github.com/jmccreight>`_.
Expand Down Expand Up @@ -188,6 +193,7 @@ Other enhancements
- Upsampling an array via interpolation with resample is now dask-compatible,
as long as the array is not chunked along the resampling dimension.
By `Spencer Clark <https://github.com/spencerkclark>`_.

- :py:func:`xarray.testing.assert_equal` and
:py:func:`xarray.testing.assert_identical` now provide a more detailed
report showing what exactly differs between the two objects (dimensions /
Expand Down Expand Up @@ -737,20 +743,20 @@ Enhancements
arguments in ``data_vars`` to indexes set explicitly in ``coords``,
where previously an error would be raised.
(:issue:`674`)
By `Maximilian Roos <https://github.com/maxim-lian>`_.
By `Maximilian Roos <https://github.com/max-sixty>`_.

- :py:meth:`~DataArray.sel`, :py:meth:`~DataArray.isel` & :py:meth:`~DataArray.reindex`,
(and their :py:class:`Dataset` counterparts) now support supplying a ``dict``
as a first argument, as an alternative to the existing approach
of supplying `kwargs`. This allows for more robust behavior
of dimension names which conflict with other keyword names, or are
not strings.
By `Maximilian Roos <https://github.com/maxim-lian>`_.
By `Maximilian Roos <https://github.com/max-sixty>`_.

- :py:meth:`~DataArray.rename` now supports supplying ``**kwargs``, as an
alternative to the existing approach of supplying a ``dict`` as the
first argument.
By `Maximilian Roos <https://github.com/maxim-lian>`_.
By `Maximilian Roos <https://github.com/max-sixty>`_.

- :py:meth:`~DataArray.cumsum` and :py:meth:`~DataArray.cumprod` now support
aggregation over multiple dimensions at the same time. This is the default
Expand Down Expand Up @@ -915,7 +921,7 @@ Enhancements
which test each value in the array for whether it is contained in the
supplied list, returning a bool array. See :ref:`selecting values with isin`
for full details. Similar to the ``np.isin`` function.
By `Maximilian Roos <https://github.com/maxim-lian>`_.
By `Maximilian Roos <https://github.com/max-sixty>`_.
- Some speed improvement to construct :py:class:`~xarray.DataArrayRolling`
object (:issue:`1993`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.
Expand Down Expand Up @@ -2110,7 +2116,7 @@ Enhancements
~~~~~~~~~~~~

- New documentation on :ref:`panel transition`. By
`Maximilian Roos <https://github.com/maximilianr>`_.
`Maximilian Roos <https://github.com/max-sixty>`_.
- New ``Dataset`` and ``DataArray`` methods :py:meth:`~xarray.Dataset.to_dict`
and :py:meth:`~xarray.Dataset.from_dict` to allow easy conversion between
dictionaries and xarray objects (:issue:`432`). See
Expand All @@ -2131,9 +2137,9 @@ Bug fixes
(:issue:`953`). By `Stephan Hoyer <https://github.com/shoyer>`_.
- ``Dataset.__dir__()`` (i.e. the method python calls to get autocomplete
options) failed if one of the dataset's keys was not a string (:issue:`852`).
By `Maximilian Roos <https://github.com/maximilianr>`_.
By `Maximilian Roos <https://github.com/max-sixty>`_.
- ``Dataset`` constructor can now take arbitrary objects as values
(:issue:`647`). By `Maximilian Roos <https://github.com/maximilianr>`_.
(:issue:`647`). By `Maximilian Roos <https://github.com/max-sixty>`_.
- Clarified ``copy`` argument for :py:meth:`~xarray.DataArray.reindex` and
:py:func:`~xarray.align`, which now consistently always return new xarray
objects (:issue:`927`).
Expand Down
11 changes: 6 additions & 5 deletions setup.cfg
Expand Up @@ -9,8 +9,11 @@ filterwarnings =
ignore:Using a non-tuple sequence for multidimensional indexing is deprecated:FutureWarning
env =
UVCDAT_ANONYMOUS_LOG=no
markers =
flaky: flaky tests
network: tests requiring a network connection
slow: slow tests

# This should be kept in sync with .pep8speaks.yml
[flake8]
max-line-length=79
ignore=
Expand All @@ -23,10 +26,6 @@ ignore=
F401
exclude=
doc
markers =
flaky: flaky tests
network: tests requiring a network connection
slow: slow tests

[isort]
default_section=THIRDPARTY
Expand Down Expand Up @@ -62,6 +61,8 @@ ignore_missing_imports = True
ignore_missing_imports = True
[mypy-nc_time_axis.*]
ignore_missing_imports = True
[mypy-numbagg.*]
ignore_missing_imports = True
[mypy-numpy.*]
ignore_missing_imports = True
[mypy-netCDF4.*]
Expand Down
55 changes: 49 additions & 6 deletions xarray/core/common.py
Expand Up @@ -12,6 +12,7 @@
from .arithmetic import SupportsArithmetic
from .options import _get_keep_attrs
from .pycompat import dask_array_type
from .rolling_exp import RollingExp
from .utils import Frozen, ReprObject, SortedKeysDict, either_dict_or_kwargs

# Used as a sentinel value to indicate all dimensions
Expand Down Expand Up @@ -86,6 +87,7 @@ def wrapped_func(self, dim=None, **kwargs): # type: ignore
class AbstractArray(ImplementsArrayReduce):
"""Shared base class for DataArray and Variable.
"""

def __bool__(self: Any) -> bool:
return bool(self.values)

Expand Down Expand Up @@ -249,6 +251,8 @@ def get_squeeze_dims(xarray_obj,
class DataWithCoords(SupportsArithmetic, AttrAccessMixin):
"""Shared base class for Dataset and DataArray."""

_rolling_exp_cls = RollingExp
Review comment (max-sixty, Jan 15, 2019): I don't think we need this class attribute any longer, but I'm leaving it for the sake of conformity until we have consensus.


def squeeze(self, dim: Union[Hashable, Iterable[Hashable], None] = None,
drop: bool = False,
axis: Union[int, Iterable[int], None] = None):
Expand Down Expand Up @@ -553,7 +557,7 @@ def groupby_bins(self, group, bins, right: bool = True, labels=None,

def rolling(self, dim: Optional[Mapping[Hashable, int]] = None,
min_periods: Optional[int] = None, center: bool = False,
**dim_kwargs: int):
**window_kwargs: int):
"""
Rolling window object.

Expand All @@ -568,9 +572,9 @@ def rolling(self, dim: Optional[Mapping[Hashable, int]] = None,
setting min_periods equal to the size of the window.
center : boolean, default False
Set the labels at the center of the window.
**dim_kwargs : optional
**window_kwargs : optional
The keyword arguments form of ``dim``.
One of dim or dim_kwargs must be provided.
One of dim or window_kwargs must be provided.

Returns
-------
Expand Down Expand Up @@ -609,15 +613,54 @@ def rolling(self, dim: Optional[Mapping[Hashable, int]] = None,
core.rolling.DataArrayRolling
core.rolling.DatasetRolling
""" # noqa
dim = either_dict_or_kwargs(dim, dim_kwargs, 'rolling')
dim = either_dict_or_kwargs(dim, window_kwargs, 'rolling')
return self._rolling_cls(self, dim, min_periods=min_periods,
center=center)

def rolling_exp(
self,
window: Optional[Mapping[Hashable, int]] = None,
window_type: str = 'span',
**window_kwargs
):
"""
Exponentially-weighted moving window.
Similar to EWM in pandas

Requires the optional Numbagg dependency.

Parameters
----------
window : A single mapping from a dimension name to window value,
optional
dim : str
Name of the dimension to create the rolling exponential window
along (e.g., `time`).
window : int
Size of the moving window. The type of this is specified in
`window_type`
window_type : str, one of ['span', 'com', 'halflife', 'alpha'],
default 'span'
The format of the previously supplied window. Each is a simple
numerical transformation of the others. Described in detail:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html
**window_kwargs : optional
The keyword arguments form of ``window``.
One of window or window_kwargs must be provided.

See Also
--------
core.rolling_exp.RollingExp
"""
window = either_dict_or_kwargs(window, window_kwargs, 'rolling_exp')

return self._rolling_exp_cls(self, window, window_type)

def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None,
boundary: str = 'exact',
side: Union[str, Mapping[Hashable, str]] = 'left',
coord_func: str = 'mean',
**dim_kwargs: int):
**window_kwargs: int):
"""
Coarsen object.

Expand Down Expand Up @@ -671,7 +714,7 @@ def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None,
core.rolling.DataArrayCoarsen
core.rolling.DatasetCoarsen
"""
dim = either_dict_or_kwargs(dim, dim_kwargs, 'coarsen')
dim = either_dict_or_kwargs(dim, window_kwargs, 'coarsen')
return self._coarsen_cls(
self, dim, boundary=boundary, side=side,
coord_func=coord_func)
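
In the diff above, `rolling_exp` accepts its window either as a mapping or as keyword arguments and resolves the two through `either_dict_or_kwargs`; `RollingExp.__init__` then reads a single `(dim, window)` pair from the result. The following is a standalone sketch of that calling pattern. The names `dict_or_kwargs` and `rolling_exp_args` are hypothetical, and the empty-window check is an assumption added for the sketch, not xarray's implementation.

```python
from collections.abc import Mapping


def dict_or_kwargs(pos_arg, kw_args, func_name):
    """Return the mapping or the keyword arguments, whichever was supplied.

    Hypothetical stand-in for xarray's either_dict_or_kwargs.
    """
    if pos_arg is None:
        return kw_args
    if not isinstance(pos_arg, Mapping):
        raise ValueError(
            f"the first argument to .{func_name} must be a dictionary")
    if kw_args:
        raise ValueError(
            f"cannot specify both keyword and positional arguments "
            f"to .{func_name}")
    return pos_arg


def rolling_exp_args(window=None, window_type="span", **window_kwargs):
    """Resolve the call arguments into (dim, window, window_type)."""
    window = dict_or_kwargs(window, window_kwargs, "rolling_exp")
    if not window:
        # Assumption for this sketch: reject an empty window explicitly.
        raise ValueError("one of window or window_kwargs must be provided")
    # Only the first (dim, window) pair is used, mirroring
    # RollingExp.__init__ in the diff.
    dim, size = next(iter(window.items()))
    return dim, size, window_type
```

With this sketch, `rolling_exp_args(time=3)` and `rolling_exp_args({"time": 3})` resolve to the same tuple, which is why both `da.rolling_exp(time=3)` and `da.rolling_exp({"time": 3})` are accepted spellings.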
Expand Down
106 changes: 106 additions & 0 deletions xarray/core/rolling_exp.py
@@ -0,0 +1,106 @@
import numpy as np

from .pycompat import dask_array_type


def _get_alpha(com=None, span=None, halflife=None, alpha=None):
# pandas defines in terms of com (converting to alpha in the algo)
# so use its function to get a com and then convert to alpha

com = _get_center_of_mass(com, span, halflife, alpha)
return 1 / (1 + com)


def move_exp_nanmean(array, *, axis, alpha):
if isinstance(array, dask_array_type):
raise TypeError("rolling_exp is not currently supported for dask arrays")
import numbagg
if axis == ():
return array.astype(np.float64)
else:
return numbagg.move_exp_nanmean(
array, axis=axis, alpha=alpha)


def _get_center_of_mass(comass, span, halflife, alpha):
"""
Vendored from pandas.core.window._get_center_of_mass

See licenses/PANDAS_LICENSE for the function's license
"""
from pandas.core import common as com
valid_count = com.count_not_none(comass, span, halflife, alpha)
if valid_count > 1:
raise ValueError("comass, span, halflife, and alpha "
"are mutually exclusive")

# Convert to center of mass; domain checks ensure 0 < alpha <= 1
if comass is not None:
if comass < 0:
raise ValueError("comass must satisfy: comass >= 0")
elif span is not None:
if span < 1:
raise ValueError("span must satisfy: span >= 1")
comass = (span - 1) / 2.
elif halflife is not None:
if halflife <= 0:
raise ValueError("halflife must satisfy: halflife > 0")
decay = 1 - np.exp(np.log(0.5) / halflife)
comass = 1 / decay - 1
elif alpha is not None:
if alpha <= 0 or alpha > 1:
raise ValueError("alpha must satisfy: 0 < alpha <= 1")
comass = (1.0 - alpha) / alpha
else:
raise ValueError("Must pass one of comass, span, halflife, or alpha")

return float(comass)


class RollingExp:
"""
Exponentially-weighted moving window object.
Similar to EWM in pandas

Parameters
----------
obj : Dataset or DataArray
Object to window.
windows : A single mapping from a single dimension name to window value
dim : str
Name of the dimension to create the rolling exponential window
along (e.g., `time`).
window : int
Size of the moving window. The type of this is specified in
`window_type`
window_type : str, one of ['span', 'com', 'halflife', 'alpha'], default 'span'
The format of the previously supplied window. Each is a simple
numerical transformation of the others. Described in detail:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html

Returns
-------
RollingExp : type of input argument
""" # noqa

def __init__(self, obj, windows, window_type='span'):
self.obj = obj
dim, window = next(iter(windows.items()))
self.dim = dim
self.alpha = _get_alpha(**{window_type: window})

def mean(self):
"""
Exponentially weighted moving average

Examples
--------
>>> da = xr.DataArray([1,1,2,2,2], dims='x')
>>> da.rolling_exp(x=2, window_type='span').mean()
<xarray.DataArray (x: 5)>
array([1. , 1. , 1.692308, 1.9 , 1.966942])
Dimensions without coordinates: x
"""

return self.obj.reduce(
move_exp_nanmean, dim=self.dim, alpha=self.alpha)
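
To make the `mean` docstring example above concrete, here is a pure-Python sketch of an exponentially weighted moving average with normalised weights `(1 - alpha) ** i`. It reproduces the values in the docstring for `span=2`, which corresponds to `alpha = 2 / (span + 1) = 2/3`. The function name `ewm_nanmean` is hypothetical. The NaN handling shown here (skip the value but keep decaying older weights) is an assumption modelled on pandas' `adjust=True, ignore_na=False` behaviour and may not match numbagg exactly.

```python
import math


def ewm_nanmean(values, alpha):
    """Exponentially weighted moving mean over a 1-D sequence.

    Hypothetical reference implementation, not numbagg's code.
    """
    decay = 1.0 - alpha
    weighted_sum = 0.0  # running sum of weight * value
    weight_total = 0.0  # running sum of weights
    out = []
    for x in values:
        # Older observations lose weight at every step.
        weighted_sum *= decay
        weight_total *= decay
        if not math.isnan(x):
            weighted_sum += x
            weight_total += 1.0
        out.append(weighted_sum / weight_total if weight_total > 0 else math.nan)
    return out


result = ewm_nanmean([1, 1, 2, 2, 2], alpha=2 / 3)
print([round(v, 6) for v in result])
# → [1.0, 1.0, 1.692308, 1.9, 1.966942]
```

The third value, for instance, is `(2 + 1/3 + 1/9) / (1 + 1/3 + 1/9) = 22/13 ≈ 1.692308`, matching the array shown in the docstring.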
1 change: 1 addition & 0 deletions xarray/tests/__init__.py
Expand Up @@ -74,6 +74,7 @@ def LooseVersion(vstring):
has_np113, requires_np113 = _importorskip('numpy', minversion='1.13.0')
has_iris, requires_iris = _importorskip('iris')
has_cfgrib, requires_cfgrib = _importorskip('cfgrib')
has_numbagg, requires_numbagg = _importorskip('numbagg')

# some special cases
has_h5netcdf07, requires_h5netcdf07 = _importorskip('h5netcdf',
Expand Down