WIP: Feature/interpolate #1640

Merged 27 commits on Dec 30, 2017
Commits
1582c1f
initial interpolate commit
Oct 17, 2017
ab727e7
fix to interpolate wrapper function
Nov 11, 2017
95006c4
remove duplicate limit handling in ffill/bfill
Nov 11, 2017
4a4f6eb
tests are passing
Nov 12, 2017
42d63ef
more docs, more tests
Nov 13, 2017
263ec98
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Nov 13, 2017
19d21b8
backward compat and add benchmarks
Nov 13, 2017
f937c07
skip tests for numpy versions before 1.12
Nov 13, 2017
8717e38
test fixes for py27 fixture
Nov 13, 2017
3d5c1b1
try reording decorators
Nov 13, 2017
1864e8f
minor reorg of travis to make the flake8 check useful
Nov 16, 2017
f58d464
cleanup following @fujiisoup's comments
Nov 18, 2017
1b93808
dataset missing methods, some more docs, and more scipy interpolators
Nov 27, 2017
6f83b7b
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 6, 2017
33df6af
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 10, 2017
eafe67a
Merge remote-tracking branch 'upstream/master' into feature/interpolate
Dec 16, 2017
dd9fa8c
workaround for parameterized tests that are skipped in missing.py module
Dec 16, 2017
88d1569
requires_np112 for dataset interpolate test
Dec 18, 2017
3fb9261
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 21, 2017
37882b7
remove req for np 112
Dec 21, 2017
a04e83e
fix typo in docs
Dec 21, 2017
48505a5
@requires_np112 for methods that use apply_ufunc in missing.py
Dec 21, 2017
20f957d
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 21, 2017
282bb65
reuse type in apply over vars with dim
Dec 21, 2017
a6fcb7f
rework the fill value convention for linear interpolation, no longer …
Dec 22, 2017
2b0d9e1
flake8
Dec 30, 2017
d3220f3
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 30, 2017
2 changes: 1 addition & 1 deletion .travis.yml
@@ -93,9 +93,9 @@ install:
   - python xarray/util/print_versions.py

 script:
+  - git diff upstream/master xarray/**/*py | flake8 --diff --exit-zero || true
   - python -OO -c "import xarray"
   - py.test xarray --cov=xarray --cov-config ci/.coveragerc --cov-report term-missing --verbose $EXTRA_FLAGS
-  - git diff upstream/master **/*py | flake8 --diff --exit-zero || true

 after_success:
   - coveralls
73 changes: 73 additions & 0 deletions asv_bench/benchmarks/dataarray_missing.py
@@ -0,0 +1,73 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import pandas as pd

try:
    import dask
except ImportError:
    pass

import xarray as xr

from . import randn, requires_dask


def make_bench_data(shape, frac_nan, chunks):
    vals = randn(shape, frac_nan)
    coords = {'time': pd.date_range('2000-01-01', freq='D',
                                    periods=shape[0])}
    da = xr.DataArray(vals, dims=('time', 'x', 'y'), coords=coords)

    if chunks is not None:
        da = da.chunk(chunks)

    return da


def time_interpolate_na(shape, chunks, method, limit):
    if chunks is not None:
        requires_dask()
    da = make_bench_data(shape, 0.1, chunks=chunks)
    # Pass the benchmark parameter through instead of hard-coding 'linear'.
    actual = da.interpolate_na(dim='time', method=method, limit=limit)

    if chunks is not None:
        actual = actual.compute()


time_interpolate_na.param_names = ['shape', 'chunks', 'method', 'limit']
time_interpolate_na.params = ([(3650, 200, 400), (100, 25, 25)],
                              [None, {'x': 25, 'y': 25}],
                              ['linear', 'spline', 'quadratic', 'cubic'],
                              [None, 3])


def time_ffill(shape, chunks, limit):

    da = make_bench_data(shape, 0.1, chunks=chunks)
    actual = da.ffill(dim='time', limit=limit)

    if chunks is not None:
        actual = actual.compute()


time_ffill.param_names = ['shape', 'chunks', 'limit']
time_ffill.params = ([(3650, 200, 400), (100, 25, 25)],
                     [None, {'x': 25, 'y': 25}],
                     [None, 3])


def time_bfill(shape, chunks, limit):

    da = make_bench_data(shape, 0.1, chunks=chunks)
    actual = da.bfill(dim='time', limit=limit)

    if chunks is not None:
        actual = actual.compute()


time_bfill.param_names = ['shape', 'chunks', 'limit']
time_bfill.params = ([(3650, 200, 400), (100, 25, 25)],
                     [None, {'x': 25, 'y': 25}],
                     [None, 3])
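
For context, asv runs a parameterized benchmark once for every combination in the Cartesian product of its `params` lists. The sketch below uses only the standard library to enumerate the combinations declared for `time_interpolate_na`; the helper name `expand_params` is hypothetical and is not part of asv.

```python
import itertools


def expand_params(param_names, params):
    """Yield one dict per benchmark invocation (hypothetical helper, not an asv API)."""
    for combo in itertools.product(*params):
        yield dict(zip(param_names, combo))


param_names = ['shape', 'chunks', 'method', 'limit']
params = ([(3650, 200, 400), (100, 25, 25)],
          [None, {'x': 25, 'y': 25}],
          ['linear', 'spline', 'quadratic', 'cubic'],
          [None, 3])

combos = list(expand_params(param_names, params))
print(len(combos))  # 2 * 2 * 4 * 2 = 32 combinations
print(combos[0])
```

The same reasoning gives 2 * 2 * 2 = 8 combinations each for `time_ffill` and `time_bfill`.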
6 changes: 6 additions & 0 deletions doc/api.rst
@@ -148,6 +148,9 @@ Computation
:py:attr:`~Dataset.count`
:py:attr:`~Dataset.dropna`
:py:attr:`~Dataset.fillna`
:py:attr:`~Dataset.ffill`
:py:attr:`~Dataset.bfill`
:py:attr:`~Dataset.interpolate_na`
:py:attr:`~Dataset.where`

**ndarray methods**:
@@ -299,6 +302,9 @@ Computation
:py:attr:`~DataArray.count`
:py:attr:`~DataArray.dropna`
:py:attr:`~DataArray.fillna`
:py:attr:`~DataArray.ffill`
:py:attr:`~DataArray.bfill`
:py:attr:`~DataArray.interpolate_na`
:py:attr:`~DataArray.where`

**ndarray methods**:
20 changes: 18 additions & 2 deletions doc/computation.rst
@@ -59,8 +59,9 @@ Missing values

xarray objects borrow the :py:meth:`~xarray.DataArray.isnull`,
:py:meth:`~xarray.DataArray.notnull`, :py:meth:`~xarray.DataArray.count`,
-:py:meth:`~xarray.DataArray.dropna` and :py:meth:`~xarray.DataArray.fillna` methods
-for working with missing data from pandas:
+:py:meth:`~xarray.DataArray.dropna`, :py:meth:`~xarray.DataArray.fillna`,
+:py:meth:`~xarray.DataArray.ffill`, and :py:meth:`~xarray.DataArray.bfill`
+methods for working with missing data from pandas:

.. ipython:: python

@@ -70,10 +71,25 @@ for working with missing data from pandas:
    x.count()
    x.dropna(dim='x')
    x.fillna(-1)
    x.ffill('x')
    x.bfill('x')

Like pandas, xarray uses the float value ``np.nan`` (not-a-number) to represent
missing values.

xarray objects also have an :py:meth:`~xarray.DataArray.interpolate_na` method
for filling missing values via 1D interpolation.

.. ipython:: python

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
                     coords={'xx': xr.Variable('x', [0, 1, 1.1, 1.9, 3])})
    x.interpolate_na(dim='x', method='linear', use_coordinate='xx')

Note that xarray slightly diverges from the pandas ``interpolate`` syntax by
providing the ``use_coordinate`` keyword, which makes explicit which values are
used as the index in the interpolation.

Aggregation
===========

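
To make the effect of `use_coordinate` concrete, here is a minimal pure-Python sketch of linear gap filling. It is not xarray's implementation (the docstring in this PR says the linear case is passed to `numpy.interp`); the function name `interpolate_na_linear` is hypothetical, and the sketch only fills NaNs that lie between two valid points.

```python
import math


def interpolate_na_linear(values, x=None):
    """Fill interior NaNs by linear interpolation (illustrative only).

    With x=None the points are treated as equally spaced, which mirrors
    use_coordinate=False. Passing x mirrors use_coordinate='<coord name>'.
    """
    if x is None:
        x = list(range(len(values)))
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    out = list(values)
    # Walk over each pair of neighbouring valid points and fill the gap.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            weight = (x[i] - x[left]) / (x[right] - x[left])
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


nan = float('nan')
vals = [0.0, 1.0, nan, nan, 2.0]
xx = [0, 1, 1.1, 1.9, 3]

by_position = interpolate_na_linear(vals)      # equally spaced index
by_coord = interpolate_na_linear(vals, x=xx)   # spacing taken from xx
print(by_position)
print(by_coord)
```

With equal spacing the two gaps are filled with roughly 1.33 and 1.67; using `xx` as the index gives roughly 1.05 and 1.45, because the missing points sit close to the ends of the interval from 1 to 3.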
97 changes: 94 additions & 3 deletions xarray/core/dataarray.py
@@ -1228,6 +1228,97 @@ def fillna(self, value):
        out = ops.fillna(self, value)
        return out

    def interpolate_na(self, dim=None, method='linear', limit=None,
Collaborator

Probably too late to be helpful - but are we sure about the name here? We don't generally add _na onto methods (bfill_na?), and pandas is interpolate only

Member Author

See comment from @shoyer above: #1640 (comment)

                       use_coordinate=True,
                       **kwargs):
        """Interpolate values according to different methods.

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to interpolate.
        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
                  'polynomial', 'barycentric', 'krog', 'pchip',
                  'spline', 'akima'}, optional
            String indicating which method to use for interpolation:

            - 'linear': linear interpolation (Default). Additional keyword
              arguments are passed to ``numpy.interp``
            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
              'polynomial': are passed to ``scipy.interpolate.interp1d``. If
              method=='polynomial', the ``order`` keyword argument must also be
              provided.
            - 'barycentric', 'krog', 'pchip', 'spline', and 'akima': use their
              respective ``scipy.interpolate`` classes.
        use_coordinate : boolean or str, default True
            Specifies which index to use as the x values in the interpolation
            formulated as `y = f(x)`. If False, values are treated as if
            equally-spaced along `dim`. If True, the IndexVariable `dim` is
            used. If use_coordinate is a string, it specifies the name of a
            coordinate variable to use as the index.
        limit : int, default None
            Maximum number of consecutive NaNs to fill. Must be greater than 0
            or None for no limit.

        Returns
        -------
        DataArray

        See also
        --------
        numpy.interp
        scipy.interpolate
        """
        from .missing import interp_na
        return interp_na(self, dim=dim, method=method, limit=limit,
                         use_coordinate=use_coordinate, **kwargs)
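
The `method` options in the docstring above fall into three groups. The sketch below only restates that grouping as a lookup; it does not reproduce the code in `xarray/core/missing.py`. The function name `describe_backend` is hypothetical, and the specific SciPy class names are an assumption about what "respective classes" refers to.

```python
# Methods the docstring says are forwarded to scipy.interpolate.interp1d.
INTERP1D_METHODS = {'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
                    'polynomial'}

# Assumed mapping from method name to a scipy.interpolate class.
SCIPY_CLASS_METHODS = {
    'barycentric': 'scipy.interpolate.BarycentricInterpolator',
    'krog': 'scipy.interpolate.KroghInterpolator',
    'pchip': 'scipy.interpolate.PchipInterpolator',
    'spline': 'scipy.interpolate.UnivariateSpline',
    'akima': 'scipy.interpolate.Akima1DInterpolator',
}


def describe_backend(method, **kwargs):
    """Return the name of the backend a method would use (illustrative)."""
    if method == 'linear':
        return 'numpy.interp'
    if method in INTERP1D_METHODS:
        if method == 'polynomial' and 'order' not in kwargs:
            raise ValueError("method='polynomial' requires the `order` keyword")
        return 'scipy.interpolate.interp1d'
    if method in SCIPY_CLASS_METHODS:
        return SCIPY_CLASS_METHODS[method]
    raise ValueError('unknown interpolation method: %r' % (method,))


print(describe_backend('linear'))
print(describe_backend('polynomial', order=2))
print(describe_backend('pchip'))
```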

    def ffill(self, dim, limit=None):
        '''Fill NaN values by propagating values forward
Collaborator

no need to change now, but FYI PEP8 is """ on its own line IIRC


        *Requires bottleneck.*

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to propagate values when
            filling.
        limit : int, default None
            The maximum number of consecutive NaN values to forward fill. In
            other words, if there is a gap with more than this number of
            consecutive NaNs, it will only be partially filled. Must be greater
            than 0 or None for no limit.

        Returns
        -------
        DataArray
        '''
        from .missing import ffill
        return ffill(self, dim, limit=limit)

    def bfill(self, dim, limit=None):
        '''Fill NaN values by propagating values backward

        *Requires bottleneck.*

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to propagate values when
            filling.
        limit : int, default None
            The maximum number of consecutive NaN values to backward fill. In
            other words, if there is a gap with more than this number of
            consecutive NaNs, it will only be partially filled. Must be greater
            than 0 or None for no limit.

        Returns
        -------
        DataArray
        '''
        from .missing import bfill
        return bfill(self, dim, limit=limit)
Member

Do we need bottleneck installed to use bfill or ffill? Maybe it should be noted in docstrings.
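
The `limit` semantics described in the `ffill` and `bfill` docstrings ("it will only be partially filled") can be illustrated without bottleneck. This is a pure-Python sketch for a single 1D sequence, not the PR's implementation; `ffill_1d` and `bfill_1d` are hypothetical names.

```python
import math


def ffill_1d(values, limit=None):
    """Forward fill NaNs, filling at most `limit` consecutive NaNs per gap."""
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0 or None')
    out = []
    last_valid = None
    run = 0  # consecutive NaNs seen since the last valid value
    for v in values:
        if math.isnan(v):
            run += 1
            if last_valid is not None and (limit is None or run <= limit):
                out.append(last_valid)
            else:
                out.append(v)  # leading NaN, or past the limit: leave as NaN
        else:
            last_valid = v
            run = 0
            out.append(v)
    return out


def bfill_1d(values, limit=None):
    """Backward fill is a forward fill on the reversed sequence."""
    return ffill_1d(values[::-1], limit=limit)[::-1]


nan = float('nan')
data = [1.0, nan, nan, nan, 5.0]
forward = ffill_1d(data, limit=2)
backward = bfill_1d(data, limit=1)
print(forward)   # first two NaNs become 1.0, the third stays NaN
print(backward)  # only the NaN next to 5.0 is filled
```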


    def combine_first(self, other):
        """Combine two DataArray objects, with union of coordinates.

@@ -1935,10 +2026,10 @@ def sortby(self, variables, ascending=True):
sorted: DataArray
A new dataarray where all the specified dims are sorted by dim
labels.

Examples
--------

>>> da = xr.DataArray(np.random.rand(5),
... coords=[pd.date_range('1/1/2000', periods=5)],
... dims='time')
@@ -1952,7 +2043,7 @@ def sortby(self, variables, ascending=True):
<xarray.DataArray (time: 5)>
array([ 0.26532 , 0.270962, 0.552878, 0.615637, 0.965471])
Coordinates:
- * time (time) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05 ...
+ * time (time) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05 ...
"""
ds = self._to_temp_dataset().sortby(variables, ascending=ascending)
return self._from_temp_dataset(ds)
99 changes: 99 additions & 0 deletions xarray/core/dataset.py
@@ -2410,6 +2410,105 @@ def fillna(self, value):
        out = ops.fillna(self, value)
        return out

    def interpolate_na(self, dim=None, method='linear', limit=None,
                       use_coordinate=True,
                       **kwargs):
        """Interpolate values according to different methods.

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to interpolate.
        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
                  'polynomial', 'barycentric', 'krog', 'pchip',
                  'spline'}, optional
            String indicating which method to use for interpolation:

            - 'linear': linear interpolation (Default). Additional keyword
              arguments are passed to ``numpy.interp``
            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
              'polynomial': are passed to ``scipy.interpolate.interp1d``. If
              method=='polynomial', the ``order`` keyword argument must also be
              provided.
            - 'barycentric', 'krog', 'pchip', 'spline': use their respective
              ``scipy.interpolate`` classes.
        use_coordinate : boolean or str, default True
            Specifies which index to use as the x values in the interpolation
            formulated as `y = f(x)`. If False, values are treated as if
            equally-spaced along `dim`. If True, the IndexVariable `dim` is
            used. If use_coordinate is a string, it specifies the name of a
            coordinate variable to use as the index.
        limit : int, default None
            Maximum number of consecutive NaNs to fill. Must be greater than 0
            or None for no limit.

        Returns
        -------
        Dataset

        See also
        --------
        numpy.interp
        scipy.interpolate
        """
        from .missing import interp_na, _apply_over_vars_with_dim

        new = _apply_over_vars_with_dim(interp_na, self, dim=dim,
                                        method=method, limit=limit,
                                        use_coordinate=use_coordinate,
                                        **kwargs)
        return new

    def ffill(self, dim, limit=None):
        '''Fill NaN values by propagating values forward

        *Requires bottleneck.*

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to propagate values when
            filling.
        limit : int, default None
            The maximum number of consecutive NaN values to forward fill. In
            other words, if there is a gap with more than this number of
            consecutive NaNs, it will only be partially filled. Must be greater
            than 0 or None for no limit.

        Returns
        -------
        Dataset
        '''
        from .missing import ffill, _apply_over_vars_with_dim

        new = _apply_over_vars_with_dim(ffill, self, dim=dim, limit=limit)
        return new

    def bfill(self, dim, limit=None):
        '''Fill NaN values by propagating values backward

        *Requires bottleneck.*

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to propagate values when
            filling.
        limit : int, default None
            The maximum number of consecutive NaN values to backward fill. In
            other words, if there is a gap with more than this number of
            consecutive NaNs, it will only be partially filled. Must be greater
            than 0 or None for no limit.

        Returns
        -------
        Dataset
        '''
        from .missing import bfill, _apply_over_vars_with_dim

        new = _apply_over_vars_with_dim(bfill, self, dim=dim, limit=limit)
        return new

    def combine_first(self, other):
        """Combine two Datasets, default to data_vars of self.

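
The body of `_apply_over_vars_with_dim` is not shown in this part of the diff. Judging by its name and call sites, it presumably applies a function to every data variable that has the requested dimension and passes the others through unchanged. The sketch below models that idea with plain dictionaries; the data layout (`name -> (dims, data)`) and both function names are hypothetical stand-ins, not xarray code.

```python
def apply_over_vars_with_dim(func, variables, dim, **kwargs):
    """Apply func to each variable that has `dim` (hypothetical stand-in).

    `variables` maps a name to a (dims, data) pair, where dims is a tuple
    of dimension names and data is a plain list.
    """
    result = {}
    for name, (dims, data) in variables.items():
        if dim in dims:
            result[name] = (dims, func(data, **kwargs))
        else:
            # Variables without the dimension are passed through untouched.
            result[name] = (dims, data)
    return result


def fill_missing(data, value=0.0):
    """Toy fill function: replace None entries with `value`."""
    return [value if v is None else v for v in data]


variables = {
    'temperature': (('time',), [1.0, None, 3.0]),
    'elevation': (('x',), [None, 2.0]),
}
filled = apply_over_vars_with_dim(fill_missing, variables, dim='time',
                                  value=-1.0)
print(filled['temperature'])
print(filled['elevation'])
```

Only `temperature` has a `time` dimension, so only it is filled; `elevation` keeps its missing entry.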