-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add pointwise indexing via isel_points method #481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
b080142
ae1f3a8
c3cb3a5
bf0915b
e4851ba
10b67e3
5ab9d4b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,8 @@ below and summarized in this table: | |
| By name | By label | ``arr.sel(space='IA')`` or |br| | ``ds.sel(space='IA')`` or |br| | | ||
| | | ``arr.loc[dict(space='IA')]`` | ``ds.loc[dict(space='IA')]`` | | ||
+------------------+--------------+---------------------------------+--------------------------------+ | ||
| By name | By integers | ``arr.isel_points(x=[0, 1])`` | ``ds.isel_points(x=[0, 1])`` | | ||
+------------------+--------------+---------------------------------+--------------------------------+ | ||
|
||
Positional indexing | ||
------------------- | ||
|
@@ -57,6 +59,7 @@ DataArray: | |
|
||
Positional indexing deviates from the NumPy when indexing with multiple | ||
arrays like ``arr[[0, 1], [0, 1]]``, as described in :ref:`indexing details`. | ||
Use :py:meth:`~xray.Dataset.isel_points` to achieve this functionality. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would reference the section on pointwise indexing in this same document, e.g., by using
as the link and just before the "Pointwise indexing" section below add:
|
||
|
||
xray also supports label-based indexing, just like pandas. Because | ||
we use a :py:class:`pandas.Index` under the hood, label based indexing is very | ||
|
@@ -108,6 +111,13 @@ use them explicitly to slice data. There are two ways to do this: | |
# index by dimension coordinate labels | ||
arr.sel(time=slice('2000-01-01', '2000-01-02')) | ||
|
||
3. Use the :py:meth:`~xray.DataArray.isel_points` method: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would put this in a new top level section, "Pointwise indexing" |
||
|
||
.. ipython:: python | ||
|
||
# index by integer array indices | ||
arr.isel_points(space=[0, 1], dim='points') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A more compelling example would show actual pointwise indexing, instead of only indexing along one dimension. |
||
|
||
The arguments to these methods can be any objects that could index the array | ||
along the dimension given by the keyword, e.g., labels for an individual value, | ||
Python :py:func:`slice` objects or 1-dimensional arrays. | ||
|
@@ -122,7 +132,7 @@ __ http://legacy.python.org/dev/peps/pep-0472/ | |
|
||
.. warning:: | ||
|
||
Do not try to assign values when using ``isel`` or ``sel``:: | ||
Do not try to assign values when using ``isel``, ``isel_points`` or ``sel``:: | ||
|
||
# DO NOT do this | ||
arr.isel(space=0) = 0 | ||
|
@@ -145,6 +155,7 @@ simultaneously, returning a new dataset: | |
ds = arr.to_dataset() | ||
ds.isel(space=[0], time=[0]) | ||
ds.sel(time='2000-01-01') | ||
ds.isel_points(space=[0, 1], dim='points') | ||
|
||
Positional indexing on a dataset is not supported because the ordering of | ||
dimensions in a dataset is somewhat ambiguous (it can vary between different | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,7 @@ v0.5.3 (unreleased) | |
|
||
- Dataset variables are now written to netCDF files in order of appearance | ||
when using the netcdf4 backend (:issue:`479`). | ||
- Added :py:meth:`~xray.Dataset.isel_points` and :py:meth:`~xray.DataArray.isel_points` to support pointwise indexing of Datasets and DataArrays (:issue:`475`). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A brief example would be nice to add here. You could also link to the new documentation section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
|
||
v0.5.2 (16 July 2015) | ||
--------------------- | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
import functools | ||
import warnings | ||
from collections import Mapping | ||
from collections import Mapping, Sequence | ||
from numbers import Number | ||
|
||
import numpy as np | ||
|
@@ -21,6 +21,7 @@ | |
from .variable import as_variable, Variable, Coordinate, broadcast_variables | ||
from .pycompat import (iteritems, itervalues, basestring, OrderedDict, | ||
dask_array_type) | ||
from .combine import concat | ||
|
||
|
||
# list of attributes of pd.DatetimeIndex that are ndarrays of time info | ||
|
@@ -1028,6 +1029,58 @@ def sel(self, method=None, **indexers): | |
return self.isel(**indexing.remap_label_indexers(self, indexers, | ||
method=method)) | ||
|
||
def isel_points(self, dim='points', **indexers): | ||
"""Returns a new dataset with each array indexed pointwise along the | ||
specified dimension(s). | ||
|
||
This method selects pointwise values from each array and is akin to | ||
the NumPy indexing behavior of `arr[[0, 1], [0, 1]]`, except this | ||
method does not require knowing the order of each array's dimensions. | ||
|
||
Parameters | ||
---------- | ||
dim : str, optional | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A neat trick about using Might be worth mentioning that. |
||
Dimension name for which the points will be added to. | ||
**indexers : {dim: indexer, ...} | ||
Keyword arguments with names matching dimensions and values given | ||
by integers, slice objects or arrays. All indexers must be the same | ||
length. | ||
|
||
Returns | ||
------- | ||
obj : Dataset | ||
A new Dataset with the same contents as this dataset, except each | ||
array and dimension is indexed by the appropriate indexers. In | ||
general, each array's data will be a view of the array's data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. with pointwise indexing it's always a copy |
||
in this dataset, unless numpy fancy indexing was triggered by using | ||
an array indexer, in which case the data will be a copy. | ||
|
||
See Also | ||
-------- | ||
Dataset.sel | ||
DataArray.isel | ||
DataArray.sel | ||
DataArray.isel_points | ||
""" | ||
invalid = [k for k in indexers if k not in self.dims] | ||
if invalid: | ||
raise ValueError("dimensions %r do not exist" % invalid) | ||
|
||
# all the indexers should be iterables | ||
keys = indexers.keys() | ||
indexers = [(k, ([v] if not isinstance(v, Sequence) else v)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably better to raise an error if someone tries to pass in something that isn't a sequence? I would probably coerce with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My thought was that you may not know the shape of x and y a priori but am happy to remove this. After thinking about how numpy treats scalar vs array indices (example below), I think it is best to remove this and raise an error. In [1]: x = np.arange(12).reshape((3, 4))
In [2]: x[2, 3]
Out[2]: 11
In [3]: x[[2], [3]]
Out[3]: array([11]) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've been trying to implement this and I'm not getting it to work. Specifically, how do we want to handle Any thoughts on the best way to handle this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would raise an error with slice objects, too. 1d arrays of integers is enough functionality. On Wed, Jul 22, 2015 at 9:46 PM, Joe Hamman [email protected]
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay. That is done in my last commit so no further change needed here. |
||
for k, v in iteritems(indexers)] | ||
|
||
# all the indexers should have the same length | ||
lengths = set([len(v) for k, v in indexers]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. note: creating the intermediate list here is unnecessary -- this works exactly the same (except it's a little faster): |
||
if len(lengths) > 1: | ||
raise ValueError('All indexers must be the same length') | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. add a TODO note about speeding this up using vectorized indexing? |
||
return concat([self.isel(**d) for d in | ||
[dict(zip(keys, inds)) for inds in | ||
zip(*[v for k, v in indexers])]], | ||
dim=dim) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's explicitly provide the indexer_dims = set(indexers)
def relevant_keys(mapping):
return [k for k, v in mapping.items()
if any(d in indexer_dims for d in v.dims)]
data_vars = relevant_keys(self.data_vars)
coords = relevant_keys(self.coords) This means The test case where would be a dataset that aren't has a constant variable, e.g.,
If you do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. test added in |
||
|
||
def reindex_like(self, other, method=None, copy=True): | ||
"""Conform this object onto the indexes of another object, filling | ||
in missing values with NaN. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -382,6 +382,64 @@ def test_sel_method(self): | |
actual = data.sel(x=[0.9, 1.9], method='backfill') | ||
self.assertDataArrayIdentical(expected, actual) | ||
|
||
def test_isel_points_method(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add a test case for negative indexers? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
shape = (10, 5, 6) | ||
np_array = np.random.random(shape) | ||
da = DataArray(np_array, dims=['time', 'y', 'x']) | ||
y = [1, 3] | ||
x = [3, 0] | ||
|
||
expected = da.values[:, y, x] | ||
|
||
actual = da.isel_points(y=y, x=x, dim='test_coord') | ||
assert 'test_coord' in actual.coords | ||
assert actual.coords['test_coord'].shape == (len(y), ) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also verify that Probably easier just to construct the expected data-array and then compare them with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
|
||
actual = da.isel_points(y=y, x=x) | ||
assert 'points' in actual.coords | ||
# not sure why actual needs to be transposed | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. xray always concatenates along the first dimension. NumPy tries to do something very clever with fancy indexing and concatenates along the last dimension in this case:
|
||
np.testing.assert_equal(actual.T, expected) | ||
|
||
# test scalars (should match isel but will have points dim) | ||
y = 1 | ||
x = 3 | ||
expected = da.values[:, y, x] | ||
|
||
actual = da.isel_points(y=y, x=x) | ||
# squeeze to drop "points" dim | ||
assert 'points' in actual.coords | ||
np.testing.assert_allclose(actual.squeeze().values, expected) | ||
self.assertDataArrayIdentical(actual.squeeze().drop(['points']), | ||
da.isel(y=y, x=x)) | ||
|
||
# a few corner cases | ||
da.isel_points(time=[1, 2], x=[2, 2], y=[3, 4]) | ||
np.testing.assert_allclose( | ||
da.isel_points(time=1, x=2, y=4).values.squeeze(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again, let's consider if we really want to support indexing with scalars in |
||
np_array[1, 4, 2].squeeze()) | ||
|
||
da.isel_points(time=1) | ||
da.isel_points(time=[1, 2]) | ||
|
||
# test that leaving out a dim is the same as slice(None) | ||
self.assertDataArrayIdentical( | ||
da.isel_points(time=slice(None), y=y, x=x), | ||
da.isel_points(time=np.arange(len(da['time'])), y=y, x=x)) | ||
self.assertDataArrayIdentical( | ||
da.isel_points(time=slice(None), y=y, x=x), | ||
da.isel_points(y=y, x=x)) | ||
|
||
# test that the order of the indexers doesn't matter | ||
self.assertDataArrayIdentical( | ||
da.isel_points(y=y, x=x), | ||
da.isel_points(x=x, y=y)) | ||
|
||
# make sure we're raising errors in the right places | ||
with self.assertRaises(ValueError): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like using |
||
da.isel_points(y=[1, 2], x=[1, 2, 3]) | ||
with self.assertRaises(ValueError): | ||
da.isel_points(bad_key=[1, 2]) | ||
|
||
def test_loc(self): | ||
self.ds['x'] = ('x', np.array(list('abcdefghij'))) | ||
da = self.ds['foo'] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave this outside the table for now -- it doesn't quite fit in with the other methods, because it does pointwise indexing, not outer/orthogonal indexing.