API: Public data for Series and Index: .array and .to_numpy() #23623
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -765,6 +765,97 @@ def base(self): | |
FutureWarning, stacklevel=2) | ||
return self.values.base | ||
|
||
@property | ||
def array(self): | ||
# type: () -> Union[np.ndarray, ExtensionArray] | ||
"""The actual Array backing this Series or Index. | ||
|
||
Returns | ||
------- | ||
Union[ndarray, ExtensionArray] | ||
This is the actual array stored within this object. | ||
|
||
Notes | ||
----- | ||
This table lays out the different array types for each extension | ||
dtype within pandas. | ||
|
||
================== ============================= | ||
dtype array type | ||
================== ============================= | ||
category Categorical | ||
period PeriodArray | ||
interval IntervalArray | ||
IntegerNA IntegerArray | ||
datetime64[ns, tz] datetime64[ns]? DatetimeArray | ||
================== ============================= | ||
|
||
For any 3rd-party extension types, the array type will be an | ||
ExtensionArray. | ||
|
||
For all remaining arrays (ndarrays), ``.array`` will be the ndarray | ||
stored within. | ||
|
||
See Also | ||
-------- | ||
to_numpy : Similar method that always returns a NumPy array. | ||
|
||
Examples | ||
-------- | ||
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a'])) | ||
>>> ser.array | ||
[a, b, a] | ||
Categories (2, object): [a, b] | ||
""" | ||
return self._values | ||
|
||
def to_numpy(self): | ||
"""A NumPy array representing the values in this Series or Index. | ||
|
||
The returned array will be the same up to equality (values equal | ||
in `self` will be equal in the returned array; likewise for values | ||
that are not equal). | ||
Review comment: When reading this, I then wonder: in what sense is it not the same? (Does it lose some type information, e.g. in the case of categoricals?)
Reply: Right. I'll add that.
|
||
Returns | ||
------- | ||
numpy.ndarray | ||
An ndarray with | ||
|
||
Notes | ||
----- | ||
For NumPy arrays, this will be a reference to the actual data stored | ||
in this Series or Index. | ||
|
||
For extension types, this may involve copying data and coercing the | ||
result to a NumPy type (possibly object), which may be expensive. | ||
|
||
This table lays out the different array types for each extension | ||
dtype within pandas. | ||
|
||
================== ================================ | ||
dtype array type | ||
================== ================================ | ||
category[T] ndarray[T] (same dtype as input) | ||
period ndarray[object] (Periods) | ||
interval ndarray[object] (Intervals) | ||
IntegerNA ndarray[object] | ||
datetime64[ns, tz] datetime64[ns]? object? | ||
================== ================================ | ||
|
||
See Also | ||
-------- | ||
array : Get the actual data stored within. | ||
|
||
Examples | ||
-------- | ||
>>> ser = pd.Series(pd.Categorical(['a', 'b', 'a'])) | ||
>>> ser.to_numpy() | ||
array(['a', 'b', 'a'], dtype=object) | ||
""" | ||
if is_extension_array_dtype(self.dtype): | ||
return np.asarray(self._values) | ||
return self._values | ||
|
||
@property | ||
def _ndarray_values(self): | ||
# type: () -> np.ndarray | ||
|
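The difference between the two accessors in this diff can be sketched with a small example. It assumes a pandas version in which `Series.array` and `Series.to_numpy()` are available; the ordered categorical with an unused category `"c"` is an illustrative choice, not something taken from the PR. The point is that the values agree element-wise, while the categorical metadata survives only on `.array`.

```python
import numpy as np
import pandas as pd

# Illustrative data: an ordered categorical with an unused category "c".
cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"], ordered=True)
ser = pd.Series(cat)

backing = ser.array        # the Categorical stored inside the Series
as_numpy = ser.to_numpy()  # a plain NumPy ndarray

# Element-wise, the values agree ...
assert list(as_numpy) == list(backing)

# ... but only the backing array still carries categories and ordering.
assert isinstance(backing, pd.Categorical)
assert list(backing.categories) == ["a", "b", "c"]
assert backing.ordered
assert isinstance(as_numpy, np.ndarray)
assert not hasattr(as_numpy, "categories")
```

This is the sense in which the result of `to_numpy()` is "the same up to equality": comparisons between elements are preserved, but dtype-level information may not be.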
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1269,3 +1269,54 @@ def test_ndarray_values(array, expected): | |
r_values = pd.Index(array)._ndarray_values | ||
tm.assert_numpy_array_equal(l_values, r_values) | ||
tm.assert_numpy_array_equal(l_values, expected) | ||
|
||
|
||
@pytest.mark.parametrize("array, attr", [ | ||
Review comment: maybe put in pandas/tests/arrays/test_arrays.py?
Reply: Doesn't exist yet :), though my This seemed a bit more appropriate since it's next to our tests for
(np.array([1, 2], dtype=np.int64), None), | ||
(pd.Categorical(['a', 'b']), '_codes'), | ||
(pd.core.arrays.period_array(['2000', '2001'], freq='D'), '_data'), | ||
(pd.core.arrays.integer_array([0, np.nan]), '_data'), | ||
(pd.core.arrays.IntervalArray.from_breaks([0, 1]), '_left'), | ||
(pd.SparseArray([0, 1]), '_sparse_values'), | ||
# TODO: DatetimeArray(add) | ||
]) | ||
@pytest.mark.parametrize('box', [pd.Series, pd.Index]) | ||
def test_array(array, attr, box): | ||
if array.dtype.name in ('Int64', 'Sparse[int64, 0]'): | ||
pytest.skip("No index type for {}".format(array.dtype)) | ||
result = box(array, copy=False).array | ||
|
||
if attr: | ||
array = getattr(array, attr) | ||
result = getattr(result, attr) | ||
|
||
assert result is array | ||
|
||
|
||
def test_array_multiindex_raises(): | ||
idx = pd.MultiIndex.from_product([['A'], ['a', 'b']]) | ||
with pytest.raises(ValueError, match='MultiIndex'): | ||
idx.array | ||
|
||
|
||
@pytest.mark.parametrize('array, expected', [ | ||
(np.array([1, 2], dtype=np.int64), np.array([1, 2], dtype=np.int64)), | ||
(pd.Categorical(['a', 'b']), np.array(['a', 'b'], dtype=object)), | ||
(pd.core.arrays.period_array(['2000', '2001'], freq='D'), | ||
np.array([pd.Period('2000', freq="D"), pd.Period('2001', freq='D')])), | ||
(pd.core.arrays.integer_array([0, np.nan]), | ||
np.array([0, np.nan], dtype=object)), | ||
(pd.core.arrays.IntervalArray.from_breaks([0, 1, 2]), | ||
np.array([pd.Interval(0, 1), pd.Interval(1, 2)], dtype=object)), | ||
(pd.SparseArray([0, 1]), np.array([0, 1], dtype=np.int64)), | ||
# TODO: DatetimeArray(add) | ||
]) | ||
@pytest.mark.parametrize('box', [pd.Series, pd.Index]) | ||
def test_to_numpy(array, expected, box): | ||
thing = box(array) | ||
|
||
if array.dtype.name in ('Int64', 'Sparse[int64, 0]'): | ||
pytest.skip("No index type for {}".format(array.dtype)) | ||
|
||
result = thing.to_numpy() | ||
tm.assert_numpy_array_equal(result, expected) | ||
Review comment: Should we also test for the case where it is not a copy?
Reply: What do you mean here? (In case you missed it, the first case is a regular ndarray, so that won't be a copy. Though perhaps you're saying I should assert this for that case?)
Review comment: Yes, that's what I meant. If we return a view, and people can rely on it, we should test it.
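One way the no-copy check discussed in this thread might look is sketched below. It is not code from the PR: the helper name `shares_memory_with_input` is hypothetical, and the sketch assumes that constructing a `Series` or `Index` from an int64 ndarray with `copy=False` does not copy the data. `np.shares_memory` is used instead of an identity check so the assertion also holds when `to_numpy()` returns a view rather than the very same object.

```python
import numpy as np
import pandas as pd


def shares_memory_with_input(box):
    """Return True if box(values).to_numpy() is backed by the input buffer.

    Hypothetical helper for illustration; `box` is pd.Series or pd.Index.
    """
    values = np.array([1, 2], dtype=np.int64)
    result = box(values, copy=False).to_numpy()
    # True when the two arrays overlap in memory, i.e. no copy was made.
    return np.shares_memory(result, values)


for box in (pd.Series, pd.Index):
    assert shares_memory_with_input(box)
```

In a test suite this would more naturally be a `pytest.mark.parametrize` over `box`, mirroring the tests in the diff.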