DOC: Improved the docstring of pandas.DataFrame.values #20065

Merged
merged 9 commits into from
Mar 10, 2018
30 changes: 29 additions & 1 deletion pandas/core/generic.py
@@ -4232,7 +4232,35 @@ def as_matrix(self, columns=None):

@property
def values(self):
"""Numpy representation of NDFrame
"""
Generate and return a Numpy representation of NDFrame.
Member

I would maybe use only "Return", without the "Generate"?


Only the values in the NDFrame will be returned, the axes labels will
Member

> The original docstring contained the private class "NDFrame", so I kept it in there despite failing the docstring test.

Yes, I know there are occurrences of NDFrame in the current docstrings, but that is exactly why we added the check in the validation script :), as that name should never be exposed to users.

Now, in this specific case: this docstring is actually only used for DataFrame (Series has a separate docstring), so you can just use "DataFrame" instead of "NDFrame".
(It is also used for Panel, but since that is deprecated, it is not really a problem that its docstring is not fully correct.)

Contributor Author

I had noticed the inheritance; thank you for the clarification. I have made the recommended changes.

be removed.

Returns
-------
numpy.ndarray
The values of the NDFrame

Member

I think from_array could be a good option for a See Also section. If I'm not wrong, it's kind of the inverse method.

Contributor Author

I went with from_records; thank you for suggesting that I include the inverse in that section.
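A minimal sketch of the inverse relationship raised in this thread, assuming a recent pandas release; the small frame and its labels are invented for illustration. Because `.values` keeps only the data, the column and row labels have to be passed back in when rebuilding a frame with `DataFrame.from_records`:

```python
import pandas as pd

# Illustrative numeric frame with non-default row labels.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# .values returns a plain 2-D numpy.ndarray; the axis labels are dropped.
arr = df.values
print(type(arr).__name__, arr.shape)  # ndarray (2, 2)

# Rebuild the frame: columns and index must be supplied again explicitly.
back = pd.DataFrame.from_records(arr.tolist(), columns=df.columns)
back.index = df.index
print(back.equals(df))
```

Passing `arr.tolist()` rather than the array itself keeps the input within the sequence-of-rows form that `from_records` documents.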

Examples
--------
>>> df = pd.DataFrame([('falcon', 'bird', 389.0),
...                    ('parrot', 'bird', 24.0),
...                    ('lion', 'mammal', 80.5),
...                    ('monkey', 'mammal', np.nan)],
...                   columns=('name', 'class', 'max_speed'))
Member

I think in this case such a big example doesn't add much value. What IMO could be nice is to show two examples, one with a numerical type and another with mixed types, with some comments on how the data is cast to the most generic type (object if we mix str/object with int).

Contributor Author

Excellent idea. I'll add a strictly numerical example.
I wanted to keep one example with NaN mixed in, so I'll rework the existing example and edit the commentary in the Notes section to make the casting clearer.

Member

That sounds great; I think the comment in Notes is a great idea. For the examples, I personally find the very simple ones, with separate steps, easier to understand. But feel free to write them the way you think is best.
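The casting behaviour suggested for the Notes section can be sketched as follows (the frames are illustrative and not taken from the PR): `.values` returns one array for the whole frame, so pandas chooses a dtype that can hold every column, falling back to `object` once strings are mixed with numbers.

```python
import numpy as np
import pandas as pd

# All-integer columns: the combined array keeps an integer dtype.
ints = pd.DataFrame({"age": [3, 29], "height": [94, 170]})
print(ints.values.dtype)  # int64

# Integers mixed with floats: upcast to a common float dtype.
nums = pd.DataFrame({"age": [3, 29], "weight": [31.5, 70.0]})
print(nums.values.dtype)  # float64

# Strings mixed with numbers (including NaN): the only common type is object.
mixed = pd.DataFrame({"name": ["falcon", "lion"], "max_speed": [389.0, np.nan]})
print(mixed.values.dtype)  # object
```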

>>> df
     name   class  max_speed
0  falcon    bird      389.0
1  parrot    bird       24.0
2    lion  mammal       80.5
3  monkey  mammal        NaN
>>> df.values
Member

Can you show df.dtypes here as well (like you did below)? That will be useful to see that they are all the same.

array([['falcon', 'bird', 389.0],
       ['parrot', 'bird', 24.0],
       ['lion', 'mammal', 80.5],
       ['monkey', 'mammal', nan]], dtype=object)
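As a sketch of what showing `df.dtypes` next to `df.values` would reveal for the example frame above (assuming `numpy` is imported as `np`, as the doctest already requires for `np.nan`): the columns keep their own dtypes, while the single array returned by `.values` has to be `object` to hold both strings and floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('falcon', 'bird', 389.0),
                   ('parrot', 'bird', 24.0),
                   ('lion', 'mammal', 80.5),
                   ('monkey', 'mammal', np.nan)],
                  columns=['name', 'class', 'max_speed'])

# Per-column dtypes: two string columns and one float64 column (NaN is a float).
print(df.dtypes)

# The combined array falls back to object so it can hold every column.
print(df.values.dtype)  # object
```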

Notes
-----