BUG GH22858 When creating empty dataframe, only cast int to float if index given #22963
Hello @JustinZhengBC! Thanks for submitting the PR.
Looks pretty good. Can you add a whatsnew note for v0.24?
I updated the whatsnew for v0.24. Also, after reverting to master, I found that while passing "int64" as the dtype caused the bug, merely passing int did not. I have updated the test to reflect this.
FWIW the tests in this module could be improved with parametrization. On the fence but I'm generally OK with not requiring that for this PR (others may have a differing opinion), but would be a good follow up if you are interested.
@@ -806,6 +806,9 @@ def test_constructor_corner(self):
df = DataFrame(index=lrange(10), columns=['a', 'b'], dtype=object)
assert df.values.dtype == np.object_

df = DataFrame(columns=['a', 'b'], dtype="int64")
Can you add the GH issue number as a comment to reference?
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -194,6 +194,7 @@ Other Enhancements
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- A newly constructed empty :class:`DataFrame` of integers will now only be cast to ``float64`` if an index is specified (:issue:`22858`)
Hmm, I think the wording "DataFrame of integers" is somewhat misleading, as there aren't actually any integers here; the type is just passed explicitly during construction and not subsequently coerced.
Codecov Report
@@ Coverage Diff @@
## master #22963 +/- ##
==========================================
+ Coverage 92.18% 92.18% +<.01%
==========================================
Files 169 169
Lines 50833 50823 -10
==========================================
- Hits 46862 46853 -9
+ Misses 3971 3970 -1
minor comments lgtm
ping on green
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -194,6 +194,7 @@ Other Enhancements
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`)
move to api breaking changes
pandas/core/dtypes/cast.py
Outdated
if is_integer_dtype(dtype) and isna(value):
# GH 22858: only cast to float if an index
# (passed here as length) is specified
if is_integer_dtype(dtype) and isna(value) and length:
put the length check first
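A minimal sketch of the reordered condition, for illustration only. It is not the code in `pandas/core/dtypes/cast.py`: `resolve_scalar_dtype`, `_is_missing` and `INTEGER_DTYPES` are hypothetical stand-ins (plain strings instead of real dtypes) for the function under review, `isna` and `is_integer_dtype`.

```python
import math

# Hypothetical stand-in for pandas' is_integer_dtype, using dtype names.
INTEGER_DTYPES = {"int8", "int16", "int32", "int64"}


def _is_missing(value):
    # Hypothetical stand-in for pandas' isna on a scalar.
    return value is None or (isinstance(value, float) and math.isnan(value))


def resolve_scalar_dtype(value, length, dtype):
    """Return the dtype to use when filling `length` rows with `value`."""
    # GH 22858: check length first. With zero rows there is nothing to
    # fill with NaN, so the integer dtype can be kept as requested.
    if length and dtype in INTEGER_DTYPES and _is_missing(value):
        return "float64"
    return dtype
```

Putting `length` first means the cheap truthiness test short-circuits the other two checks whenever no rows are requested.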
dtype=int)
assert df.values.dtype == np.dtype('float64')
@pytest.mark.parametrize("data, index, columns, dtype, ans", [
(None, lrange(10), ['a', 'b'], object, np.object_),
can you rename ans -> expected
doc comment, otherwise lgtm ping on green
doc/source/whatsnew/v0.24.0.txt
Outdated
.. _whatsnew_0240.api_breaking:
- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`)
put in the section itself, just below this (it’s a backward incompatible change)
@WillAyd merge when satisfied
Very nice change - thanks @JustinZhengBC !
git diff upstream/master -u -- "*.py" | flake8 --diff
Previously, creating a DataFrame with no data and an integer dtype changed the dtype to float. This cast is necessary when a predefined number of rows is given through the index parameter, so that the rows can be filled with NaN. However, when no index is passed there are no rows to fill, so the cast is unexpected. This PR changes the constructor so that the dtype is only altered when necessary.
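The behavior described above can be checked with a short script. This is a sketch that assumes a pandas version which includes this change (0.24 or later); the column names and the three-row index are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# No index: there are no rows to fill with NaN, so the requested
# integer dtype is kept.
empty = pd.DataFrame(columns=["a", "b"], dtype="int64")

# Index given: every cell has to hold NaN, which int64 cannot store,
# so the columns are cast to float64.
with_index = pd.DataFrame(index=range(3), columns=["a", "b"], dtype="int64")

print(empty.dtypes.tolist())
print(with_index.dtypes.tolist())
```

Before this change, the first call was reported in GH 22858 to yield float columns as well.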