
DataFrame.combine_first turns timestamp columns into float columns #27364

Closed
msalib opened this issue Jul 12, 2019 · 2 comments
Labels
Dtype Conversions (Unexpected or buggy dtype conversions), Duplicate Report (Duplicate issue or pull request), Regression (Functionality that used to work in a prior pandas version)

Comments

msalib commented Jul 12, 2019

Code Sample, a copy-pastable example if possible

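from pandas import Timestamp, DataFrame

x = dict(a={0: 0.45})                              # float column "a"
y = dict(b={0: Timestamp('2015-06-19 00:00:00')})  # datetime column "b", absent from x
combined = DataFrame(x).combine_first(DataFrame(y))
print(combined)
print(combined.b.dtype)

# Passes on 0.23.4; starting with 0.24.0rc1 it fails because "b" comes back as float64
assert combined.b.dtype.name == 'datetime64[ns]'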

Problem description

In version 0.23.4, the code above works. Starting with release 0.24.0rc1, the assertion fails because column b is returned as float64.

The problem is that when you call x.combine_first(y) and y has timestamp columns that are not present at all in x, the result has those columns cast to float64. That might be acceptable, but because the behavior changed without warning in 0.24.0rc1, it looks like a bug, especially since combine_first has special-case handling for timestamp-like columns.
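On an affected version, one possible workaround is to run combine_first and then restore the columns that exist only in the other frame from that frame directly. This is a sketch, not a pandas API: the helper name `combine_first_keep_dtypes` is hypothetical, and it only handles columns that are absent from the calling frame (the case described here). Columns present in both frames are left as combine_first returns them.

```python
from pandas import DataFrame, Timestamp


def combine_first_keep_dtypes(left, right):
    # Hypothetical helper: combine as usual, then overwrite columns that only
    # exist in `right` with right's own column, aligned to the result's index,
    # so the original dtype (e.g. datetime64) is kept.
    combined = left.combine_first(right)
    for col in right.columns.difference(left.columns):
        combined[col] = right[col].reindex(combined.index)
    return combined


x = DataFrame({"a": {0: 0.45}})
y = DataFrame({"b": {0: Timestamp("2015-06-19 00:00:00")}})
combined = combine_first_keep_dtypes(x, y)
print(combined.dtypes)
```

If `combined` has rows that `right` lacks, the reindex inserts NaT for datetime columns (dtype preserved) but NaN for integer columns, which would still upcast those to float64.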

Expected Output

      a          b
0  0.45 2015-06-19
datetime64[ns]

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 4.5.0
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.27.1
numpy: 1.16.2
scipy: 1.2.1
pyarrow: 0.14.0
xarray: 0.12.1
IPython: 7.4.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.3.2
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@simonjayhawkins
Member

@msalib Thanks for the report.

It looks like it's not just timestamps; an int64 column that is absent from the calling frame is upcast to float64 as well:

>>> from pandas import Timestamp, DataFrame
>>> x = dict(a={0: 0.45})
>>> y = dict(b={0: 5})
>>> df = DataFrame(x)
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 1 columns):
a    1 non-null float64
dtypes: float64(1)
memory usage: 16.0 bytes
>>> other = DataFrame(y)
>>> other.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 1 columns):
b    1 non-null int64
dtypes: int64(1)
memory usage: 16.0 bytes
>>> combined = df.combine_first(other)
>>> combined.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 2 columns):
a    1 non-null float64
b    1 non-null float64
dtypes: float64(2)
memory usage: 24.0 bytes
>>>

Further investigation and PRs welcome.
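As a starting point for that investigation, here is a sketch of one plausible mechanism. It assumes, without confirming it against the pandas source, that combine_first aligns both frames to the union of their columns and then picks values with a where-style selection. Aligning `left` adds column b as an all-NaN float64 column, and NumPy's type promotion of int64 with float64 then yields float64:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [0.45]})
right = pd.DataFrame({"b": [5]})

# Outer-align both frames on index and columns (assumed to mirror what
# combine_first does internally; this is an illustration, not its code).
aligned_left, aligned_right = left.align(right)
print(aligned_left.dtypes)   # "b" was missing from left, so it is all-NaN float64

# Fill left's missing values from right, where-style.
filled = np.where(aligned_left["b"].isna(), aligned_right["b"], aligned_left["b"])
print(filled.dtype)          # int64 promoted with float64 -> float64
```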

simonjayhawkins added the Dtype Conversions and Regression labels Jul 13, 2019
@simonjayhawkins
Member

I think this is a duplicate of #24357, so closing.

simonjayhawkins added the Duplicate Report label Jul 13, 2019