BUG: pandas >= 2.0 parses 'May' in ambiguous way. #57521
Comments
And the installed version is above 2.0?
My bad. I can reproduce the error. I am using pandas 2.2.0 now.
I did some debugging. The below format is also failing and working fine in pandas < 2.0 versions. csv_error = """ When we pass
Thanks, passing format='mixed' works fine. I still think it is a bug or a regression though, possibly breaking old code bases, since the format really isn't mixed in this case.
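For reference, a minimal sketch of the workaround mentioned in this comment. It assumes pandas >= 2.0 (where format="mixed" is accepted) and uses made-up sample dates, not the reporter's data:

```python
import pandas as pd

# Hypothetical sample data: the first entry is "May", which is both the
# full and the abbreviated month name.
dates = ["01-May-2023", "15-Jun-2023"]

# format="mixed" infers the format for each element individually instead
# of reusing the format guessed from the first element.
parsed = pd.to_datetime(dates, format="mixed")
print(parsed)
```

Per-element inference is more forgiving than a single inferred format, at the cost of not checking that all entries share one layout.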
Thanks for the report!
How many? And if they are all "May"?
The breaking change was announced here: https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#datetimes-are-now-parsed-with-a-consistent-format
It seems to me that inference is a guess, and sometimes guesses can be wrong. I'd recommend not relying on inference when possible.
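One way to avoid inference, sketched here with made-up sample dates (not the reporter's data), is to pass the format explicitly to pd.to_datetime:

```python
import pandas as pd

dates = ["01-May-2023", "15-Jun-2023"]  # hypothetical sample data

# Stating the format avoids inference entirely: %b is the abbreviated
# month name, which also matches "May".
parsed = pd.to_datetime(dates, format="%d-%b-%Y")
print(parsed)
```

When the dates come from a CSV file, pandas >= 2.0 also accepts a date_format argument in read_csv, which should serve the same purpose.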
Thanks, definitely not a bug then. I think your viewpoint is reasonable and this can be closed.
Not a bug.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Code above works with pandas<2.0 and fails with pandas>=2.0
The failure occurs because pandas >= 2.0 parses the date format in the second example as "%d-%B-%Y" (full month name) instead of "%d-%b-%Y" (three-letter month abbreviation), since "May" is both the full name and the three-letter abbreviation of the fifth month of the year.
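The ambiguity can be illustrated without pandas. The sketch below uses Python's own datetime.strptime rather than pandas' internal parser, assumes English month names (the default C locale), and uses made-up dates:

```python
from datetime import datetime

# Assumes English month names (default C locale).
first = "01-May-2023"
later = "15-Jun-2023"  # hypothetical later entry

# "May" satisfies both the full-name and the abbreviated directive.
as_full = datetime.strptime(first, "%d-%B-%Y")
as_abbr = datetime.strptime(first, "%d-%b-%Y")

# "Jun" is only an abbreviation, so the full-name format rejects it.
try:
    datetime.strptime(later, "%d-%B-%Y")
    full_name_ok = True
except ValueError:
    full_name_ok = False

print(as_full == as_abbr, full_name_ok)
```

A format guessed from the "May" entry alone therefore cannot tell the two directives apart, and choosing %B breaks on any later abbreviated month.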
The cause is the removal of the default argument infer_datetime_format=False in core.tools.datetimes._convert_listlike_datetimes.
Pandas>=2.0 then uses
using the ambiguous first element.
Expected Behavior
pandas should infer the datetime format correctly, using several entries when the format is ambiguous.
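As a rough illustration of what "using several entries" could mean, here is a user-side sketch. pick_consistent_format is a hypothetical helper, not a pandas function, and the candidate list and dates are assumptions:

```python
from datetime import datetime


def pick_consistent_format(values, candidates):
    """Return the first candidate format that parses every value, else None."""
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value does not fit this format
        return fmt
    return None


# Hypothetical candidates, tried in order: full month name, then abbreviation.
candidates = ["%d-%B-%Y", "%d-%b-%Y"]

mixed_months = pick_consistent_format(["01-May-2023", "15-Jun-2023"], candidates)
only_may = pick_consistent_format(["01-May-2023", "02-May-2023"], candidates)
print(mixed_months, only_may)
```

The chosen format can then be passed as format= to pd.to_datetime. If every entry is "May", both candidates fit and the first one is returned, which is harmless because both parse those values identically.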
Installed Versions
INSTALLED VERSIONS
commit : f538741
python : 3.11.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 5, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Swedish_Sweden.1252
pandas : 2.2.0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.1.0
pip : 24.0
Cython : None
pytest : 8.0.0
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.21.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.2.0
gcsfs : None
matplotlib : 3.8.3
numba : 0.59.0
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : 2.0.27
tables : None
tabulate : 0.9.0
xarray : 2024.1.1