BUG: date_range and DatetimeIndex return different formats with a single timestamp for midnight #53470
Comments
Yes, I think this is an issue with the repr of the indexes. For example, we already get:
>>> first_index
DatetimeIndex(['2012-01-01'], dtype='datetime64[ns]', freq='60T')
>>> first[0]
Timestamp('2012-01-01 00:00:00')
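A small sketch of the point above. It assumes pandas 2.x and uses the "60min" alias for the same 60-minute frequency as "60T" (the "T" alias is deprecated in newer releases). The index keeps the full timestamp, and its only value already sits exactly at midnight, which is the case where the index repr shows just the date.

```python
import pandas as pd

# One timestamp at midnight, same 60-minute frequency as in the thread.
idx = pd.date_range("2012-01-01 00:00:00", periods=1, freq="60min")

# The stored element still carries its time of day.
first = idx[0]
print(str(first))  # 2012-01-01 00:00:00

# Every value is already at midnight, so normalizing changes nothing.
all_midnight = bool((idx.normalize() == idx).all())
print(all_midnight)  # True
```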
I did some digging and found the root cause. The offending line is here: … What's happening is that when you pass in a … However, if the date includes a time, then it is not an "even day", so it returns … With that said, I am unsure how to fix this, since in my opinion the code seems to be working as it was intended to.
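The "even day" check described above can be modelled in a few lines of plain Python. This is a simplified sketch, not pandas' actual code: the name is_dates_only is hypothetical here, values are assumed to be integer nanoseconds since the Unix epoch, and None stands in for NaT.

```python
from typing import Optional, Sequence

NS_PER_DAY = 24 * 60 * 60 * 10**9  # nanoseconds in one day


def is_dates_only(values_ns: Sequence[Optional[int]]) -> bool:
    """Hypothetical model of an "even day" check.

    Returns True when every non-missing value falls exactly on midnight,
    i.e. when the time of day could be dropped from the display.
    """
    # None plays the role of NaT and is ignored by the check.
    return all(v % NS_PER_DAY == 0 for v in values_ns if v is not None)


midnight = 0                 # 1970-01-01 00:00:00
one_am = 3600 * 10**9        # 1970-01-01 01:00:00

print(is_dates_only([midnight]))          # True: a single midnight value looks like a date
print(is_dates_only([midnight, one_am]))  # False: a non-midnight value forces times
```

Because Python's % with a positive divisor never returns a negative number, midnights before 1970 (negative values) are also treated as whole days in this model.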
As a potential fix, I think we could perhaps add a flag passed to the …
A flag to the constructor is not a good idea IMO; it's too complicated. Can you try changing the repr code to emit time data based on the freq? E.g. if freq < 1D, always show both date and time.
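The freq-based rule suggested above could look roughly like this. It is a hypothetical sketch in plain Python, not a pandas patch: show_time is an illustrative name, the frequency is modelled as a datetime.timedelta (None meaning no freq), and values are integer nanoseconds since the epoch.

```python
from datetime import timedelta
from typing import Optional, Sequence

NS_PER_DAY = 24 * 60 * 60 * 10**9  # nanoseconds in one day


def show_time(values_ns: Sequence[int], freq: Optional[timedelta]) -> bool:
    """Hypothetical freq-aware rule: should the repr include the time of day?"""
    # Sub-daily frequency: always show date and time.
    if freq is not None and freq < timedelta(days=1):
        return True
    # Otherwise fall back to the whole-day check: show times only if
    # at least one value is not exactly at midnight.
    return any(v % NS_PER_DAY != 0 for v in values_ns)


print(show_time([0], timedelta(hours=1)))   # True
print(show_time([0], timedelta(hours=36)))  # False
print(show_time([0], None))                 # False
```

Under this rule a single midnight value with a 36-hour frequency would still be shown as a date only, since the frequency is not below one day.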
It seems to me that the problem is that a datetime input (with more than just the date) is incorrectly recognised as a date. In that sense, adding a flag would not really address the root issue, though it might mitigate the consequences. Either way, I think the root issue should be tackled directly. Is there any reason to assume that datetime inputs where the time of day is specified should be interpreted as dates?
Yes, that sounds right, @pmlpm1986. A PR would be welcome.
Following up on what @pmlpm1986 said: the current logic determines whether the day is "even" or not. So do we want to keep that logic and add a check for freq < 1 day, or replace the current logic with the freq < 1 day check?
Sorry. I tried to fix it but couldn't get the information I needed from DatetimeArray objects (about which I could find only limited information), only from DatetimeIndex objects. I do not think "freq" is the issue. The issue is that, despite providing a single string for a datetime (just one, not several), we still get a date. The "freq" can be anything and we get the same outcome. My fix was based on the input: if the input is a datetime, we should get a datetime representation, no matter the "freq". In the examples above, the "freq" is always the same and the problem only appears with single entries. In other words, I do not see why "freq" should be part of the solution.
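The input-based rule described above can be sketched in plain Python. This is a hypothetical illustration, not pandas code: input_has_time and show_time_from_inputs are made-up names, and inputs are assumed to be ISO 8601 strings that the standard library's fromisoformat methods can parse.

```python
from datetime import date, datetime
from typing import Iterable


def input_has_time(text: str) -> bool:
    """Hypothetical helper: does an ISO 8601 string carry a time-of-day part?"""
    try:
        # date.fromisoformat rejects strings that include a time component.
        date.fromisoformat(text)
    except ValueError:
        # Not a pure date; raises ValueError again if it is not a datetime either.
        datetime.fromisoformat(text)
        return True
    return False


def show_time_from_inputs(inputs: Iterable[str]) -> bool:
    # Input-based rule: show times if the user wrote a time for any entry,
    # regardless of the frequency.
    return any(input_has_time(s) for s in inputs)


print(input_has_time("2012-01-01"))                    # False
print(input_has_time("2012-01-01 00:00:00"))           # True
print(show_time_from_inputs(["2012-01-01 00:00:00"]))  # True
```

A real fix would have to carry this information from parsing through to formatting, which is where the difficulty with DatetimeArray mentioned above comes in.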
Just to follow up on what I wrote previously about "freq" not being the issue, consider the following examples: …
Is there any reason to get a datetime output for first_index (1-hour "freq") but not for second_index (36-hour "freq")?
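The examples referred to above are not preserved here. As a stand-in (not the reporter's code, and assuming pandas 2.x with the "36h" alias), this sketch shows why a frequency longer than one day does not imply that the time of day is irrelevant: a 36-hour grid alternates between midnight and noon, yet with a single period only the midnight timestamp remains.

```python
import pandas as pd

start = "2012-01-01 00:00:00"

# Three steps of 36 hours: midnight, noon the next day, then midnight again.
grid = pd.date_range(start, periods=3, freq="36h")
hours = grid.hour.tolist()
print(hours)  # [0, 12, 0]

# With a single period, only the midnight timestamp is left.
single = pd.date_range(start, periods=1, freq="36h")
print(str(single[0]))  # 2012-01-01 00:00:00
```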
My guess is that the idea behind using freq < 1 day is that if you're looking at data with a frequency of more than a day, you may not be interested in the time. But that is just a thought on why it was suggested.
@Simar-B Okay, but I do not see what "frequency" has to do with being interested in the time of day. If somebody specifies the time of day, shouldn't we take that to mean they are interested in the time too? The issue is inconsistent behaviour depending on the input string, not on the "frequency". The counter-example I provided above further illustrates why: a 36-hour "frequency" means users are also interested in the time, but the bug still happens because there is only one timestamp, at midnight. Conclusion: "frequency" is not the issue.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I have been trying to create DatetimeIndex objects whose number of entries is determined by a parameter.
In doing this, I noticed that date_range and DatetimeIndex return different formats for the same qualitative input (in the code above, a datetime-formatted string), and I do not know how to adjust the output format through the parameters. This happens when only one timestamp is provided or requested and it is at midnight.
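A sketch of the setup described above, assuming pandas 2.x; make_index is a hypothetical helper, not part of pandas or of the original report. Whether the index is built by date_range or by the DatetimeIndex constructor, the stored value is the same midnight timestamp; only the printed form is in question, so the reprs are printed here without claiming what they look like in a given version.

```python
import pandas as pd


def make_index(start: str, n: int, freq: str = "60min") -> pd.DatetimeIndex:
    """Hypothetical helper: an index whose length is set by a parameter."""
    return pd.date_range(start, periods=n, freq=freq)


start = "2012-01-01 00:00:00"
from_range = make_index(start, 1)
from_ctor = pd.DatetimeIndex([start])

# Same underlying value, whichever way the index was built.
print(from_range[0] == from_ctor[0])  # True

# Inspect the display for one and two entries.
for n in (1, 2):
    print(make_index(start, n))
```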
Expected Behavior
Given the same qualitative input (a datetime-formatted string), I expected the DatetimeIndex constructor and the date_range function to return the same format. The current behaviour makes comparisons between time indexes more cumbersome.
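One possible workaround, not proposed in this thread and assuming pandas 2.x: render indexes with an explicit format via DatetimeIndex.strftime, which applies the same pattern whether or not every value is at midnight, and compare the timestamps themselves rather than their string representations. The format string FMT is an arbitrary choice for illustration.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"  # explicit format, chosen here for illustration

single = pd.date_range("2012-01-01 00:00:00", periods=1, freq="60min")
double = pd.date_range("2012-01-01 00:00:00", periods=2, freq="60min")

# strftime uses the given pattern regardless of the values.
single_str = single.strftime(FMT).tolist()
double_str = double.strftime(FMT).tolist()
print(single_str)  # ['2012-01-01 00:00:00']
print(double_str)  # ['2012-01-01 00:00:00', '2012-01-01 01:00:00']

# For comparisons, compare values, not reprs.
print(single[0] == double[0])  # True
```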
Installed Versions
INSTALLED VERSIONS
commit : 965ceca
python : 3.11.2.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.0-7-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-2 (2023-04-08)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 2.0.2
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 66.1.1
pip : 23.0.1
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 7.0.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.13.1
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None