ENH: Added stride/offset aliases in to_datetime #26631

Closed
wants to merge 9 commits into from
59 changes: 40 additions & 19 deletions pandas/_libs/tslibs/timedeltas.pyx
@@ -26,6 +26,8 @@ from pandas._libs.tslibs.c_timestamp cimport _Timestamp

from pandas._libs.tslibs.ccalendar import DAY_SECONDS

from pandas._libs.tslibs.frequencies import _base_and_stride

from pandas._libs.tslibs.np_datetime cimport (
cmp_scalar, reverse_ops, td64_to_tdstruct, pandas_timedeltastruct)

@@ -252,35 +254,54 @@ cpdef inline object precision_from_unit(object unit):
int64_t m
int p

if unit == 'Y':
m = 1000000000L * 31556952
if unit is None:
m = 1L
p = 0
return m, p

unit, stride = _base_and_stride(unit)

# Normalize old or lowercase codes to standard offset aliases.
Contributor

I am not 100% sure we are actually testing all of these; can you check where they are tested and validate?

if unit in ['min', 'm']:
unit = 'T'
elif unit == 'ns':
unit = 'N'
elif unit == 'ms':
unit = 'L'
elif unit == 'us':
unit = 'U'

Contributor

make this an else

Contributor Author

I don't understand. I actually want "unit" to be unchanged if it is not one of the tested values, so what would an "else" do? Wouldn't it force all "unit" values to be "U"? In the past I have used a dictionary to do this same thing, if that is a better and more readable approach.

Contributor

else:
    unit = unit.upper()
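
The two options discussed in this thread can be compared in a small pure-Python sketch. The names `_LEGACY_ALIASES`, `normalize_unit_branches`, and `normalize_unit_lookup` are hypothetical and not part of pandas, and the mapping only covers the codes shown in the diff. Under those assumptions, the trailing `else` only runs when no alias matched, so it does not overwrite matched units, and the dictionary variant gives the same results.

```python
# Hypothetical helpers for illustration only; not part of pandas.
_LEGACY_ALIASES = {"min": "T", "m": "T", "ns": "N", "ms": "L", "us": "U"}


def normalize_unit_branches(unit):
    """if/elif chain with the suggested trailing ``else``."""
    if unit in ("min", "m"):
        return "T"
    elif unit == "ns":
        return "N"
    elif unit == "ms":
        return "L"
    elif unit == "us":
        return "U"
    else:
        # Only reached when no alias matched, so matched units are untouched.
        return unit.upper()


def normalize_unit_lookup(unit):
    """Dictionary variant: unknown codes fall back to upper-casing."""
    return _LEGACY_ALIASES.get(unit, unit.upper())


# Both variants agree on the codes handled in the diff.
for code in ("min", "m", "ms", "us", "ns", "d", "h", "s", "D"):
    assert normalize_unit_branches(code) == normalize_unit_lookup(code)
```

Note that lowercase `m` (minutes) has to be mapped before upper-casing, because uppercase `M` means months in the branches that follow.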

unit = unit.upper()

if unit in ['Y', 'A']:
m = stride * 1000000000L * 31556952
p = 9
elif unit == 'M':
m = 1000000000L * 2629746
m = stride * 1000000000L * 2629746
p = 9
elif unit == 'W':
m = 1000000000L * DAY_SECONDS * 7
m = stride * 1000000000L * DAY_SECONDS * 7
p = 9
elif unit == 'D' or unit == 'd':
m = 1000000000L * DAY_SECONDS
elif unit == 'D':
m = stride * 1000000000L * DAY_SECONDS
p = 9
elif unit == 'h':
m = 1000000000L * 3600
elif unit == 'H':
m = stride * 1000000000L * 3600
p = 9
elif unit == 'm':
m = 1000000000L * 60
elif unit == 'T':
m = stride * 1000000000L * 60
p = 9
elif unit == 's':
m = 1000000000L
elif unit == 'S':
m = stride * 1000000000L
p = 9
elif unit == 'ms':
m = 1000000L
elif unit == 'L':
m = stride * 1000000L
p = 6
elif unit == 'us':
m = 1000L
elif unit == 'U':
m = stride * 1000L
p = 3
elif unit == 'ns' or unit is None:
m = 1L
elif unit == 'N':
m = stride * 1L
p = 0
else:
raise ValueError("cannot cast unit {unit}".format(unit=unit))
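
As a rough illustration of what the new version of this hunk computes, here is a pure-Python sketch. It is not the Cython implementation: `split_stride` is a hypothetical regex-based stand-in for `_base_and_stride` (whose exact behavior is not shown in this diff), and `precision_from_unit_sketch`, `_UNIT_TABLE`, and `_ALIASES` are names invented for this example. The multipliers and precisions are copied from the diff, and `DAY_SECONDS` is assumed to be 86400.

```python
import re

DAY_SECONDS = 86400  # assumed value of the constant imported in the diff

# Nanoseconds per base unit and rounding precision, copied from the diff.
_UNIT_TABLE = {
    "Y": (1_000_000_000 * 31556952, 9),
    "A": (1_000_000_000 * 31556952, 9),
    "M": (1_000_000_000 * 2629746, 9),
    "W": (1_000_000_000 * DAY_SECONDS * 7, 9),
    "D": (1_000_000_000 * DAY_SECONDS, 9),
    "H": (1_000_000_000 * 3600, 9),
    "T": (1_000_000_000 * 60, 9),
    "S": (1_000_000_000, 9),
    "L": (1_000_000, 6),
    "U": (1_000, 3),
    "N": (1, 0),
}
_ALIASES = {"min": "T", "m": "T", "ns": "N", "ms": "L", "us": "U"}


def split_stride(unit):
    """Hypothetical stand-in for ``_base_and_stride``: '7D' -> ('D', 7)."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", unit)
    if match is None:
        raise ValueError("cannot cast unit {unit}".format(unit=unit))
    digits, base = match.groups()
    return base, int(digits) if digits else 1


def precision_from_unit_sketch(unit):
    """Return (nanoseconds per strided unit, precision)."""
    if unit is None:
        return 1, 0
    base, stride = split_stride(unit)
    # Normalize old or lowercase codes, then look up the base multiplier.
    base = _ALIASES.get(base, base.upper())
    if base not in _UNIT_TABLE:
        raise ValueError("cannot cast unit {unit}".format(unit=unit))
    multiplier, precision = _UNIT_TABLE[base]
    return stride * multiplier, precision
```

With this sketch, `precision_from_unit_sketch("7D")` and `precision_from_unit_sketch("W")` return the same pair, which is the stride equivalence the PR aims for.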
@@ -300,7 +321,7 @@ cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
if ts is None:
return m

# cast the unit, multiply base/frace separately
# cast the unit, multiply base/frac separately
# to avoid precision issues from float -> int
base = <int64_t>ts
frac = ts - base
25 changes: 20 additions & 5 deletions pandas/core/tools/datetimes.py
@@ -470,11 +470,26 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
- If True, require an exact format match.
- If False, allow the format to match anywhere in the target string.

unit : string, default 'ns'
unit of the arg (D,s,ms,us,ns) denote the unit, which is an
integer or float number. This will be based off the origin.
Example, with unit='ms' and origin='unix' (the default), this
would calculate the number of milliseconds to the unix epoch start.
unit : string, default is 'N'
The unit of the arg. Uses a subset of the pandas offset aliases.

- 'Y', 'A' for yearly (long term average of 365.2425 days)
Contributor

Remove Y, A, M here; I think we deprecated these anyhow.

Contributor Author

@timcera Jun 10, 2019

My working version already has a deprecation warning for "Y", "A", "M" and "W". The "W" deprecation is discussed earlier in this conversation.

Contributor

so don't mention the deprecated ones

- 'M' for monthly (long term average of 30.436875 days)
- 'W' for weekly
- 'D' for daily
- 'H' for hourly
- 'T', 'min' for minutely
- 'S' for seconds
- 'L', 'ms' for milliseconds
- 'U', 'us' for microseconds
- 'N' for nanoseconds

This will be based off the origin. Example, with unit='L' and
origin='unix' (the default), this would calculate the number of
milliseconds to the unix epoch start.

The offset alias can be prefixed with a stride. For example, results
would be equivalent between unit='7D' and unit='W'.
infer_datetime_format : boolean, default False
If True and no `format` is given, attempt to infer the format of the
datetime strings, and if it can be inferred, switch to a faster
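
The docstring's statement that the value "will be based off the origin" can be illustrated with the standard library alone. This is a minimal sketch, not how `to_datetime` is implemented: `from_epoch`, `UNIX_EPOCH`, and `_MICROSECONDS_PER_UNIT` are hypothetical names, and only a few units are covered.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of units, expressed in microseconds so that
# datetime.timedelta can represent them exactly.
_MICROSECONDS_PER_UNIT = {"D": 86_400_000_000, "S": 1_000_000, "L": 1_000, "U": 1}
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch(value, unit, origin=UNIX_EPOCH):
    """Interpret ``value`` as a count of ``unit`` steps after ``origin``."""
    return origin + timedelta(microseconds=value * _MICROSECONDS_PER_UNIT[unit])


# 1000 milliseconds after the Unix epoch is one second past midnight.
one_second = from_epoch(1_000, "L")
```

Passing a different `origin` shifts every result by the same amount, which is the sense in which the unit is measured relative to the origin.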