ENH: Added stride/offset aliases in to_datetime #26631
Changes from 1 commit
9c67f70
bd3d10f
0104f17
14e987b
6037971
87c9aee
1892f9e
b00576a
c35549b
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,8 @@ from pandas._libs.tslibs.c_timestamp cimport _Timestamp | |
|
||
from pandas._libs.tslibs.ccalendar import DAY_SECONDS | ||
|
||
from pandas._libs.tslibs.frequencies import _base_and_stride | ||
|
||
from pandas._libs.tslibs.np_datetime cimport ( | ||
cmp_scalar, reverse_ops, td64_to_tdstruct, pandas_timedeltastruct) | ||
|
||
|
@@ -252,35 +254,54 @@ cpdef inline object precision_from_unit(object unit): | |
int64_t m | ||
int p | ||
|
||
if unit == 'Y': | ||
m = 1000000000L * 31556952 | ||
if unit is None: | ||
m = 1L | ||
p = 0 | ||
return m, p | ||
|
||
unit, stride = _base_and_stride(unit) | ||
|
||
# Normalize old or lowercase codes to standard offset aliases. | ||
if unit in ['min', 'm']: | ||
unit = 'T' | ||
elif unit == 'ns': | ||
unit = 'N' | ||
elif unit == 'ms': | ||
unit = 'L' | ||
elif unit == 'us': | ||
unit = 'U' | ||
|
||
Comment: make this an else

Reply: I don't understand. I actually want "unit" to be unchanged if it is not one of the tested values, so what would an "else" do? Wouldn't it force all "unit" values to be "U"? I have in the past used a dictionary to do this same thing, if that is a better and more readable approach.
|
||
unit = unit.upper() | ||
|
||
if unit in ['Y', 'A']: | ||
m = stride * 1000000000L * 31556952 | ||
p = 9 | ||
elif unit == 'M': | ||
m = 1000000000L * 2629746 | ||
m = stride * 1000000000L * 2629746 | ||
p = 9 | ||
elif unit == 'W': | ||
m = 1000000000L * DAY_SECONDS * 7 | ||
m = stride * 1000000000L * DAY_SECONDS * 7 | ||
p = 9 | ||
elif unit == 'D' or unit == 'd': | ||
m = 1000000000L * DAY_SECONDS | ||
elif unit == 'D': | ||
m = stride * 1000000000L * DAY_SECONDS | ||
p = 9 | ||
elif unit == 'h': | ||
m = 1000000000L * 3600 | ||
elif unit == 'H': | ||
m = stride * 1000000000L * 3600 | ||
p = 9 | ||
elif unit == 'm': | ||
m = 1000000000L * 60 | ||
elif unit == 'T': | ||
m = stride * 1000000000L * 60 | ||
p = 9 | ||
elif unit == 's': | ||
m = 1000000000L | ||
elif unit == 'S': | ||
m = stride * 1000000000L | ||
p = 9 | ||
elif unit == 'ms': | ||
m = 1000000L | ||
elif unit == 'L': | ||
m = stride * 1000000L | ||
p = 6 | ||
elif unit == 'us': | ||
m = 1000L | ||
elif unit == 'U': | ||
m = stride * 1000L | ||
p = 3 | ||
elif unit == 'ns' or unit is None: | ||
m = 1L | ||
elif unit == 'N': | ||
m = stride * 1L | ||
p = 0 | ||
else: | ||
raise ValueError("cannot cast unit {unit}".format(unit=unit)) | ||
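The dictionary approach mentioned in the reply above could look roughly like the following pure-Python sketch of the logic in this hunk. It is not the PR's code: `split_stride` is a hypothetical stand-in for `_base_and_stride`, and the table names `_LEGACY_ALIASES` and `_NS_PER_UNIT` are made up for illustration. The multipliers and precisions are copied from the branches in the diff.

```python
import re

DAY_SECONDS = 86400

# Hypothetical table: old or lowercase codes -> standard offset aliases.
# Anything not listed here is left unchanged (then upper-cased below).
_LEGACY_ALIASES = {"min": "T", "m": "T", "ns": "N", "ms": "L", "us": "U"}

# Hypothetical table: alias -> (nanoseconds per unit, rounding precision).
_NS_PER_UNIT = {
    "Y": (1_000_000_000 * 31556952, 9),
    "A": (1_000_000_000 * 31556952, 9),
    "M": (1_000_000_000 * 2629746, 9),
    "W": (1_000_000_000 * DAY_SECONDS * 7, 9),
    "D": (1_000_000_000 * DAY_SECONDS, 9),
    "H": (1_000_000_000 * 3600, 9),
    "T": (1_000_000_000 * 60, 9),
    "S": (1_000_000_000, 9),
    "L": (1_000_000, 6),
    "U": (1_000, 3),
    "N": (1, 0),
}

_UNIT_RE = re.compile(r"^(\d*)([A-Za-z]+)$")


def split_stride(unit):
    """Stand-in for _base_and_stride: '7D' -> ('D', 7), 'ms' -> ('ms', 1)."""
    match = _UNIT_RE.match(unit)
    if match is None:
        raise ValueError("cannot cast unit {unit}".format(unit=unit))
    digits, base = match.groups()
    return base, int(digits) if digits else 1


def precision_from_unit(unit):
    """Return (multiplier to nanoseconds, precision) for a unit string."""
    if unit is None:
        return 1, 0
    base, stride = split_stride(unit)
    # Normalize legacy codes first so that 'm' (minutes) is not
    # confused with 'M' (months) once the code is upper-cased.
    base = _LEGACY_ALIASES.get(base, base).upper()
    try:
        m, p = _NS_PER_UNIT[base]
    except KeyError:
        raise ValueError("cannot cast unit {unit}".format(unit=unit)) from None
    return stride * m, p
```

With this layout, unrecognized codes fall through `dict.get` unchanged, which is the behavior the reply asks for, and the final lookup raises the same `ValueError` as the `else` branch in the diff.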
|
@@ -300,7 +321,7 @@ cdef inline int64_t cast_from_unit(object ts, object unit) except? -1: | |
if ts is None: | ||
return m | ||
|
||
# cast the unit, multiply base/frace separately | ||
# cast the unit, multiply base/frac separately | ||
# to avoid precision issues from float -> int | ||
base = <int64_t>ts | ||
frac = ts - base | ||
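The comment in this hunk describes splitting the value into an integer part and a fractional part before scaling. A minimal Python sketch of that idea follows. Only the `base`/`frac` split appears in the hunk; the rounding of `frac` to `p` digits and the final recombination are assumptions about how the remainder of `cast_from_unit` proceeds, and the signature taking `m` and `p` directly is hypothetical.

```python
def cast_from_unit(ts, m, p):
    """Scale `ts` by multiplier `m`, rounding the fraction to `p` digits.

    Assumed sketch: the integer part is multiplied exactly, so only the
    fractional part goes through floating-point arithmetic.
    """
    base = int(ts)           # integer part, multiplied without float error
    frac = ts - base         # fractional part, a float below 1 in magnitude
    if p:
        frac = round(frac, p)
    return base * m + int(frac * m)
```

For example, `cast_from_unit(1.5, 1_000_000_000, 9)` scales 1.5 seconds to nanoseconds by computing `1 * 10**9` exactly and adding the scaled half second.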
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -470,11 +470,26 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, | |
- If True, require an exact format match. | ||
- If False, allow the format to match anywhere in the target string. | ||
|
||
unit : string, default 'ns' | ||
unit of the arg (D,s,ms,us,ns) denote the unit, which is an | ||
integer or float number. This will be based off the origin. | ||
Example, with unit='ms' and origin='unix' (the default), this | ||
would calculate the number of milliseconds to the unix epoch start. | ||
unit : string, default is 'N' | ||
The unit of the arg. Uses a subset of the pandas offset aliases. | ||
|
||
- 'Y', 'A' for yearly (long term average of 365.2425 days) | ||
Comment: remove Y, A, M here. I think we deprecated these anyhow.

Reply: My working version already raises a deprecation warning for "Y", "A", "M" and "W". The "W" deprecation is discussed earlier in this conversation.

Comment: so don't mention the deprecated ones
||
- 'M' for monthly (long term average of 30.436875 days) | ||
- 'W' for weekly | ||
- 'D' for daily | ||
- 'H' for hourly | ||
- 'T', 'min' for minutely | ||
- 'S' for seconds | ||
- 'L', 'ms' for milliseconds | ||
- 'U', 'us' for microseconds | ||
- 'N' for nanoseconds | ||
|
||
This will be based off the origin. Example, with unit='L' and | ||
origin='unix' (the default), this would calculate the number of | ||
milliseconds to the unix epoch start. | ||
|
||
The offset alias can be prefixed with a stride. For example, results | ||
would be equivalent between unit='7D' and unit='W'. | ||
infer_datetime_format : boolean, default False | ||
If True and no `format` is given, attempt to infer the format of the | ||
datetime strings, and if it can be inferred, switch to a faster | ||
|
Comment: I am not 100% sure we are actually testing all of these. Can you see where we are and validate?
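As a rough illustration of what such validation might cover, the sketch below checks the fixed-length aliases from the proposed docstring against the standard library, including the claim that a stride of 7 on 'D' matches 'W'. It does not call pandas; `from_epoch` and `_DELTAS` are hypothetical names, and 'N' is left out because `datetime.timedelta` only resolves to microseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical table: fixed-length aliases from the docstring -> one unit.
_DELTAS = {
    "W": timedelta(weeks=1),
    "D": timedelta(days=1),
    "H": timedelta(hours=1),
    "T": timedelta(minutes=1),
    "S": timedelta(seconds=1),
    "L": timedelta(milliseconds=1),
    "U": timedelta(microseconds=1),
}


def from_epoch(value, unit, stride=1):
    """Return the UTC datetime `value * stride` units after the Unix epoch."""
    return EPOCH + value * stride * _DELTAS[unit]


# A stride of 7 days lands on the same instant as one week.
assert from_epoch(2, "D", stride=7) == from_epoch(2, "W")
# 1500 milliseconds after the epoch is 1.5 seconds after the epoch.
assert from_epoch(1500, "L") == EPOCH + timedelta(seconds=1.5)
```

A real test in the PR would presumably parametrize over every alias, including the lowercase legacy spellings, and call `to_datetime` itself.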