-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: to_json incorrectly localizes tz-naive datetimes to UTC #46730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
acdcb05
9652002
16332fc
aa6444c
94ad92a
ab895dc
075e7ca
ca3316d
9772fc3
fe90bb3
f983dce
65f321b
e314020
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,8 +54,8 @@ char *int64ToIso(int64_t value, NPY_DATETIMEUNIT base, size_t *len) { | |
PyErr_NoMemory(); | ||
return NULL; | ||
} | ||
|
||
ret_code = make_iso_8601_datetime(&dts, result, *len, base); | ||
// datetime64 is always naive | ||
ret_code = make_iso_8601_datetime(&dts, result, *len, 0, base); | ||
if (ret_code != 0) { | ||
PyErr_SetString(PyExc_ValueError, | ||
"Could not convert datetime value to string"); | ||
|
@@ -90,7 +90,19 @@ char *PyDateTimeToIso(PyObject *obj, NPY_DATETIMEUNIT base, | |
|
||
*len = (size_t)get_datetime_iso_8601_strlen(0, base); | ||
char *result = PyObject_Malloc(*len); | ||
ret = make_iso_8601_datetime(&dts, result, *len, base); | ||
// Check to see if PyDateTime has a timezone. | ||
// Don't convert to UTC if it doesn't. | ||
int is_tz_aware = 0; | ||
lithomas1 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (PyObject_HasAttrString(obj, "tzinfo")) { | ||
PyObject *offset = extract_utc_offset(obj); | ||
if (offset == NULL) { | ||
PyObject_Free(result); | ||
return NULL; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like we also need to |
||
} | ||
is_tz_aware = offset != Py_None; | ||
Py_DECREF(offset); | ||
} | ||
ret = make_iso_8601_datetime(&dts, result, *len, is_tz_aware, base); | ||
|
||
if (ret != 0) { | ||
PyErr_SetString(PyExc_ValueError, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -221,8 +221,18 @@ static PyObject *get_values(PyObject *obj) { | |
// The special cases to worry about are dt64tz and category[dt64tz]. | ||
// In both cases we want the UTC-localized datetime64 ndarray, | ||
// without going through and object array of Timestamps. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @WillAyd In my current (very rough) implementation of this, I've decided to force an array of Timestamps(not sure what the perf/behavior implications of this would be). Alternatively, I could try to just use the dt64 ndarray, and grab the tz of the DTA(as the tz should be the same). Then, I can probably store this inside the JSONTypeContext and pass that through to int64ToISO function. This muddles up the code a little, since numpy dt64 objects are supposed to be tz-naive and PyDateTimetoISO should really be handling this case. Do you have a preference for one way or the other? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would avoid adding anything to JSONTypeContext as it’s already way too overloaded. Is this not something we can just manage in the serializer? Working with an ndarray should be much faster. However, there are also cases where we need to handle datetimes being inside of an object array so need to be robust in how we manage (you might want to add a test for the latter) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added the test. Can you elaborate more about how to handle this in the serializer? I guess the way it is handled now is fine, but it would be nice to avoid the performance hit from casting to object. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You might get away with storing this information in the |
||
if (PyObject_HasAttrString(obj, "tz")) { | ||
PyObject *tz = PyObject_GetAttrString(obj, "tz"); | ||
if (tz != Py_None) { | ||
// Go through object array if we have dt64tz, since tz info will | ||
// be lost if values is used directly. | ||
Py_DECREF(tz); | ||
values = PyObject_CallMethod(obj, "__array__", NULL); | ||
return values; | ||
} | ||
Py_DECREF(tz); | ||
} | ||
values = PyObject_GetAttrString(obj, "values"); | ||
|
||
if (values == NULL) { | ||
// Clear so we can subsequently try another method | ||
PyErr_Clear(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1974,8 +1974,6 @@ class DatetimeLikeBlock(NDArrayBackedExtensionBlock): | |
values: DatetimeArray | TimedeltaArray | ||
|
||
def values_for_json(self) -> np.ndarray: | ||
# special casing datetimetz to avoid conversion through | ||
# object dtype | ||
return self.values._ndarray | ||
|
||
|
||
|
@@ -1989,6 +1987,12 @@ class DatetimeTZBlock(DatetimeLikeBlock): | |
_validate_ndim = True | ||
_can_consolidate = False | ||
|
||
def values_for_json(self) -> np.ndarray: | ||
# force dt64tz to go through object dtype | ||
# tz info will be lost when converting to | ||
# dt64 which is naive | ||
return self.values.astype(object) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. with this change, behavior is equivalent to just removing both this and DatimeLikeBlock.values_for_json. Might mean a perf hit for dt64naive and td64 though There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I'm not sure I follow here about dt64naive and td64. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right. what im saying is that i think NDArrayBackedBlock.values_for_json works, so no longer needs to be overridden There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll take care of this in a follow up. |
||
|
||
|
||
class ObjectBlock(NumpyBlock): | ||
__slots__ = () | ||
|
Uh oh!
There was an error while loading. Please reload this page.