BUG GH#43414 Infer and coerce datetimes am pm #43416

ghost · 2021-09-05T08:59:26Z

closes BUG: pd.to_datetime cannot infer and coerce dates with AM/PM and infer_datetime_format=True and errors="coerce". #43414
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

ghost · 2021-09-05T09:06:38Z

Something interesting I found is that when we infer_datetime_format we actually only use the first record. This matters because some of the tests generate their data with pd.date_range, where by default the first record falls at midnight. That can lead to different behavior than data whose first record is not at midnight, which is probably the more common case in practice.
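To make the midnight point concrete, here is a minimal sketch using only the standard library as a stand-in for pd.date_range. The helper name `hourly_strings` and the 12-hour format string are assumptions for illustration and are not taken from the PR. It shows what the first record looks like for data starting at midnight versus data starting in the afternoon.

```python
from datetime import datetime, timedelta

# Assumed 12-hour layout with an AM/PM marker (illustrative, not from the PR).
FMT = "%m/%d/%Y %I:%M:%S %p"


def hourly_strings(start, periods):
    """Format `periods` hourly timestamps beginning at `start` (hypothetical helper)."""
    return [(start + timedelta(hours=i)).strftime(FMT) for i in range(periods)]


# A range that starts at midnight, as date_range-style test data does by default.
midnight_start = hourly_strings(datetime(2021, 9, 5), 3)
# A range that starts in the afternoon, closer to typical real-world data.
afternoon_start = hourly_strings(datetime(2021, 9, 5, 13, 30), 3)

print(midnight_start[0])   # first record: hour 12 with an AM marker
print(afternoon_start[0])  # first record: hour 01 with a PM marker
```

Because only the first record drives the inference, tests built on the midnight-start data exercise a different first string than the afternoon-start data would.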

simonjayhawkins · 2021-09-05T13:27:52Z

Thanks @mikephung122 for the PR.

tests are failing https://github.com/pandas-dev/pandas/pull/43416/checks?check_run_id=3517046932

> Something interesting I found is that when we infer_datetime_format we actually only use the first record.

This is clearly stated in the docs.

infer_datetime_format : bool, default False
        If True and no `format` is given, attempt to infer the format of the
        datetime strings based on the first non-NaN element,
        and if it can be inferred, switch to a faster method of parsing them.
        In some cases this can increase the parsing speed by ~5-10x.

There is an underlying issue with infer_datetime_format=True and errors='coerce' specified together, #25143

Given that specifying errors='raise', or not specifying infer_datetime_format=True, gives the correct result with the code sample in the OP of issue #43414, would fixing the underlying issue not also fix this issue without the code added here?

ghost · 2021-09-05T17:45:16Z

@simonjayhawkins When I found this problem I also thought it was related to #25143. However, the more I looked into it, the more unclear it seemed, and I now believe fixing this as a separate issue might make more sense.

I think this issue and its fix are straightforward on their own. When converting, we always try to guess the datetime format first, regardless of the errors argument and error handling. Ideally, the guess_datetime_format function should be able to guess date and time formats with AM/PM. However, right now it returns something like '%m/%d/%Y %H:%M:%S AM' for AM/PM input. So when we try to parse with that best guess it never works, and the subsequent issue of inconsistent error handling and the fallback is surfaced.

We might be able to ignore this issue completely, since the fallback method works. However, if that's true, I'm wondering why the format guessing was included in the first place. The other option would be to try the fallback before accepting the coerced NaT, but I would say this is still an issue worth fixing either way. Overall, I think cleaning that up while ensuring that no existing functionality is broken would be significantly more difficult. What do you think?
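A small sketch of why a guess like that cannot work, using the standard library's datetime.strptime as a stand-in for the pandas parser (an assumption: pandas has its own strptime implementation, though the directives mean the same thing). The `try_parse` helper and the corrected format string with `%I` and `%p` are illustrative and are not the code from this PR.

```python
from datetime import datetime

# The guess described above: a 24-hour %H plus a literal "AM" suffix.
BAD_GUESS = "%m/%d/%Y %H:%M:%S AM"
# Assumed correct form: 12-hour %I plus the %p AM/PM directive.
GOOD_GUESS = "%m/%d/%Y %I:%M:%S %p"


def try_parse(value, fmt):
    """Return the parsed datetime, or None when the format does not match."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


pm_value = "09/05/2021 01:30:00 PM"
midnight_value = "09/05/2021 12:00:00 AM"

# The literal "AM" can never match a PM string, so parsing fails.
print(try_parse(pm_value, BAD_GUESS))
# With %I and %p the PM string parses to 13:30.
print(try_parse(pm_value, GOOD_GUESS))
# The literal "AM" does match here, but %H reads 12 as noon, not midnight.
print(try_parse(midnight_value, BAD_GUESS))
# With %I and %p, 12 AM is correctly read as 00:00.
print(try_parse(midnight_value, GOOD_GUESS))
```

Under errors="coerce", a failed parse like the first case is what ends up as NaT rather than raising.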
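The "try the fallback before accepting the coerced NaT" option could look roughly like the sketch below. It is not pandas' implementation: the function name `coerce_with_fallback` is hypothetical, a list of explicit formats stands in for the real fallback parser, and `None` stands in for NaT.

```python
from datetime import datetime


def coerce_with_fallback(values, guessed_fmt, fallback_fmts):
    """Parse each value with the guessed format, then the fallbacks.

    A value becomes None (standing in for NaT) only after every
    format has been tried and none matched.
    """
    results = []
    for value in values:
        parsed = None
        for fmt in (guessed_fmt, *fallback_fmts):
            try:
                parsed = datetime.strptime(value, fmt)
                break  # stop at the first format that matches
            except ValueError:
                continue  # try the next format before coercing
        results.append(parsed)
    return results


values = ["09/05/2021 01:30:00 PM", "not a date"]
parsed = coerce_with_fallback(
    values,
    guessed_fmt="%m/%d/%Y %H:%M:%S AM",       # the faulty guess
    fallback_fmts=["%m/%d/%Y %I:%M:%S %p"],   # assumed fallback format
)
print(parsed)
```

In this sketch the faulty guess no longer turns valid values into missing ones, but the guess itself is still wrong, which is why fixing the inference is worth doing regardless.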

ghost · 2021-09-05T20:10:42Z

@simonjayhawkins I checked the tests. It looks like it's the new unit tests that are failing in one posix environment, while another posix environment shows INTERNALERROR and 'Error: Process completed with exit code 3.'. However, the unit tests don't fail locally or in the other posix environments. I'm wondering if this is an issue with how these are built and run, but I don't see anything immediately obvious in the build or test logs. This is actually my first time touching any of the cython code. Are there any additional steps that need to be done beyond changing the .pyx file?

Mike Phung added 2 commits September 5, 2021 02:01

TST GH#43414 Add tests for to infer and coerce datetime with am / pm.

aa6c02a

BUG GH#43414 Add am/pm logic to guess_datetime_format.

384f44b

simonjayhawkins added the Bug and Datetime (Datetime data dtype) labels Sep 5, 2021

simonjayhawkins mentioned this pull request Sep 5, 2021

BUG: pd.to_datetime cannot infer and coerce dates with AM/PM and infer_datetime_format=True and errors="coerce". #43414

Closed

3 tasks

ghost changed the title ~~Infer datetimes am pm~~ BUG GH#43414 Infer datetimes am pm Sep 5, 2021

ghost changed the title ~~BUG GH#43414 Infer datetimes am pm~~ BUG GH#43414 Infer and coerce datetimes am pm Sep 5, 2021

Merge branch 'master' into infer-datetimes-am-pm

ec55349

ghost closed this Sep 18, 2021

ghost deleted the infer-datetimes-am-pm branch September 30, 2021 03:03

This pull request was closed.
