Allow .dt.month_name() and .dt.day_name() to return ordered categorical #31287

richierocks · 2020-01-24T17:06:52Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

# A data frame with a date column
dates = pd.date_range("2010-01-01", "2020-01-01", freq="1D").tolist()
groups = ["first", "second", "third"]
df = pd.DataFrame({
    "date": dates * len(groups),
    "group": np.repeat(groups, len(dates)).tolist(),
    "value": np.random.normal(0.0, 1.0, len(groups) * len(dates))
})

# This pivot table has months alphabetically, which isn't helpful
df["month"] = df["date"].dt.month_name()
df.pivot_table("value", "month", "group")

## group         first    second     third
## month                                  
## April      0.100473  0.038078 -0.044501
## August     0.036322  0.092148  0.076523
## December   0.007469 -0.008177 -0.001926
## February  -0.177943 -0.050950  0.079945
## January    0.034046 -0.019727  0.035488
## July      -0.095040  0.123110 -0.079442
## June      -0.023626  0.003913 -0.028143
## March     -0.057740  0.050930 -0.059962
## May       -0.110125 -0.053679  0.009097
## November  -0.097954 -0.001033  0.002073
## October   -0.023615 -0.008795  0.033985
## September -0.030819 -0.024159 -0.046538

# To get months chronologically, it's much more effort
month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
df["month2"] = pd.Categorical(
    df["date"].dt.month_name(), 
    categories=month_names, 
    ordered=True)
df.pivot_table("value", "month2", "group")

## group         first    second     third
## month2                                 
## January    0.034046 -0.019727  0.035488
## February  -0.177943 -0.050950  0.079945
## March     -0.057740  0.050930 -0.059962
## April      0.100473  0.038078 -0.044501
## May       -0.110125 -0.053679  0.009097
## June      -0.023626  0.003913 -0.028143
## July      -0.095040  0.123110 -0.079442
## August     0.036322  0.092148  0.076523
## September -0.030819 -0.024159 -0.046538
## October   -0.023615 -0.008795  0.033985
## November  -0.097954 -0.001033  0.002073
## December   0.007469 -0.008177 -0.001926

# What I wish I could have typed
df["month"] = df["date"].dt.month_name(ordered=True)
df.pivot_table("value", "month", "group")

Problem description

pandas.Series.dt.month_name() returns the month names as strings, which means that they get sorted alphabetically. In most cases, that's less useful than them begin sorted from January to December. (See the pivot table generated by the code snippet as an example.)

It is possible to generate month names as an ordered categorical variable, but it feels like a lot of extra code for such a routine task.

I'd like to be able to pass an argument to .dt.month_name() to specify that I want ordered months, and have pandas do the work for me.

There are a few options for the user interface,; I'm not sure which is best. ordered=True is one possible interface. Another would be having a sort_order argument that could be alphabetical or chronological. A third possibility would would be return_type as string or categorical.

The situation is similar for pandas.Series.dt.day_name(), though that has the added complication that sometimes you want an order of Sunday -> Saturday, and sometimes you want an order of Monday -> Sunday.

Expected Output

Months are ordered alphabetically. This is expected, but having them ordered from January to December is more useful.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.6.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.13.3
pytz : 2017.2
dateutil : 2.7.3
pip : 19.2.3
setuptools : 44.0.0.post20200106
Cython : 0.25.2
pytest : 3.0.7
hypothesis : None
sphinx : 1.5.6
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 3.7.3
html5lib : 0.999
pymysql : 0.9.3
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 5.3.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 3.7.3
matplotlib : 2.0.2
numexpr : 2.6.2
odfpy : None
openpyxl : 2.4.7
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 0.19.1
sqlalchemy : 1.3.8
tables : 3.3.0
xarray : None
xlrd : 1.0.0
xlwt : 1.2.0
xlsxwriter : 0.9.6

The text was updated successfully, but these errors were encountered:

mroeschke · 2020-01-24T18:38:37Z

Thanks for the suggestion. We have a similar request in #12993, so I will consolidate these issues and add these methods in that issue.

richierocks · 2020-01-24T21:42:15Z

Gah. I'd searched for dt.month_name instead of just month_name.

mroeschke closed this as completed Jan 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Allow .dt.month_name() and .dt.day_name() to return ordered categorical #31287

Allow .dt.month_name() and .dt.day_name() to return ordered categorical #31287

richierocks commented Jan 24, 2020

INSTALLED VERSIONS

mroeschke commented Jan 24, 2020

Uh oh!

richierocks commented Jan 24, 2020

Uh oh!

Uh oh!

Allow .dt.month_name() and .dt.day_name() to return ordered categorical #31287

Allow .dt.month_name() and .dt.day_name() to return ordered categorical #31287

Comments

richierocks commented Jan 24, 2020

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

mroeschke commented Jan 24, 2020

Uh oh!

richierocks commented Jan 24, 2020

Uh oh!

Output of `pd.show_versions()`