Skip to content

Allow .dt.month_name() and .dt.day_name() to return ordered categorical #31287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
richierocks opened this issue Jan 24, 2020 · 2 comments
Closed

Comments

@richierocks
Copy link

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

# A data frame with a date column
dates = pd.date_range("2010-01-01", "2020-01-01", freq="1D").tolist()
groups = ["first", "second", "third"]
df = pd.DataFrame({
    "date": dates * len(groups),
    "group": np.repeat(groups, len(dates)).tolist(),
    "value": np.random.normal(0.0, 1.0, len(groups) * len(dates))
})

# This pivot table has months alphabetically, which isn't helpful
df["month"] = df["date"].dt.month_name()
df.pivot_table("value", "month", "group")

## group         first    second     third
## month                                  
## April      0.100473  0.038078 -0.044501
## August     0.036322  0.092148  0.076523
## December   0.007469 -0.008177 -0.001926
## February  -0.177943 -0.050950  0.079945
## January    0.034046 -0.019727  0.035488
## July      -0.095040  0.123110 -0.079442
## June      -0.023626  0.003913 -0.028143
## March     -0.057740  0.050930 -0.059962
## May       -0.110125 -0.053679  0.009097
## November  -0.097954 -0.001033  0.002073
## October   -0.023615 -0.008795  0.033985
## September -0.030819 -0.024159 -0.046538

# To get months chronologically, it's much more effort
month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
df["month2"] = pd.Categorical(
    df["date"].dt.month_name(), 
    categories=month_names, 
    ordered=True)
df.pivot_table("value", "month2", "group")

## group         first    second     third
## month2                                 
## January    0.034046 -0.019727  0.035488
## February  -0.177943 -0.050950  0.079945
## March     -0.057740  0.050930 -0.059962
## April      0.100473  0.038078 -0.044501
## May       -0.110125 -0.053679  0.009097
## June      -0.023626  0.003913 -0.028143
## July      -0.095040  0.123110 -0.079442
## August     0.036322  0.092148  0.076523
## September -0.030819 -0.024159 -0.046538
## October   -0.023615 -0.008795  0.033985
## November  -0.097954 -0.001033  0.002073
## December   0.007469 -0.008177 -0.001926

# What I wish I could have typed
df["month"] = df["date"].dt.month_name(ordered=True)
df.pivot_table("value", "month", "group")

Problem description

pandas.Series.dt.month_name() returns the month names as strings, which means that they get sorted alphabetically. In most cases, that's less useful than them begin sorted from January to December. (See the pivot table generated by the code snippet as an example.)

It is possible to generate month names as an ordered categorical variable, but it feels like a lot of extra code for such a routine task.

I'd like to be able to pass an argument to .dt.month_name() to specify that I want ordered months, and have pandas do the work for me.

There are a few options for the user interface,; I'm not sure which is best. ordered=True is one possible interface. Another would be having a sort_order argument that could be alphabetical or chronological. A third possibility would would be return_type as string or categorical.

The situation is similar for pandas.Series.dt.day_name(), though that has the added complication that sometimes you want an order of Sunday -> Saturday, and sometimes you want an order of Monday -> Sunday.

Expected Output

Months are ordered alphabetically. This is expected, but having them ordered from January to December is more useful.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.13.3
pytz : 2017.2
dateutil : 2.7.3
pip : 19.2.3
setuptools : 44.0.0.post20200106
Cython : 0.25.2
pytest : 3.0.7
hypothesis : None
sphinx : 1.5.6
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 3.7.3
html5lib : 0.999
pymysql : 0.9.3
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 5.3.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 3.7.3
matplotlib : 2.0.2
numexpr : 2.6.2
odfpy : None
openpyxl : 2.4.7
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 0.19.1
sqlalchemy : 1.3.8
tables : 3.3.0
xarray : None
xlrd : 1.0.0
xlwt : 1.2.0
xlsxwriter : 0.9.6

@mroeschke
Copy link
Member

Thanks for the suggestion. We have a similar request in #12993, so I will consolidate these issues and add these methods in that issue.

@richierocks
Copy link
Author

Gah. I'd searched for dt.month_name instead of just month_name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants