You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
importpandasaspdimportnumpyasnp# A data frame with a date columndates=pd.date_range("2010-01-01", "2020-01-01", freq="1D").tolist()
groups= ["first", "second", "third"]
df=pd.DataFrame({
"date": dates*len(groups),
"group": np.repeat(groups, len(dates)).tolist(),
"value": np.random.normal(0.0, 1.0, len(groups) *len(dates))
})
# This pivot table has months alphabetically, which isn't helpfuldf["month"] =df["date"].dt.month_name()
df.pivot_table("value", "month", "group")
## group first second third## month ## April 0.100473 0.038078 -0.044501## August 0.036322 0.092148 0.076523## December 0.007469 -0.008177 -0.001926## February -0.177943 -0.050950 0.079945## January 0.034046 -0.019727 0.035488## July -0.095040 0.123110 -0.079442## June -0.023626 0.003913 -0.028143## March -0.057740 0.050930 -0.059962## May -0.110125 -0.053679 0.009097## November -0.097954 -0.001033 0.002073## October -0.023615 -0.008795 0.033985## September -0.030819 -0.024159 -0.046538# To get months chronologically, it's much more effortmonth_names= ['January', 'February', 'March', 'April', 'May', 'June', 'July',
'August', 'September', 'October', 'November', 'December']
df["month2"] =pd.Categorical(
df["date"].dt.month_name(),
categories=month_names,
ordered=True)
df.pivot_table("value", "month2", "group")
## group first second third## month2 ## January 0.034046 -0.019727 0.035488## February -0.177943 -0.050950 0.079945## March -0.057740 0.050930 -0.059962## April 0.100473 0.038078 -0.044501## May -0.110125 -0.053679 0.009097## June -0.023626 0.003913 -0.028143## July -0.095040 0.123110 -0.079442## August 0.036322 0.092148 0.076523## September -0.030819 -0.024159 -0.046538## October -0.023615 -0.008795 0.033985## November -0.097954 -0.001033 0.002073## December 0.007469 -0.008177 -0.001926# What I wish I could have typeddf["month"] =df["date"].dt.month_name(ordered=True)
df.pivot_table("value", "month", "group")
Problem description
pandas.Series.dt.month_name() returns the month names as strings, which means that they get sorted alphabetically. In most cases, that's less useful than them begin sorted from January to December. (See the pivot table generated by the code snippet as an example.)
It is possible to generate month names as an ordered categorical variable, but it feels like a lot of extra code for such a routine task.
I'd like to be able to pass an argument to .dt.month_name() to specify that I want ordered months, and have pandas do the work for me.
There are a few options for the user interface,; I'm not sure which is best. ordered=True is one possible interface. Another would be having a sort_order argument that could be alphabetical or chronological. A third possibility would would be return_type as string or categorical.
The situation is similar for pandas.Series.dt.day_name(), though that has the added complication that sometimes you want an order of Sunday -> Saturday, and sometimes you want an order of Monday -> Sunday.
Expected Output
Months are ordered alphabetically. This is expected, but having them ordered from January to December is more useful.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
Code Sample, a copy-pastable example if possible
Problem description
pandas.Series.dt.month_name()
returns the month names as strings, which means that they get sorted alphabetically. In most cases, that's less useful than them begin sorted from January to December. (See the pivot table generated by the code snippet as an example.)It is possible to generate month names as an ordered categorical variable, but it feels like a lot of extra code for such a routine task.
I'd like to be able to pass an argument to
.dt.month_name()
to specify that I want ordered months, and have pandas do the work for me.There are a few options for the user interface,; I'm not sure which is best.
ordered=True
is one possible interface. Another would be having asort_order
argument that could bealphabetical
orchronological
. A third possibility would would bereturn_type
asstring
orcategorical
.The situation is similar for
pandas.Series.dt.day_name()
, though that has the added complication that sometimes you want an order of Sunday -> Saturday, and sometimes you want an order of Monday -> Sunday.Expected Output
Months are ordered alphabetically. This is expected, but having them ordered from January to December is more useful.
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.13.3
pytz : 2017.2
dateutil : 2.7.3
pip : 19.2.3
setuptools : 44.0.0.post20200106
Cython : 0.25.2
pytest : 3.0.7
hypothesis : None
sphinx : 1.5.6
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 3.7.3
html5lib : 0.999
pymysql : 0.9.3
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 5.3.0
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 3.7.3
matplotlib : 2.0.2
numexpr : 2.6.2
odfpy : None
openpyxl : 2.4.7
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 0.19.1
sqlalchemy : 1.3.8
tables : 3.3.0
xarray : None
xlrd : 1.0.0
xlwt : 1.2.0
xlsxwriter : 0.9.6
The text was updated successfully, but these errors were encountered: