sort_index does not work with levels not aligned with index #25775

Closed
ClementWalter opened this issue Mar 19, 2019 · 4 comments · Fixed by #26492
@ClementWalter

Code Sample, a copy-pastable example if possible

import pandas as pd

pd.np.random.seed(0)
(
    pd.DataFrame(pd.np.random.rand(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
    .assign(b=lambda df: (df.b*10).astype(int))
    .set_index(['a', 'b', 'c'])
    .sort_index(axis=0, level=['b', 'a'])
)

Out[96]: 
                            d         e
a        b c                           
0.087129 0 0.832620  0.778157  0.870012
0.639921 1 0.944669  0.521848  0.414662
0.670638 2 0.128926  0.315428  0.363711
0.359508 4 0.697631  0.060225  0.666767
0.645894 4 0.891773  0.963663  0.383442
0.791725 5 0.568045  0.925597  0.071036
0.617635 6 0.616934  0.943748  0.681820
0.264556 7 0.456150  0.568434  0.018790
0.978618 7 0.461479  0.780529  0.118274
0.548814 7 0.602763  0.544883  0.423655

Problem description

I don't understand why the a level is not sorted within equal values of b (see the b=7 rows, where a goes 0.264556, 0.978618, 0.548814).
sort_index does not seem to handle a level list whose order differs from the index's own level order. With an index of ['a', 'b', 'c'], it appears to sort correctly only with level=['a'], level=['a', 'b'], level=['a', 'b', 'c'], level=['b'], level=['b', 'c'], or level=['c'].

Expected Output

pd.np.random.seed(0)
(
    pd.DataFrame(pd.np.random.rand(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
    .assign(b=lambda df: (df.b*10).astype(int))
    .sort_values(['b', 'a'])
    .set_index(['a', 'b', 'c'])
)

Out[104]: 
                            d         e
a        b c                           
0.087129 0 0.832620  0.778157  0.870012
0.639921 1 0.944669  0.521848  0.414662
0.670638 2 0.128926  0.315428  0.363711
0.359508 4 0.697631  0.060225  0.666767
0.645894 4 0.891773  0.963663  0.383442
0.791725 5 0.568045  0.925597  0.071036
0.617635 6 0.616934  0.943748  0.681820
0.264556 7 0.456150  0.568434  0.018790
0.548814 7 0.602763  0.544883  0.423655
0.978618 7 0.461479  0.780529  0.118274
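
For a check that does not depend on random data, the comparison between the two approaches can be sketched with a small hand-written frame. The values below are made up for illustration and are not taken from the report; the flag `same_order` is a hypothetical name. On a pandas version where sort_index behaves as expected, the two orderings should agree.

```python
import pandas as pd

# Small hand-written frame with a tie on b, so the order of a within b matters.
df = pd.DataFrame(
    {
        "a": [0.9, 0.2, 0.5, 0.1],
        "b": [7, 7, 7, 0],
        "c": [0.4, 0.3, 0.6, 0.8],
        "d": [1.0, 2.0, 3.0, 4.0],
    }
)

# Reference ordering: sort the columns first, then build the index.
expected = df.sort_values(["b", "a"]).set_index(["a", "b", "c"])

# Ordering under test: build the index first, then sort by levels b, a.
result = df.set_index(["a", "b", "c"]).sort_index(axis=0, level=["b", "a"])

# Compare the index tuples row by row.
same_order = list(result.index) == list(expected.index)
print(same_order)
```

If `same_order` is False, the installed version shows the behaviour described above, and sorting with sort_values before set_index is a possible workaround.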

Output of pd.show_versions()

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: fr_FR.UTF-8
pandas: 0.23.2
pytest: 3.6.1
pip: 18.0
setuptools: 40.4.3
Cython: None
numpy: 1.14.3
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 7.0.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Mar 19, 2019

Hmm that does look strange - thanks for the report! Investigation and PRs are always welcome

@WillAyd WillAyd added this to the Contributions Welcome milestone Mar 19, 2019
@WillAyd WillAyd added the Bug label Mar 19, 2019
@mahepe
Contributor

mahepe commented May 19, 2019

I get the expected output at the current master.

@jreback
Contributor

jreback commented May 19, 2019

@mahepe great! would u do a PR with a validation test

@mahepe
Contributor

mahepe commented May 21, 2019

sure @jreback, will take some days due to work though
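
A validation test along the lines requested above might look roughly like the sketch below. This is only an illustration: the test name and data are hypothetical, and it is not the test that was added in the linked PR. It asserts that sorting by levels in an order different from the index order gives the same frame as sorting the columns before building the index.

```python
import pandas as pd


def test_sort_index_level_order_differs_from_index():
    # Hypothetical data: b has a tie at 7, so the order of a within b matters.
    df = pd.DataFrame(
        {
            "a": [3, 1, 2, 1],
            "b": [7, 7, 7, 0],
            "c": [10, 20, 30, 40],
            "d": [1.0, 2.0, 3.0, 4.0],
        }
    )

    # Sort an index of (a, b, c) by levels b then a.
    result = df.set_index(["a", "b", "c"]).sort_index(level=["b", "a"])

    # Reference: sort the plain columns first, then build the same index.
    expected = df.sort_values(["b", "a"]).set_index(["a", "b", "c"])

    pd.testing.assert_frame_equal(result, expected)
```

Run under pytest, this should pass on a version that produces the expected output and fail on one that shows the reported ordering.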

mahepe pushed a commit to mahepe/pandas that referenced this issue May 22, 2019
mahepe pushed a commit to mahepe/pandas that referenced this issue May 26, 2019
mahepe pushed a commit to mahepe/pandas that referenced this issue May 29, 2019
@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 May 30, 2019
mahepe pushed a commit to mahepe/pandas that referenced this issue Jun 3, 2019