BUG: exponential moving window covariance fails for a non-from_product MultiIndexed DataFrame #43082

Closed
2 of 3 tasks
sberglin opened this issue Aug 17, 2021 · 3 comments
Labels
Bug MultiIndex Window rolling, ewma, expanding

Comments

@sberglin

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np

columns = pd.MultiIndex.from_arrays([[1, 1, 2], ['a', 'b', 'b']])
index = range(1000)
df = pd.DataFrame(
    np.random.normal(size=(len(index), len(columns))), index=index, 
    columns=columns)
df.ewm(alpha=0.05).cov()

Problem description

cov works when the MultiIndex is created with from_product, but fails here because this MultiIndex does not contain every combination of its level values and so cannot be created with from_product. I would expect .cov to work on any MultiIndex, since the form of the column names should not affect the covariance computation. The computation does work on the underlying values: pd.DataFrame(df.values).ewm(alpha=0.1).cov(). This issue is closely related to this older bug.
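To make the from_product distinction concrete, here is a minimal sketch of a check for whether a MultiIndex contains every combination of its level values. The helper name is_full_product is hypothetical and is not part of pandas; it only illustrates why the columns in the code sample cannot be built with from_product.

```python
import math

import pandas as pd


def is_full_product(mi: pd.MultiIndex) -> bool:
    """Hypothetical helper: True if `mi` holds every combination of its
    level values exactly once, i.e. it could come from from_product."""
    mi = mi.remove_unused_levels()
    n_combinations = math.prod(len(level) for level in mi.levels)
    return mi.is_unique and len(mi) == n_combinations


from_arrays = pd.MultiIndex.from_arrays([[1, 1, 2], ["a", "b", "b"]])
from_product = pd.MultiIndex.from_product([[1, 2], ["a", "b"]])

print(is_full_product(from_arrays))   # False: the combination (2, "a") is missing
print(is_full_product(from_product))  # True
```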

Expected Output

A standard ewm.cov output.
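For reference, a sketch of what that output looks like on a DataFrame with plain (non-MultiIndex) columns, which is the case that works. The data size and seed are arbitrary choices for illustration: the result stacks one k-by-k covariance matrix per row of the input, indexed by (row label, column label).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
values = rng.normal(size=(20, 3))

# Plain integer column labels: the pairwise covariance works here.
flat = pd.DataFrame(values)
cov = flat.ewm(alpha=0.05).cov()

print(cov.shape)          # (60, 3): 20 rows x 3 columns, by 3 columns
print(cov.index.nlevels)  # 2: (row label, column label)

# The 3x3 covariance matrix for the last row of the input.
last = cov.loc[19]
print(last)
```

The diagonal of each per-row matrix should match flat.ewm(alpha=0.05).var() for the same row.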

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 5f648bf1706dd75a9ca0d29f26eadfbb595fe52b
python           : 3.8.8.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.5.0
Version          : Darwin Kernel Version 20.5.0: Sat May  8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.2
numpy            : 1.19.2
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.0.1
setuptools       : 52.0.0.post20210125
Cython           : 0.29.22
pytest           : 6.2.2
hypothesis       : None
sphinx           : 3.5.3
blosc            : None
feather          : None
xlsxwriter       : 1.3.8
lxml.etree       : 4.6.3
html5lib         : 1.1
pymysql          : None
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 2.11.3
IPython          : 7.22.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : 0.8.7
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : 2.7.3
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.2
sqlalchemy       : 1.4.4
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : 1.3.0
numba            : 0.53.1
@sberglin added the Bug and Needs Triage labels on Aug 17, 2021
@Dinkarkumar

I would like to work on this issue.

@Dinkarkumar

[Screenshot: MultiIndex_from_arrays]

I get the same error with MultiIndex.from_arrays, as shown in the snippet above. Unexpectedly, I also get a similar error with MultiIndex.from_product, as shown in the snippet below.

[Screenshot: MultiIndex_from_product]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7c48ff4
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.2.5
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fsspec : 0.6.2
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

My understanding of the issue:
The inconsistent ewm.cov() output can be worked around for now as follows:

  1. Convert the MultiIndex DataFrame into a single-indexed DataFrame and pass that to ewm.cov() to get consistent output.
    [Screenshot: Working_from_arrays]
    The screenshot above shows a working version for MultiIndex.from_arrays using this method.
    [Screenshot: Working_from_product]
    The screenshot above shows a working version for MultiIndex.from_product using this method.
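Since the screenshots are not reproduced here, the following is a minimal sketch of one way the workaround could look. It is not necessarily the code from the screenshots. It assumes positional integer labels as the temporary single-level columns, then maps them back to the original MultiIndex on both axes of the result.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([[1, 1, 2], ["a", "b", "b"]])
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.normal(size=(20, len(columns))), columns=columns)

# Step 1: replace the MultiIndex columns with positional labels 0..k-1.
flat = df.copy()
flat.columns = range(len(columns))

# Step 2: compute the covariance on the single-level frame.
cov = flat.ewm(alpha=0.05).cov()

# Step 3: map the positional labels back to the original column tuples,
# on the result's columns and on the inner level of its row index.
rows = cov.index.get_level_values(0)
positions = cov.index.get_level_values(1).to_numpy()
cov.index = pd.MultiIndex.from_arrays(
    [rows]
    + [columns.get_level_values(i).take(positions) for i in range(columns.nlevels)]
)
cov.columns = columns.take(cov.columns.to_numpy())

print(cov.tail(3))  # covariance matrix for the last row, with MultiIndex labels
```

Using positions rather than joined string labels avoids any ambiguity when splitting labels back apart, at the cost of the intermediate frame carrying no meaningful column names.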

I am still working on resolving the issue internally, if possible.
@sberglin, please correct me or share more insight into the issue if you think I need it.

@mroeschke added the MultiIndex and Window labels and removed the Needs Triage label on Aug 22, 2021
@mroeschke
Member

Thanks for the report. I think this is a duplicate of #21157 (which is generally an issue when calling rolling/expanding/ewm on a DataFrame with MultiIndex columns). Going to close in favor of tracking in that issue.
