Using 'by' and 'weights' together with DataFrame.hist() results in ValueError: weights should have the same shape as x #9540

Open
awhan opened this issue Feb 24, 2015 · 4 comments

@awhan

awhan commented Feb 24, 2015

I wanted to produce a grouped histogram such that the heights of the bars add up to 1. The following code raises ValueError: weights should have the same shape as x

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

n = 100
df = pd.DataFrame(np.random.randn(n), columns=['a'])
by = np.random.randint(1,5,n)
df.hist(by=by) # works
plt.show()
weights = np.repeat(1/len(df), len(df))
df.hist(weights = weights) # works
plt.show()
df.hist(by = by, weights = weights) # does not work: raises the ValueError above
plt.show()

In [15]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.18.6-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: None
numpy: 1.9.1
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 2.4.1
sphinx: None
patsy: 0.3.0
dateutil: 2.4.0
pytz: 2014.10
bottleneck: 1.0.0
tables: None
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: 2.5.6
sqlalchemy: None
pymysql: None
psycopg2: None

@mgdadv

mgdadv commented Feb 27, 2015

Could you clarify a bit more what you are trying to achieve?

The by argument splits the original data into groups. df.hist() then calls the matplotlib histogram function once per group, each time passing the original, unsplit weights. In your case the by-groups have random, differing sizes, whereas the weights always have length 100, so the shapes do not match.
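The mismatch described above can be checked without plotting. The sketch below reuses the data from the original report and uses np.histogram as a stand-in for the per-group matplotlib call; that substitution is an assumption made for illustration, chosen because np.histogram also rejects weights whose shape differs from the data. The names group_sizes and mismatch_raised are hypothetical.

```python
import numpy as np

np.random.seed(123)

n = 100
x = np.random.randn(n)
by = np.random.randint(1, 5, n)
weights = np.repeat(1.0 / n, n)  # one weight per row of the full data

# Number of rows that end up in each by-group; these sum to n,
# but each individual group is shorter than the weights array.
group_sizes = {int(k): int((by == k).sum()) for k in np.unique(by)}

# Histogram a single group while passing the full-length weights,
# which is roughly what happens when by and weights are combined.
first_group = x[by == np.unique(by)[0]]
try:
    np.histogram(first_group, weights=weights)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Printing group_sizes next to len(weights) shows the group lengths that the histogram call is compared against.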

The by and weights combination seems to work if the groups all have the same size and the weights have that same length, as in this example:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)

n = 100
df = pd.DataFrame(np.random.randn(n), columns=['a'])

by = np.repeat([1,2,3,4,5], 20)
weights = np.repeat(1/20., 20)
df.hist(by = by, weights = weights)
plt.show()
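For groups of unequal size, one possible workaround is to build the weights per group and call matplotlib directly, so the weights always have the same length as the data they accompany. This is only a sketch, not something pandas provides: grouped_normalized_hist is a hypothetical helper, and it assumes the goal is for the bars in each group's panel to add up to 1.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt


def grouped_normalized_hist(values, by, bins=10):
    """Hypothetical helper: one histogram per group, bar heights summing to 1."""
    values = pd.Series(np.asarray(values))
    groups = values.groupby(np.asarray(by))
    fig, axes = plt.subplots(1, groups.ngroups, squeeze=False, sharey=True)
    heights = {}
    for ax, (key, group) in zip(axes.ravel(), groups):
        # Weights are created per group, so their length matches the group's data.
        w = np.full(len(group), 1.0 / len(group))
        counts, _, _ = ax.hist(group.values, bins=bins, weights=w)
        ax.set_title(str(key))
        heights[key] = counts
    return fig, heights


np.random.seed(123)

n = 100
df = pd.DataFrame(np.random.randn(n), columns=['a'])
by = np.random.randint(1, 5, n)  # groups of unequal size

fig, heights = grouped_normalized_hist(df['a'], by)
```

Replacing 1.0 / len(group) with 1.0 / len(values) would instead make the bars across all panels add up to 1 in total.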

@awhan
Author

awhan commented Mar 1, 2015

Thanks @mgdadv for the reply. Yes, you understand exactly what I wanted to achieve, and yes, I did guess that the weights and the data size within the groups probably did not match. If this is not a bug (as I thought), could it be a feature request then?

@Twizzledrizzle

I think #11028 will fix this

@jreback added the Visualization plotting label Sep 10, 2015
@MaxGhenis

#11028 became #11441, which was closed as stale. It'd be great to have.
