new pandas broke correlation #4
Comments
Looking into it. I already did a nasty quick fix over these lines in my branch, in ebe8b17 (sorry that the commit is not properly isolated; in any case, that is not a proper solution). I spotted the problem while working with Etienne's correlation plots, but did not blame pandas for it at that moment. Right now I get this traceback, which indicates a list indexing problem:
(Note that I get the error on a different line, so I guess you are looking at a modified working copy.) The pandas docs state that corr should ignore missing values. I will keep looking into it soon. |
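To make the point about missing values concrete, here is a minimal sketch of how `Series.corr` handles NaN. The two series are made-up illustration data, not values from this project. Pairs where either value is missing are dropped before the correlation is computed.

```python
import numpy as np
import pandas as pd

# Made-up data: b is exactly 2 * a wherever both values are present.
a = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
b = pd.Series([2.0, 4.0, 6.0, np.nan, 10.0])

# Series.corr excludes every pair in which either value is missing,
# so only the pairs (1, 2), (2, 4) and (5, 10) contribute.
r_pandas = a.corr(b)

# The same computation done by hand on the complete pairs only.
mask = a.notna() & b.notna()
r_manual = np.corrcoef(a[mask].to_numpy(), b[mask].to_numpy())[0, 1]

print(r_pandas, r_manual)
```

Both values agree, so a NaN inside a series is not by itself a reason for `corr` to return nan.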
Mystery solved: nothing is broken. The offending vector is constant, and correlation with a constant vector is undefined (division by zero), so the correct result is nan. What I guess they have changed is the behavior of argmax:

    import pandas as pd
    import numpy as np
    argmax = pd.Series(data=[np.nan, np.nan, np.nan]).argmax()  # this is now nan
    max_latency = latencies[argmax]  # and we cannot index with nan

I will now write a fix and push. |
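For reference, a minimal sketch of the two points above: a constant vector yields a nan correlation, and a guard avoids indexing with an undefined argmax. This is not the fix that was pushed. `best_latency`, the sample data, and the choice to return `None` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
latencies = [10, 20, 30]
signal = pd.Series([0.1, 0.4, 0.2, 0.9])
constant = pd.Series([3.0, 3.0, 3.0, 3.0])

# A constant vector has zero variance, so the correlation is 0 / 0 -> nan.
with np.errstate(invalid="ignore", divide="ignore"):
    r = signal.corr(constant)


def best_latency(correlations, latencies):
    """Return the latency with the highest correlation, or None if all are nan.

    Hypothetical helper: it checks for the all-nan case before
    looking up the position of the maximum.
    """
    values = np.asarray(correlations, dtype=float)
    if np.isnan(values).all():
        return None  # no defined correlation, so no meaningful maximum
    # nanargmax skips nan entries and returns an integer position.
    position = int(np.nanargmax(values))
    return latencies[position]


print(r)
print(best_latency([np.nan, np.nan, np.nan], latencies))
print(best_latency([0.1, np.nan, 0.7], latencies))
```

Returning `None` pushes the decision to the caller, which could then skip the plot annotation or report that the correlation is undefined.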
I'd like a test too. In general, no touching this stuff without a test.
|
There are already some correlation tests in test_correlation.py. Add one there. I'll run the test on buzz and on my desktop.
|
I have written some tests in 2a97fb4. I did my best, but I found it hard to really test what we want to test with the current all-in-one design of plot_correlation_analysis. Some thoughts:
|
Works on buzz
Fails on new pandas
looking into the code
returns nan if any series has nan in it. depressingly
also returns nan