numpy FutureWarning in df.equals #8509

Closed
jorisvandenbossche opened this issue Oct 8, 2014 · 9 comments
Labels
Compat pandas objects compatability with Numpy or Python functions
Milestone

Comments

@jorisvandenbossche
Member

From the example in the v0.13.1 whatsnew docs (http://pandas.pydata.org/pandas-docs/version/0.15.0/whatsnew.html#id12):

df = DataFrame({'col':['foo', 0, np.nan]}).sort()
df2 = DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])
df.equals(df)

gives:

In [3]: df.equals(df)
C:\Anaconda\lib\site-packages\numpy\core\numeric.py:2367: FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  return bool(asarray(a1 == a2).all())
Out[3]: True

The example in the basics.rst section does not produce this warning (http://pandas.pydata.org/pandas-docs/version/0.15.0/basics.html#comparing-if-objects-are-equivalent):

df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']),
                'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
                'three' : Series(randn(3), index=['b', 'c', 'd'])})
(df+df).equals(df*2)
@jorisvandenbossche
Member Author

cc @unutbu

@jorisvandenbossche jorisvandenbossche added this to the 0.15.1 milestone Oct 8, 2014
@jreback
Contributor

jreback commented Oct 8, 2014

This has to do with object-dtype equality testing. It only happens with numpy 1.9; before that it is fine.

These warnings are ALL over (because numpy put them in). There is an issue about this ....

@jreback
Contributor

jreback commented Oct 8, 2014

see here: #7065
and the linked numpy issue (which I ultimately closed as they didn't/couldn't fix it)

It's really a dumb warning

>>> import numpy as np
>>> np.__version__
'1.9.0'
>>> left = np.array([['foo', 0, np.nan]], dtype=object)
>>> right = np.array([['foo', 0, np.nan]], dtype=object)
>>> left == right
__main__:1: FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
array([[ True,  True,  True]], dtype=bool)

The only way around it, AFAICT, is essentially to iterate and compare element by element, but that will be slow and this routine should not be slow. We might have to write a Cython routine for this (rather than a vectorized method).
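
For illustration only, the "iterate and compare" idea could look like the sketch below. It uses plain Python sequences rather than numpy arrays, and `equal_treating_nan_as_equal` is a hypothetical helper name, not anything in pandas. It assumes that two elements should match when they compare equal or when both are NaN.

```python
import math


def _is_nan(value):
    """Return True only for float NaN; strings, ints, None, etc. give False."""
    return isinstance(value, float) and math.isnan(value)


def equal_treating_nan_as_equal(left, right):
    """Compare two sequences element by element.

    Hypothetical helper, not the pandas implementation: two elements
    match if they compare equal or if both are NaN.
    """
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if _is_nan(a) and _is_nan(b):
            continue  # NaN in the same position counts as a match
        if a != b:
            return False
    return True


left = ["foo", 0, float("nan")]
right = ["foo", 0, float("nan")]
print(equal_treating_nan_as_equal(left, right))  # True
print(left == right)  # False: the two NaN objects are not equal
```

A pure-Python loop like this never triggers the numpy warning, but it is exactly the slow path described above, which is why a compiled routine would be needed in practice.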

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Oct 8, 2014
@jorisvandenbossche
Member Author

@jreback ok, hadn't seen the other issue. Then this can be closed? (#7065 is still open).

I am not familiar with the internals of equals, but I understand that it is the NaNs that cause the problem. Isn't it possible to mask the NaNs before doing the left == right?

@jreback
Contributor

jreback commented Oct 8, 2014

You need to do the element-wise comparison first (otherwise there is no point in NaN detection), which is the rhs of that expression.

I think numpy is just doing something dumb.

Yes, let's close this one. If it's a problem in the docs, maybe just make the example non-object dtype.

@jorisvandenbossche
Member Author

It's not really a problem (the warning only shows when building, not in the output docs), and we can indeed just change the example.

@jorisvandenbossche
Member Author

Ah, by the way, I think there was something wrong in the example: it did df.equals(df), while this should more logically be df.equals(df2), which does not give the warning.

@unutbu
Contributor

unutbu commented Oct 8, 2014

If I recall correctly, the warning is triggered by the use of (left == right) in core/common.py/array_equivalent. But the change in NumPy behavior will not affect array_equivalent, because we use (left == right) in conjunction with

((left == right) | (pd.isnull(left) & pd.isnull(right))).all()

So the null case is already handled. Shall we quash this warning message with warnings.filterwarnings, as in:

import warnings
warnings.filterwarnings('ignore', 'numpy equal will not check object identity in the future')
import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['foo', 0, np.nan]}).sort()
df2 = pd.DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])
print(df.equals(df))

(which prints only True.)
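
A possible variation, sketched here only as an assumption about how this might be scoped: rather than installing the filter globally, it could be applied just around the comparison with `warnings.catch_warnings`. Since the warning itself depends on the numpy version, `noisy_equal` below is a hypothetical stand-in that raises a FutureWarning starting with the same message prefix; `quiet_equal` is likewise an illustrative name.

```python
import warnings

# Message prefix taken from the warning text quoted in this issue.
MESSAGE = "numpy equal will not check object identity in the future"


def noisy_equal(left, right):
    """Hypothetical stand-in for a comparison that emits the FutureWarning."""
    warnings.warn(MESSAGE + ".", FutureWarning)
    return left == right


def quiet_equal(left, right):
    """Run the comparison with the specific warning silenced locally."""
    with warnings.catch_warnings():
        # The filter is only active inside this block; the previous
        # filters are restored when the block exits.
        warnings.filterwarnings("ignore", message=MESSAGE, category=FutureWarning)
        return noisy_equal(left, right)


print(quiet_equal(["foo", 0], ["foo", 0]))  # True, with no warning shown
```

Scoping the filter this way leaves the user's own warning configuration untouched, at the cost of a small amount of overhead on each call.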

@jreback
Contributor

jreback commented Oct 8, 2014

yes I think we can just do that!

submit for the linked issue though

@jreback jreback modified the milestones: 0.15.2, 0.15.1 Oct 30, 2014