numpy FutureWarning in df.equals #8509

jorisvandenbossche · 2014-10-08T19:42:39Z

From the example in the v0.13.1 whatsnew docs (http://pandas.pydata.org/pandas-docs/version/0.15.0/whatsnew.html#id12):

df = DataFrame({'col':['foo', 0, np.nan]}).sort()
df2 = DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])
df.equals(df)

gives:

In [3]: df.equals(df)
C:\Anaconda\lib\site-packages\numpy\core\numeric.py:2367: FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  return bool(asarray(a1 == a2).all())
Out[3]: True

The example in basics.rst section does not have this warning (http://pandas.pydata.org/pandas-docs/version/0.15.0/basics.html#comparing-if-objects-are-equivalent):

df = DataFrame({'one' : Series(randn(3), index=['a', 'b', 'c']),
                'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
                'three' : Series(randn(3), index=['b', 'c', 'd'])})
(df+df).equals(df*2)


jorisvandenbossche · 2014-10-08T19:43:53Z

cc @unutbu

jreback · 2014-10-08T19:46:02Z

this has to do with object dtype equality testing (it only happens with numpy 1.9); before that it is ok.

these warnings are ALL over (because numpy put them in). There is an issue about this ....

jreback · 2014-10-08T19:56:37Z

see here: #7065
and the linked numpy issue (which I ultimately closed as they didn't/couldn't fix it)

It's really a dumb warning

>>> import numpy as np
>>> np.__version__
'1.9.0'
>>> left = np.array([['foo', 0, np.nan]], dtype=object)
>>> right = np.array([['foo', 0, np.nan]], dtype=object)
>>> left == right
__main__:1: FutureWarning: numpy equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
array([[ True,  True,  True]], dtype=bool)

The only way around it, AFAICT, is essentially to iterate and compare, but that will be slow, and this routine should not be slow. We might have to write a cython routine for this (rather than a vectorized method).
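For illustration only, the iterate-and-compare idea could look roughly like the sketch below. It is pure Python, not pandas' implementation; `is_nan` and `objects_equivalent` are hypothetical names introduced here. It treats two NaNs in the same position as equal and otherwise falls back to `==`, which is exactly the per-element work that would be slow without a cython routine.

```python
import math


def is_nan(value):
    """Return True only for float NaN (numpy.float64 is a float subclass)."""
    return isinstance(value, float) and math.isnan(value)


def objects_equivalent(left, right):
    """Hypothetical helper: element-wise equality where NaN matches NaN."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        # NaN != NaN under normal comparison, so handle that case explicitly.
        if is_nan(a) and is_nan(b):
            continue
        if not (a == b):
            return False
    return True


nan = float("nan")
print(objects_equivalent(["foo", 0, nan], ["foo", 0, nan]))  # True
print(objects_equivalent(["foo", 0, nan], ["foo", 1, nan]))  # False
```

The loop runs one Python-level comparison per element, so it avoids the object-array `==` that raises the warning at the cost of speed.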

jorisvandenbossche · 2014-10-08T20:03:59Z

@jreback ok, hadn't seen the other issue. Then this can be closed? (#7065 is still open).

I am not familiar with the internals of equals, but I understand that it is the nans that cause the problem. Isn't it possible to mask the nans before doing the left == right?

jreback · 2014-10-08T20:06:40Z

you need to element-wise compare first (otherwise no point in nan detection), which is the rhs of that expression.

I think numpy is just doing something dumb.

yes, let's close this one. If it's a problem in the docs, maybe just make the example non-object dtype.

jorisvandenbossche · 2014-10-08T20:07:59Z

It's not really a problem (the warning only shows when building, not in the output docs), and we can indeed just change the example.

jorisvandenbossche · 2014-10-08T20:09:07Z

Ah, by the way, I think there was something wrong in the example: it did df.equals(df) while this should more logically be df.equals(df2), which does not give the warning

unutbu · 2014-10-08T20:09:47Z

If I recall correctly, the warning is triggered by the use of (left == right) in array_equivalent in core/common.py. But the change in NumPy behavior will not affect array_equivalent because we use (left == right) in conjunction with

((left == right) | (pd.isnull(left) & pd.isnull(right))).all()

So the null case is already handled. Shall we quash this warning with warnings.filterwarnings, as in:

import warnings
warnings.filterwarnings('ignore', 'numpy equal will not check object identity in the future')
import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['foo', 0, np.nan]}).sort()
df2 = pd.DataFrame({'col':[np.nan, 0, 'foo']}, index=[2,1,0])
print(df.equals(df))

(which prints only True.)
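If the filter were applied inside the library rather than at module level, one option might be to scope it with `warnings.catch_warnings()` so that the user's own warning configuration is left untouched. The sketch below uses only the standard library; `noisy_compare` is a hypothetical stand-in for the object-dtype `left == right` and simply emits a warning with the same leading text, and `quiet_compare` is likewise an illustrative name, not an existing pandas function.

```python
import warnings

# Leading text of the NumPy 1.9 warning; filterwarnings matches it as a
# regular expression against the start of the message.
MESSAGE = "numpy equal will not check object identity in the future"


def noisy_compare(left, right):
    """Hypothetical stand-in for the comparison that triggers the warning."""
    warnings.warn(MESSAGE + ". The comparison ... will change.", FutureWarning)
    return left == right


def quiet_compare(left, right):
    """Run the comparison with the one known warning silenced locally."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=MESSAGE, category=FutureWarning)
        return noisy_compare(left, right)


print(quiet_compare(["foo", 0], ["foo", 0]))
```

The previous filter state is restored when the `with` block exits, so other FutureWarnings raised elsewhere still reach the user. Note that the Python documentation describes `catch_warnings` as not thread-safe, which may matter for a routine called from multiple threads.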

jreback · 2014-10-08T20:18:24Z

yes I think we can just do that!

submit for the linked issue though

jorisvandenbossche added this to the 0.15.1 milestone Oct 8, 2014

jreback added the Compat pandas objects compatability with Numpy or Python functions label Oct 8, 2014

jreback closed this as completed Oct 8, 2014

jreback mentioned this issue Oct 8, 2014

DEPR: seems numpy is deprecating object array bool comparison w/nan in 1.9 #7065

Closed

jreback modified the milestones: 0.15.2, 0.15.1 Oct 30, 2014
