Documentation for .iloc is misleadingly incomplete #8956

cswarth · 2014-12-01T22:33:01Z

The documentation for .iloc at http://pandas.pydata.org/pandas-docs/version/0.15.1/indexing.html#different-choices-for-indexing should mention that .iloc also takes a boolean array instead of insisting that it is "strictly integer position based".

.iloc is strictly integer position based (from 0 to length-1 of the axis), will raise IndexError if an indexer is requested and it is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with python/numpy slice semantics). Allowed inputs are:

An integer e.g. 5
A list or array of integers [4, 3, 0]
A slice object with ints 1:7

Unlike .loc immediately above, no mention is made of "A boolean array".
This is incorrect, because later in the same document an example uses .iloc and isin to retrieve elements from a Series. isin returns a list of booleans (well, strictly it is a numpy.ndarray, but converting it to a list works as well).
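To make the report concrete, here is a minimal sketch of .iloc accepting a boolean array alongside the documented integer inputs. The Series contents and labels are made up for illustration and do not come from the pandas docs.

```python
import pandas as pd

# Illustrative data only; labels and values are assumptions for this sketch.
s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

# Index.isin returns a boolean numpy.ndarray with one entry per row.
mask = s.index.isin(["a", "c", "e"])

by_position = s.iloc[[0, 2, 4]]  # documented input: list of integer positions
by_mask = s.iloc[mask]           # boolean array of the same length as the axis

print(by_mask)
```

Both selections pick the rows labelled a, c and e, which is the behaviour the docs leave unmentioned.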


jreback · 2014-12-01T22:55:07Z

Yep, that was an oversight; iloc can take a boolean array.

want to do a pr to add a doc example and update?

cswarth · 2014-12-02T00:13:35Z

Sure, I'll give it a shot.

TomAugspurger · 2014-12-02T13:46:21Z

Is .iloc ever necessary to index with a boolean? In the example @cswarth mentions from the isin docs, s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)] should be the same as s_mi[s_mi.index.isin(['a', 'c', 'e'], level=1)], i.e. without the .iloc, I think.
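A small sketch of that equivalence, assuming a two-level MultiIndex built here purely for illustration (the s_mi in the docs may be constructed differently):

```python
import pandas as pd

# Hypothetical stand-in for the docs' s_mi; the levels are assumptions.
index = pd.MultiIndex.from_product([[0, 1], ["a", "b", "c", "d"]])
s_mi = pd.Series(range(8), index=index)

# Boolean mask over the second level; 'e' simply matches nothing.
mask = s_mi.index.isin(["a", "c", "e"], level=1)

with_iloc = s_mi.iloc[mask]
without_iloc = s_mi[mask]

print(with_iloc.equals(without_iloc))
```

With a mask of the same length as the Series, both spellings select the same rows here.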

jorisvandenbossche · 2014-12-02T15:24:26Z

Yes, for the example here, the iloc is not needed / not the logical choice of indexer. So that example could be adapted.

But in general, I think the use case where this can come in handy is if you want to index multiple axes with mixed positional/boolean indexing. Eg if you want to combine iloc on the columns with boolean on the rows:

df.iloc[df.index.isin(...), 0]
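Filling in that one-liner as a runnable sketch; the frame, its labels and the isin values are placeholders, since the original elides them:

```python
import pandas as pd

# Placeholder frame; column names and index labels are assumptions.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=list("abcd"),
)

# Boolean selection on the rows, positional selection on the columns.
row_mask = df.index.isin(["a", "c"])
first_col = df.iloc[row_mask, 0]

print(first_col)
```

The result is a Series holding column position 0 for the rows where the mask is True.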

cswarth · 2014-12-02T18:11:19Z

While adding a brief mention of boolean indexers to .iloc I realized I didn't really know how they worked in all cases. For example, what happens when a boolean indexer has a different shape than the axis it is indexing? What happens if the index is too short? Or too long?

Numpy is explicit, "Boolean arrays must be of the same shape as the initial dimensions of the array being indexed", but the examples below suggest that is not true. I actually don't know what the hell numpy is doing in this case - it looks totally unexpected.

pandas.Series behaves rationally, in my opinion. But is this silent 'leftmost-favored' behavior explicitly documented? If not, should it be?

import numpy as np
from pandas import Series

x = np.arange(100, 1, -1)
print(x[[True, False, True, True, True]])
[ 99 100  99  99  99]
-c:2: FutureWarning: in the future, boolean array-likes will be handled as a boolean array index

s_mi = Series(np.arange(100, 1, -1))
print(s_mi.iloc[[True, False, True, True, True]])
0    100
2     98
3     97
4     96
dtype: int64

cswarth · 2014-12-02T18:18:56Z

NVM the numpy example, I realize my example was bogus b/c it passed an array as the index. This example works as documented.

import numpy as np
x=np.arange(100,1,-1)
print(x[True, False,  True,  True, True])

IndexError: too many indices for array

immerrr · 2014-12-02T18:21:37Z

On Tue, Dec 2, 2014 at 9:11 PM, cswarth wrote:

> Numpy is explicit
> (http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays),
> "Boolean arrays must be of the same shape as the initial dimensions of the
> array being indexed", but the examples below suggests that is not true. I
> actually don't know what the hell numpy is doing in this case - it looks
> totally unexpected.

Numpy only allows ndarrays of bool as masks; your snippet used a list of
bool. This is a common source of confusion, e.g.
http://stackoverflow.com/questions/17779468/numpy-indexing-with-a-one-dimensional-boolean-array.
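A short sketch of the distinction, written so it does not depend on how a given NumPy version treats a plain list of bools: an ndarray of bool acts as a mask, while the same values cast to integers act as positions, which reproduces the [ 99 100 99 99 99] output reported above. The five-element array is an assumption chosen so the mask length matches.

```python
import numpy as np

x = np.arange(100, 95, -1)  # array([100, 99, 98, 97, 96])
mask = np.array([True, False, True, True, True])

# Boolean ndarray: keeps the elements where the mask is True.
selected = x[mask]

# Same values as integers: positions 1, 0, 1, 1, 1.
positions = x[mask.astype(int)]

print(selected)   # [100  98  97  96]
print(positions)  # [ 99 100  99  99  99]
```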

jorisvandenbossche · 2014-12-02T22:10:56Z

I rather find the pandas behaviour a bit strange here. Shouldn't the array have the same length as the series for boolean indexing?

jreback · 2014-12-03T02:23:36Z

@jorisvandenbossche what do you find strange? the boolean array must be equal to the length of the axis

cswarth · 2014-12-03T02:28:44Z

I'm probably doing something wrong, but I don't think that's true.

import numpy as np
from pandas import Series
s = Series(np.arange(5,1,-1))
print(s)
print(s.iloc[[True, False, True]])

0    5
1    4
2    3
3    2
dtype: int64
0    5
2    3
dtype: int64

jreback · 2014-12-03T02:44:13Z

@cswarth technically that's ok but I think it should really validate the len(boolean indexer) == len(series).

numpy actually coerces the indexer which is odd (e.g. this is like indexing [1,0,1])

In [17]: s.values[[True,False,True]]
Out[17]: array([4, 5, 4])
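The length check suggested here could look roughly like the sketch below. iloc_bool_checked is a hypothetical helper written for this thread, not a pandas function, and it says nothing about how pandas itself validates indexers.

```python
import numpy as np
import pandas as pd


def iloc_bool_checked(obj, mask):
    """Hypothetical helper: boolean .iloc that insists on a full-length mask."""
    mask = np.asarray(mask)
    if mask.dtype != bool:
        raise TypeError("mask must contain only booleans")
    if len(mask) != len(obj):
        raise ValueError(
            f"boolean indexer has length {len(mask)}, expected {len(obj)}"
        )
    return obj.iloc[mask]


s = pd.Series(np.arange(5, 1, -1))  # values 5, 4, 3, 2

ok = iloc_bool_checked(s, [True, False, True, False])

try:
    iloc_bool_checked(s, [True, False, True])  # one element short
except ValueError as exc:
    error = str(exc)
    print(error)
```

Rejecting the short mask up front avoids the silent "leftmost-favored" selection shown earlier in the thread.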

jorisvandenbossche · 2014-12-03T08:10:29Z

> @jorisvandenbossche what do you find strange? the boolean array must be equal to the length of the axis

Exactly that: the boolean array apparently does not need to be of the same length.

> numpy actually coerces the indexer which is odd (e.g. this is like indexing [1,0,1])

That is what @immerrr said above (#8956 (comment)), numpy only does boolean indexing with arrays, not with lists (so with the array, it is working in the same way as in pandas):

In [58]: s.values[[True,False,True]]
Out[58]: array([4, 5, 4])

In [59]: s.values[np.array([True,False,True])]
Out[59]: array([5, 3])

In [60]: s[[True, False, True]]
Out[60]:
0    5
2    3
dtype: int32

UPDATE: ah, you opened a new issue for that: #8976

jreback · 2014-12-04T00:08:20Z

closed by #8970

jreback added the Docs and Indexing labels Dec 1, 2014

jreback added this to the 0.16.0 milestone Dec 1, 2014

cswarth mentioned this issue Dec 2, 2014

docfix: add boolean array as option for indexing with .iloc #8970

Closed

jreback modified the milestones: 0.15.2, 0.16.0 Dec 3, 2014

jreback mentioned this issue Dec 3, 2014

ERR: validate boolean indexing with iloc #8976

Closed

jreback closed this as completed Dec 4, 2014
