BUG: DataFrame.loc[i:j] works differently for DatetimeIndex #13869

jzwinck · 2016-08-01T15:36:54Z

This code all "works," but in a surprising way:

import numpy as np
import pandas as pd
t0 = 1234567890123400000
df1 = pd.DataFrame(index=pd.DatetimeIndex([t0, t0 + 1000, t0 + 2000, t0 + 3000]))
df2 = pd.DataFrame(index=range(4))
df1.loc[:2, 'a'] = np.arange(2)
df2.loc[:2, 'a'] = np.arange(3)

We create df1 with a DatetimeIndex and df2 with an integer index. We then create a new column a in each, using .loc[] with an integer slice. With df1 we get the intuitive, normal Python slice behavior where [:2] means "the first 2 elements", whereas with df2 we get the bizarre—but documented—DataFrame.loc slice behavior where [:2] means "elements up to index 2, inclusive."
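For reference, the documented difference between label-based and position-based slicing can be seen on an integer index alone. This is a minimal sketch; the frame `df` and the names `by_label` and `by_position` are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

# Integer index whose labels happen to coincide with positions.
df = pd.DataFrame({"a": np.arange(4)}, index=range(4))

by_label = df.loc[:2]      # label-based: includes the row labeled 2
by_position = df.iloc[:2]  # position-based: stops before position 2

print(len(by_label), len(by_position))  # prints: 3 2
```

Because the labels and positions are the same integers here, the two indexers are easy to confuse, which is part of why the DatetimeIndex case is surprising.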

I don't see why the type of index the DataFrame has should affect the semantics of slicing with .loc[]. I happen to think the exclusive-end behavior is correct in all cases, though apparently Pandas has decided (or at least documented) that .loc[] slicing is inclusive (in which case the DatetimeIndex case looks like a bug).

Also note that trying to "read" df1.loc[:2, 'a'] (e.g. to print it) fails, saying:

TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'>
    with these indexers [2] of <class 'int'>

It's sort of strange that you can assign to this slice but not read from it.

I'm using Pandas 0.18.1.


jreback · 2016-08-01T21:09:02Z

this shouldn't allow the assignment at all. a slice of :2 is not valid for a datetimeindex.

This is a bit deep in the code.

some of the setting validation is not nearly as well tested as the getting logic. You are welcome to take a stab.

jzwinck · 2016-08-01T21:34:29Z

@jreback I thought you might say that. I am OK with it if you're sure this shouldn't be allowed, though I do find the syntax convenient so if you have another concise way to do the same thing in a more supported way I would love to know what that is.

This is not at all an area I'm familiar with so I'm not planning to try to fix it myself. Just wanted to be clear so nobody waits for me on this.

jreback · 2016-08-03T10:15:24Z

pandas by definition aligns things; you need to work with it, rather than fight it.

well, you can use partial string indexing with .loc, or assign an aligned Series:

In [10]: df1['a'] = Series(np.arange(2),index=df1.index[:2])

In [11]: df1
Out[11]: 
                              a
2009-02-13 23:31:30.123400  0.0
2009-02-13 23:31:30.123401  1.0
2009-02-13 23:31:30.123402  NaN
2009-02-13 23:31:30.123403  NaN

Or this

In [17]: df1['a'] = np.array([0,1,np.nan,np.nan])

In [18]: df1
Out[18]: 
                              a
2009-02-13 23:31:30.123400  0.0
2009-02-13 23:31:30.123401  1.0
2009-02-13 23:31:30.123402  NaN
2009-02-13 23:31:30.123403  NaN

Assigning a plain list-like (such as a NumPy array) provides no alignment information, so you MUST fully fill out the array. Generally, using a Series is much more convenient.

jzwinck · 2016-08-03T14:35:07Z

@jreback Thanks, your example using Series is the best alternative I have seen. But it is not equivalent to what I was doing, because it overwrites the entire column, even for indexes I did not want to overwrite. For example, if you run my original code and then:

df1['a'] = pd.Series(np.arange(2), index=df1.index[2:]) # last 2, not first 2

You end up with [NaN, NaN, 0, 1]--it dropped the existing values from the first two rows!
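A self-contained sketch of that overwrite follows. It assumes the column `a` already holds `[0, 1, NaN, NaN]` from the earlier assignment, and the names `df` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

t0 = 1234567890123400000
idx = pd.DatetimeIndex([t0, t0 + 1000, t0 + 2000, t0 + 3000])

# Starting state: first two rows filled, last two missing.
df = pd.DataFrame({"a": [0.0, 1.0, np.nan, np.nan]}, index=idx)

# Whole-column assignment: the Series is aligned (reindexed) to df.index,
# so rows absent from the Series become NaN and the old values are replaced.
df["a"] = pd.Series(np.arange(2), index=df.index[2:])

result = df["a"].tolist()
print(result)
```

The first two values are gone because `df['a'] = ...` replaces the column rather than updating only the rows present in the Series.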

Using Series also fails if there are duplicate values in the index ("cannot reindex from a duplicate axis"), whereas integer-based slicing is not sensitive to that.

This sort of thing comes up somewhat frequently, and I think ultimately the problem arises from the fact that loc and iloc require that all the indexers are either labels or integers. In this example what I really want is semantically more like one of these hypothetical syntaxes:

df1.iloc[:2].loc['b'] = np.arange(2)
df1.xloc[:2, 'b'] = np.arange(2) # like iloc for first argument, loc for second

That is, I often want a way to use integer-based indexing of rows, but name-based indexing of columns. For those of us who use Pandas in concert with other NumPy-based libraries, this use case is fairly common.
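One way to get close to this with existing indexers, as a sketch rather than a recommendation from the thread, is to translate the column name to a position with `Index.get_loc` and then use `iloc` for both axes. It assumes the column already exists; the duplicated timestamp in the index is deliberate, to show that purely positional assignment is not sensitive to duplicate labels.

```python
import numpy as np
import pandas as pd

t0 = 1234567890123400000
# The first two labels are duplicates on purpose.
idx = pd.DatetimeIndex([t0, t0, t0 + 1000, t0 + 2000])
df = pd.DataFrame({"b": np.full(4, np.nan)}, index=idx)

# Look up the column's integer position by name, then index by position.
col = df.columns.get_loc("b")
df.iloc[:2, col] = np.arange(2)

print(df["b"].tolist())
```

Only the first two rows are written; the remaining rows keep their previous values.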

The best working alternative I have come up with so far is to drop down to NumPy like this:

df1['a'].values[:2] = np.arange(2)

If the column exists already, this does what I want. But I imagine you don't intend for people to drop down to NumPy to do something simple like this.

shoyer · 2016-08-03T16:23:01Z

@jzwinck You might try the ix indexer, which does mixed labeled/integer indexing (though with some problematic fallback logic).

I usually prefer something more explicit, e.g., df.loc[df.index[:2], 'b'] = np.arange(2). You can also use methods like Index.get_loc to go from labels to integers.
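A minimal sketch of the explicit form, assuming a unique index and an existing column `b` (the frame `df` and the name `pos` are illustrative):

```python
import numpy as np
import pandas as pd

t0 = 1234567890123400000
idx = pd.DatetimeIndex([t0, t0 + 1000, t0 + 2000, t0 + 3000])
df = pd.DataFrame({"b": np.full(4, np.nan)}, index=idx)

# Slice the index by position to get labels, then hand those labels to .loc.
df.loc[df.index[:2], "b"] = np.arange(2)

# Going the other way: label -> integer position (an int for a unique index).
pos = df.index.get_loc(df.index[2])

print(df["b"].tolist(), pos)
```

This keeps every indexer passed to `.loc` label-based, at the cost of spelling out the position-to-label step.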

jreback · 2016-08-03T17:15:37Z

@shoyer is right. This is already a well-defined, very convenient way to do it in pandas. Mixing position-based and label-based indexing must be very explicit, on purpose.

jzwinck · 2016-08-04T17:48:14Z

@shoyer and @jreback Thanks for the idea to use df.loc[df.index[:2]]. Unfortunately that doesn't work in some cases--I assume it's a bug so I filed #13908.

jreback added the Bug and Indexing (related to indexing on series/frames, not to indexes themselves) labels Aug 1, 2016

jreback added this to the Next Major Release milestone Aug 1, 2016

jreback added the Difficulty Advanced label Aug 1, 2016

jzwinck closed this as completed May 13, 2019
