BUG: DataFrame.loc[i:j] works differently for DatetimeIndex #13869
Comments
this shouldn't allow the assignment at all. a slice of This is a bit deep in the code. some of the setting validation is not nearly as well tested as the getting logic. You are welcome to take a stab. |
@jreback I thought you might say that. I am OK with it if you're sure this shouldn't be allowed, though I do find the syntax convenient so if you have another concise way to do the same thing in a more supported way I would love to know what that is. This is not at all an area I'm familiar with so I'm not planning to try to fix it myself. Just wanted to be clear so nobody waits for me on this. |
Well, you can use partial string indexing. pandas by definition aligns things; you need to work with it rather than fight it.
Or this
using list-like indexers is not providing any aligning information, so you MUST fully fill out the array. Generally using Series is much more convenient. |
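As a rough illustration of the alignment point above (not the code originally posted in this comment), the sketch below assumes a small hypothetical frame `df` with a DatetimeIndex and a column `a`. A plain list must match the full length of the index, while a Series is aligned on its index labels and rows it does not cover become NaN.

```python
import pandas as pd

# Hypothetical example frame; names and values are assumptions for illustration.
df = pd.DataFrame({"x": range(4)}, index=pd.date_range("2016-01-01", periods=4))
df["a"] = 0.0

# A list carries no index information, so its length must match the frame.
list_assignment_failed = False
try:
    df["b"] = [1.0, 2.0]  # only 2 values for 4 rows
except ValueError:
    list_assignment_failed = True

# A Series is aligned on its index: the two labelled rows get values,
# and every row missing from the Series is set to NaN.
partial = pd.Series([10.0, 20.0], index=df.index[:2])
df["a"] = partial
```

Note that the Series assignment replaces the whole column, including rows that previously held 0.0.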
@jreback Thanks, your example using Series is the best alternative I have seen. But it is not equivalent to what I was doing, because it overwrites the entire column, even for indexes I did not want to overwrite. For example, if you run my original code and then:
You end up with Using Series also fails if there are duplicate values in the index ("cannot reindex from a duplicate axis"), whereas integer-based slicing is not sensitive to that. This sort of thing comes up somewhat frequently, and I think ultimately the problem arises from the fact that
That is, I often want a way to use integer-based indexing of rows, but name-based indexing of columns. For those of us who use Pandas in concert with other NumPy-based libraries, this use case is fairly common. The best working alternative I have come up with so far is to drop down to NumPy like this:
If the column exists already, this does what I want. But I imagine you don't intend for people to drop down to NumPy to do something simple like this. |
@jzwinck You might try the I usually prefer something more explicit, e.g., |
@shoyer is right. This is already a well-defined, very convenient way in pandas. Mixing position-based and label-based indexing must be very explicit, on purpose.
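For readers looking for an explicit way to combine positional rows with a named column, here is a minimal sketch. It is an assumption about what "explicit" could look like, not the code elided from the comments above; the frame `df` and its values are hypothetical, and the column is created up front.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the column "a" is created first.
df = pd.DataFrame({"x": range(4)}, index=pd.date_range("2016-01-01", periods=4))
df["a"] = np.nan

# Option 1: turn row positions into labels, then use label-based .loc.
# With duplicate index labels this may select more rows than intended.
df.loc[df.index[:2], "a"] = [10.0, 20.0]

# Option 2: turn the column label into a position, then use positional .iloc.
# Row selection here is purely positional, so duplicate labels do not matter.
df.iloc[2:4, df.columns.get_loc("a")] = [30.0, 40.0]
```

Both forms touch only the selected rows and leave the rest of the column as it was.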
This code all "works," but in a surprising way:
We create df1 with a DatetimeIndex and df2 with an integer index. We then create a new column a in each, using .loc[] with an integer slice. With df1 we get the intuitive, normal Python slice behavior where [:2] means "the first 2 elements", whereas with df2 we get the bizarre (but documented) DataFrame.loc slice behavior where [:2] means "elements up to index 2, inclusive."

I don't see why the type of index the DataFrame has should affect the semantics of slicing with .loc[]. I happen to think the exclusive-end behavior is correct in all cases, though apparently Pandas has decided (or at least documented) that .loc[] slicing is inclusive (in which case the DatetimeIndex case looks like a bug).

Also note that trying to "read" df1.loc[:2, 'a'] (e.g. to print it) fails, saying:

It's sort of strange that you can assign to this slice but not read from it.
I'm using Pandas 0.18.1.
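The documented difference between label-based and position-based slicing can be seen on an integer index with a small sketch like the one below. The frame is a hypothetical stand-in for df2, not the original code from this report, and it only covers the integer-index case; the DatetimeIndex behavior described above may differ between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for df2: default integer index 0..4.
df2 = pd.DataFrame({"x": range(5)})

# .loc slices by label and includes the end label 2.
by_label = df2.loc[:2]

# .iloc slices by position and excludes position 2, like a Python list.
by_position = df2.iloc[:2]

print(len(by_label), len(by_position))
```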