ENH: Generalize cut/melt to handle datetime input #6582
Comments
Could you post an example input and output? How would this be different from a resample? |
For example, say I have two columns of data, the first is timestamps and the second is temperature. The data is sampled unevenly. I was hoping to use |
I don't think resample can do equal sized bins. Do you mind editing an example to the original question, I think that'll make it clearer. |
Took a shot at it. |
You just want something like this?
|
I believe your example only gives the desired behavior when the data is already in chronological order. Part of what I'd hoped to get from |
The problem with |
I'm not sure I follow. In my example above, the data is intentionally sampled pretty unevenly. (Combination of daily, 9hourly and 25minutely data). So I would have thought thinking of time as a continuous variable to bin was appropriate in that context. |
Here is what I would expect if pd.cut handled datetime64[ns] (except I don't print the left-right bins):
|
You would need some sort of 'rounding' I think
|
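The code originally posted with the two comments above is not preserved here, so the following is only a minimal sketch of the idea being discussed, not the commenter's snippet. It assumes the timestamps can be cut on their integer nanosecond values, with the resulting bin edges converted back to datetimes and rounded to whole seconds. The sample data and the names `when`, `ns`, `codes`, `edge_times` and `labels` are hypothetical.

```python
import pandas as pd

# Hypothetical unevenly sampled timestamps (not the data from the issue).
when = pd.Series(pd.to_datetime([
    "2014-03-01 00:00", "2014-03-01 09:00",
    "2014-03-02 12:00", "2014-03-02 12:25",
    "2014-03-03 18:00", "2014-03-04 00:00",
])).astype("datetime64[ns]")

# Treat time as a continuous variable: integer nanoseconds since the epoch.
ns = when.astype("int64")

# Three equal-width bins; retbins=True also returns the numeric bin edges.
codes, edges = pd.cut(ns, bins=3, labels=False, retbins=True)

# Convert the edges back to datetimes and round them to second precision
# so that the bin boundaries are readable.
edge_times = pd.to_datetime(edges.astype("int64")).round("s")

# Human-readable label for each bin, built from consecutive edges.
labels = [f"({left}, {right}]"
          for left, right in zip(edge_times[:-1], edge_times[1:])]

# Number of samples that fall into each bin.
print(when.groupby(codes).size().rename(lambda i: labels[i]))
```

Note that `pd.cut` extends the lowest edge slightly below the minimum so that the first sample is included, which is one reason the unrounded edges are awkward to read.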
Perhaps it would help to provide a use case for when you'd want this. I think I'm with Jeff here, a priori it's a strange result... (even with rounding) |
though I think this should at least have a better error-message.... |
I'll see if I can think of another case, but I'd put out there that I think Jeff's two results above are quite intuitive and useful. Put another way, up above someone said that |
@8one6 not a problem with this result, maybe could just automatically round to second precision (in theory should pass this in). not hard to do. want to do a PR for this then? gets you into some juicy internals! |
@jreback I've never actually contributed to a major project like this, and am very shaky on Git in general. Best place to start? Best ways to avoid making people angry with potentially-shaky code? I assume I'd need to write a test or two? (Never done that.) |
See https://github.com/pydata/pandas/wiki. Clone the repo and look for where the tests for cut are now. Devise a simple test (or tests) that shows the new behavior and run the tests; they should fail. Write a fix and rerun until your tests pass (and the others don't break), then submit a PR. Just post here if you need help. |
So I was being a bit silly on this. I think I meant |
Same idea (and I think the fix will work for both of those functions), e.g. test for both. |
|
see also #6434
Originally from:
http://stackoverflow.com/questions/22286930/is-it-possible-to-use-cut-on-a-collection-of-datetimes?noredirect=1#comment33862629_22286930
Would it be possible to enhance the cut function to handle datetime inputs?
For example, say I have the following setup:
I would like to split up
bigframe
into 7 "bins" and explore the properties within each bin. I'd like the bins to be sequential and for each to contain (approximately) the same number of data points. Ideally, I'd be able to do something like:
I think I can get something like that by doing:
but that doesn't seem particularly elegant. And it winds up with bins labelled with the datetime values instead of the more-nicely-readable datetimes themselves.
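The snippets from the original post are not preserved here, so the following is only a sketch of one possible workaround of the kind described, not the author's code. It assumes `bigframe` has a datetime column (here called `when`, a hypothetical name, alongside a made-up `temperature` column) and bins on the integer representation of the timestamps with `pd.qcut`, then reports each bin's span as real datetimes. The rows are deliberately not in chronological order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `bigframe`: 70 samples at random second
# offsets within a 30-day window, in no particular order.
rng = np.random.default_rng(0)
n = 70
offsets = rng.integers(0, 30 * 24 * 3600, size=n)
bigframe = pd.DataFrame({
    "when": pd.Timestamp("2014-01-01") + pd.to_timedelta(offsets, unit="s"),
    "temperature": rng.normal(20.0, 5.0, size=n),
})

# Quantile-cut the integer view of the timestamps into 7 sequential bins
# holding roughly the same number of rows each.
as_int = bigframe["when"].astype("int64")
bin_id = pd.qcut(as_int, q=7, labels=False).rename("bin")

# Summarize each bin, describing it by the datetimes it actually contains
# rather than by integer bin edges.
summary = bigframe.groupby(bin_id).agg(
    start=("when", "min"),
    end=("when", "max"),
    count=("when", "size"),
    mean_temp=("temperature", "mean"),
)
print(summary)
```

Because the bins are derived from the values rather than from row position, the input does not need to be sorted first.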