We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
There was an error while loading. Please reload this page.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
version 0.11.0 has introduced a confusing behavior when importing data via read_fwf (and, I'm pretty sure, read_table)
import pandas as pd from cStringIO import StringIO from datetime import datetime tzlist = [1,10,20,30,60,80,100] ntz = len(tzlist) tcolspecs = [16]+[8]*ntz tcolnames = ['SST'] + ["T%03d" % z for z in tzlist[1:]] data = ''' 2009164202000 9.5403 9.4105 8.6571 7.8372 6.0612 5.8843 5.5192 2009164203000 9.5435 9.2010 8.6167 7.8176 6.0804 5.8728 5.4869 2009164204000 9.5873 9.1326 8.4694 7.5889 6.0422 5.8526 5.4657 2009164205000 9.5810 9.0896 8.4009 7.4652 6.0322 5.8189 5.4379 2009164210000 9.6034 9.0897 8.3822 7.4905 6.0908 5.7904 5.4039''' dftemp = pd.read_fwf(StringIO(data), index_col=0, header=None, names=tcolnames, widths=tcolspecs, parse_dates=True, date_parser=lambda s: datetime.strptime(s,'%Y%j%H%M%S'))
In [1]: pd.__version__ Out[1]: '0.10.1' In [2]: dftemp Out[2]: SST T010 T020 T030 T060 T080 T100 2009-06-13 20:20:00 9.5403 9.4105 8.6571 7.8372 6.0612 5.8843 5.5192 2009-06-13 20:30:00 9.5435 9.2010 8.6167 7.8176 6.0804 5.8728 5.4869 2009-06-13 20:40:00 9.5873 9.1326 8.4694 7.5889 6.0422 5.8526 5.4657 2009-06-13 20:50:00 9.5810 9.0896 8.4009 7.4652 6.0322 5.8189 5.4379 2009-06-13 21:00:00 9.6034 9.0897 8.3822 7.4905 6.0908 5.7904 5.4039 In [3]: dftemp.T030 Out[3]: 2009-06-13 20:20:00 7.8372 2009-06-13 20:30:00 7.8176 2009-06-13 20:40:00 7.5889 2009-06-13 20:50:00 7.4652 2009-06-13 21:00:00 7.4905 Name: T030, dtype: float64 In [4]: dftemp.T060 Out[4]: 2009-06-13 20:20:00 6.0612 2009-06-13 20:30:00 6.0804 2009-06-13 20:40:00 6.0422 2009-06-13 20:50:00 6.0322 2009-06-13 21:00:00 6.0908 Name: T060 In [5]: dftemp.T080 Out[5]: 2009-06-13 20:20:00 5.8843 2009-06-13 20:30:00 5.8728 2009-06-13 20:40:00 5.8526 2009-06-13 20:50:00 5.8189 2009-06-13 21:00:00 5.7904 Name: T080 In [6]: dftemp.T100 Out[6]: 2009-06-13 20:20:00 5.5192 2009-06-13 20:30:00 5.4869 2009-06-13 20:40:00 5.4657 2009-06-13 20:50:00 5.4379 2009-06-13 21:00:00 5.4039 Name: T100, dtype: float64
In [1]: pd.__version__ Out[1]: '0.11.0' In [2]: dftemp Out[2]: SST T010 T020 T030 T060 T080 T100 2009-06-13 20:20:00 9.5403 9.4105 8.6571 7.8372 6.0612 5.8843 5.5192 2009-06-13 20:30:00 9.5435 9.2010 8.6167 7.8176 6.0804 5.8728 5.4869 2009-06-13 20:40:00 9.5873 9.1326 8.4694 7.5889 6.0422 5.8526 5.4657 2009-06-13 20:50:00 9.5810 9.0896 8.4009 7.4652 6.0322 5.8189 5.4379 2009-06-13 21:00:00 9.6034 9.0897 8.3822 7.4905 6.0908 5.7904 5.4039 In [3]: dftemp.T030 Out[3]: 2009-06-13 20:20:00 7.8372 2009-06-13 20:30:00 7.8176 2009-06-13 20:40:00 7.5889 2009-06-13 20:50:00 7.4652 2009-06-13 21:00:00 7.4905 Name: T030, dtype: float64 In [4]: dftemp.T060 Out[4]: Empty DataFrame Columns: [SST, T010, T020, T030, T060, T080, T100] Index: [] In [5]: dftemp.T080 Out[5]: Empty DataFrame Columns: [SST, T010, T020, T030, T060, T080, T100] Index: [] In [6]: dftemp.T100 Out[6]: 2009-06-13 20:20:00 5.5192 2009-06-13 20:30:00 5.4869 2009-06-13 20:40:00 5.4657 2009-06-13 20:50:00 5.4379 2009-06-13 21:00:00 5.4039 Name: T100, dtype: float64
No matter how many columns I've been importing (up to 32 in some cases), it seems like it is always the 5th and 6th columns getting hit.
The text was updated successfully, but these errors were encountered:
This turned out to be a really esoteric case! In 0.11, we enabled time-series slicing via string shortcuts
The following is a normal selection via string based time-series slicingthe
In [7]: df = DataFrame(randn(10,2),index=date_range('20600101',periods=10,freq='Q')) In [8]: df Out[8]: 0 1 2060-03-31 0.805302 1.483985 2060-06-30 -0.606980 -0.722344 2060-09-30 -0.715166 -0.065888 2060-12-31 0.086402 2.754483 2061-03-31 -0.578437 -0.262175 2061-06-30 1.016819 -1.214755 2061-09-30 0.691813 0.954097 2061-12-31 0.757279 1.573809 2062-03-31 -0.666067 -1.630134 2062-06-30 0.071744 -0.988124 In [9]: df['2060'] Out[9]: 0 1 2060-03-31 0.805302 1.483985 2060-06-30 -0.606980 -0.722344 2060-09-30 -0.715166 -0.065888 2060-12-31 0.086402 2.754483
However, for some really weird reason, 'T060' (and only that) parses out to a equivalent to '2060' must be a shortcut some how
In [10]: df['T060'] Out[10]: 0 1 2060-03-31 0.805302 1.483985 2060-06-30 -0.606980 -0.722344 2060-09-30 -0.715166 -0.065888 2060-12-31 0.086402 2.754483
I fixed this another way, in any event
thanks for the report
@wesm take a look at this!
Sorry, something went wrong.
closed by #3597
Successfully merging a pull request may close this issue.
version 0.11.0 has introduced a confusing behavior when importing data via read_fwf (and, I'm pretty sure, read_table)
With version 0.10.1
and, with version 0.11.0
No matter how many columns I've been importing (up to 32 in some cases), it seems like it is always the 5th and 6th columns getting hit.
The text was updated successfully, but these errors were encountered: