BUG: read_csv not converting to float for python engine with decimal sep, usecols and parse_dates #38334
@@ -2354,12 +2354,16 @@ def _set_no_thousands_columns(self):
         # Create a set of column ids that are not to be stripped of thousands
         # operators.
         noconvert_columns = set()
+        if self._col_indices is not None:

Review comment: Sort _col_indices when it's created. I would actually try just setting it to list(range(len(self.columns))) when it's set (rather than doing it here).
Reply: Moved it up and sorted it immediately when it is set.

+            col_indices = sorted(self._col_indices)
+        else:
+            col_indices = list(range(len(self.columns)))

         def _set(x):
             if is_integer(x):
                 noconvert_columns.add(x)
             else:
-                noconvert_columns.add(self.columns.index(x))
+                noconvert_columns.add(col_indices[self.columns.index(x)])

         if isinstance(self.parse_dates, list):
             for val in self.parse_dates:
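The index mapping in the hunk above can be illustrated without pandas. The following is a minimal sketch, not the parser's actual code: the helper name noconvert_positions and its use_mapping flag are hypothetical, and it assumes that the set built by _set_no_thousands_columns is interpreted as positions in the full, unfiltered row. Under that assumption, the old lookup with usecols would mark the wrong column, which seems consistent with the symptom in the title.

```python
def noconvert_positions(file_columns, usecols, parse_dates, use_mapping=True):
    """Return the row positions that should skip numeric conversion.

    Hypothetical helper for illustration only; it mimics the lookup in the
    patch, not the real parser internals.
    """
    # Positions of the selected columns within the full row, sorted as in the patch.
    col_indices = sorted(file_columns.index(name) for name in usecols)
    # Column names that survive the usecols filter, in file order.
    columns = [file_columns[i] for i in col_indices]

    positions = set()
    for name in parse_dates:
        filtered_pos = columns.index(name)
        if use_mapping:
            # Patched lookup: map the filtered position back to the full row.
            positions.add(col_indices[filtered_pos])
        else:
            # Old lookup: use the position within the filtered columns directly.
            positions.add(filtered_pos)
    return positions


file_columns = ["col", "col1", "col2", "col3"]
usecols = ["col1", "col2", "col3"]

old = noconvert_positions(file_columns, usecols, ["col3"], use_mapping=False)
new = noconvert_positions(file_columns, usecols, ["col3"], use_mapping=True)
print(old)  # {2}: in the full row this is "col2", not the date column
print(new)  # {3}: in the full row this is "col3", the date column
```

With the names from the test case in this PR, the unmapped lookup points at col2, so a decimal value in col2 would be excluded from conversion while the date column would not be.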
@@ -12,7 +12,7 @@

 from pandas.errors import ParserError

-from pandas import DataFrame, Index, MultiIndex
+from pandas import DataFrame, Index, MultiIndex, Timestamp
 import pandas._testing as tm


@@ -314,3 +314,19 @@ def test_malformed_skipfooter(python_parser_only):
     msg = "Expected 3 fields in line 4, saw 5"
     with pytest.raises(ParserError, match=msg):
         parser.read_csv(StringIO(data), header=1, comment="#", skipfooter=1)
+
+
+def test_delimiter_with_usecols_and_parse_dates(python_parser_only):

Review comment: Does this not work in the C parser? Why?
Reply: It works for C too; not quite sure why I tested only for python. Moved it.

+    # GH#35873
+    result = python_parser_only.read_csv(
+        StringIO('"dump","-9,1","-9,1",20101010'),
+        engine="python",
+        names=["col", "col1", "col2", "col3"],
+        usecols=["col1", "col2", "col3"],
+        parse_dates=["col3"],
+        decimal=",",
+    )
+    expected = DataFrame(
+        {"col1": [-9.1], "col2": [-9.1], "col3": [Timestamp("2010-10-10")]}
+    )
+    tm.assert_frame_equal(result, expected)

Review comment: This will need to be removed off of 1.2
Reply: Hm, weird, must have missed that. Thx