CSV reader inconsistent with combination of QUOTE_NONNUMERIC and escapechar #113785

Closed
serhiy-storchaka opened this issue Jan 6, 2024 · 3 comments · Fixed by #122110

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Jan 6, 2024

>>> next(csv.reader([r'2\.5'], escapechar='\\', quoting=csv.QUOTE_NONNUMERIC))
[2.5]
>>> next(csv.reader([r'.5'], escapechar='\\', quoting=csv.QUOTE_NONNUMERIC))
[0.5]
>>> next(csv.reader([r'\.5'], escapechar='\\', quoting=csv.QUOTE_NONNUMERIC))
['.5']
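A small script that reproduces the three cases above against whichever csv module is installed. The helper name parse_one is illustrative only. The result for r'\.5' depends on whether the running interpreter already includes the fix for this issue, so the script prints it rather than assuming it.

```python
import csv


def parse_one(line: str) -> list:
    """Parse one CSV line using a backslash escapechar and QUOTE_NONNUMERIC.

    Under QUOTE_NONNUMERIC the reader converts unquoted fields to float.
    """
    reader = csv.reader([line], escapechar="\\", quoting=csv.QUOTE_NONNUMERIC)
    return next(reader)


if __name__ == "__main__":
    for sample in (r"2\.5", ".5", r"\.5"):
        row = parse_one(sample)
        # Show whether the single field came back as a float or a str.
        print(f"{sample!r:>8} -> {row!r} ({type(row[0]).__name__})")
```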

It parses numbers with an escaped character in the middle and numbers starting with a dot, but it does not parse numbers starting with an escaped character. Either of the following variants would be more consistent:

  1. Any escaped character disables parsing the field as a number.
  2. Allow parsing a field starting with an escaped character as a number.

Linked PRs

@erlend-aasland erlend-aasland changed the title CVS reader inconsistent with combination of QUOTE_NONNUMERIC and escapechar CSV reader inconsistent with combination of QUOTE_NONNUMERIC and escapechar Jan 6, 2024
@MKuranowski
Contributor

MKuranowski commented Jan 23, 2024

I'm not sure which behavior is the expected one.

To go with the 1st option (any escape prevents conversion), the statement self->numeric_field = 0; needs to be inserted in the ESCAPED_CHAR state of the reader in Modules/_csv.c:

case ESCAPED_CHAR:

The 2nd option requires adding if (dialect->quoting == QUOTE_NONNUMERIC) self->numeric_field = 1; to the escape-character branch of the START_FIELD state:

else if (c == dialect->escapechar) {
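To compare the two options, here is a toy Python model of the relevant part of the reader's state machine. It is not the code in Modules/_csv.c: State, parse_field and the policy names are hypothetical, and delimiters, quoting and line endings are deliberately left out. The two commented branches correspond to the two C changes described above.

```python
from enum import Enum, auto


class State(Enum):
    START_FIELD = auto()
    IN_FIELD = auto()
    ESCAPED_CHAR = auto()


def parse_field(text: str, policy: str = "current", escapechar: str = "\\"):
    """Toy model of reading ONE unquoted field under QUOTE_NONNUMERIC.

    policy:
      "current" - behavior reported in the issue
      "disable" - option 1: any escaped character disables numeric parsing
      "allow"   - option 2: a leading escape still allows numeric parsing
    Delimiters, quoting and line endings are not modelled.
    """
    if policy not in ("current", "disable", "allow"):
        raise ValueError(f"unknown policy: {policy!r}")
    state = State.START_FIELD
    numeric = False  # plays the role of self->numeric_field
    chars = []
    for c in text:
        if state is State.START_FIELD:
            if c == escapechar:
                # Option 2: set the numeric flag in this branch as well.
                if policy == "allow":
                    numeric = True
                state = State.ESCAPED_CHAR
            else:
                # An ordinary first character starts a numeric field.
                numeric = True
                chars.append(c)
                state = State.IN_FIELD
        elif state is State.IN_FIELD:
            if c == escapechar:
                state = State.ESCAPED_CHAR
            else:
                chars.append(c)
        else:  # State.ESCAPED_CHAR
            # Option 1: clear the numeric flag for any escaped character.
            if policy == "disable":
                numeric = False
            chars.append(c)
            state = State.IN_FIELD
    if state is State.ESCAPED_CHAR:
        # Simplification: the real reader handles a trailing escape differently.
        raise ValueError("input ends with a dangling escapechar")
    field = "".join(chars)
    return float(field) if numeric else field


if __name__ == "__main__":
    for policy in ("current", "disable", "allow"):
        print(policy, [parse_field(s, policy) for s in (r"2\.5", ".5", r"\.5")])
    # current [2.5, 0.5, '.5']
    # disable ['2.5', 0.5, '.5']
    # allow [2.5, 0.5, 0.5]
```

In this model, "current" mixes both rules: an escape in the middle keeps the field numeric, while an escape at the start does not. Each option removes one half of that mismatch.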

@serhiy-storchaka
Member Author

I can see why an escape character may legitimately appear at the start of a field. QUOTE_NONNUMERIC does not affect escaping: if a character conflicts with one of the special characters, it is escaped, whether the field is a number or not. If in the future we add support for using a comma as the decimal separator, there will be a larger chance of conflict with the field delimiter, so such characters would have to be escaped. Escapes can also be necessary in some weird dialect that uses . or - as the field delimiter.

So I think option 2 is preferable.
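To make the - delimiter case concrete, here is a contrived example, assumed for illustration rather than taken from real data: with - as the field delimiter, a negative number has to be written with an escaped leading minus sign, which is exactly a field starting with an escaped character. The printed result depends on whether the running interpreter includes the fix.

```python
import csv

# Hypothetical dialect that uses "-" as the field delimiter, so a leading
# minus sign must be escaped with the escapechar.
line = r"\-1.5-2"

row = next(csv.reader([line], delimiter="-", escapechar="\\",
                      quoting=csv.QUOTE_NONNUMERIC))

# With option 2 the escaped field "-1.5" is converted like any other number;
# without it, the first field stays a string while the second becomes 2.0.
print(row)
```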

@serhiy-storchaka
Member Author

@MKuranowski, would you mind creating a PR?

MKuranowski added a commit to MKuranowski/cpython that referenced this issue Jul 24, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 25, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 25, 2024
serhiy-storchaka pushed a commit that referenced this issue Jul 25, 2024
serhiy-storchaka pushed a commit that referenced this issue Aug 9, 2024
Labels
None yet
Projects
Status: Done
Development

2 participants