Wrong column for invalid string prefix #121

Closed
gvanrossum opened this issue Apr 25, 2020 · 13 comments

@gvanrossum

Example input:

def foo():
    '''

def bar():
    pass

def baz():
    '''quux'''

Expected output:

  File "/Users/guido/v39/lib/python3.9/site-packages/pyflakes/test/t.py", line 8
    
def bar():
    pass

def baz():
    '''quux'''
    ^
SyntaxError: invalid string prefix

Actual (with new parser):

  File "/Users/guido/v39/lib/python3.9/site-packages/pyflakes/test/t.py", line 8
    
                                        ^
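
A minimal way to reproduce this outside of pyflakes (not part of the original report; the exact message and numbers depend on the interpreter build) is to compile the snippet and look at the SyntaxError attributes the traceback display is built from:

source = (
    "def foo():\n"
    "    '''\n"
    "\n"
    "def bar():\n"
    "    pass\n"
    "\n"
    "def baz():\n"
    "    '''quux'''\n"
)
try:
    compile(source, "t.py", "exec")
except SyntaxError as exc:
    # lineno, offset and text are what the "File ..., line N" display is rendered from
    print(exc.lineno, exc.offset)
    print(repr(exc.text))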
@ammaraskar

Seems to be a general problem with errors from the tokenizer.

Before:

>>> a = \!
  File "<stdin>", line 1
    a = \!
         ^
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
    😃 = 1
    ^
SyntaxError: invalid character in identifier

After:

>>> a = \!
  File "<stdin>", line 1
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
SyntaxError: invalid character in identifier
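
For context, here is a simplified sketch (not the actual traceback-printing code) of how the caret line in these displays is derived from SyntaxError.offset and SyntaxError.text; with a file-relative offset the caret gets pushed far past the line, or the source line is lost entirely:

def render(text, offset):
    # roughly what the interpreter does when printing a SyntaxError
    print("    " + text.rstrip("\n"))
    print("    " + " " * (offset - 1) + "^")

render("a = \\!", 6)    # column offset: caret lands under the offending character
render("a = \\!", 40)   # file-relative offset: caret lands far past the line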

@ammaraskar

So the root cause of this is that the offset field in SyntaxError can be either a column offset or an offset from the start of the file. The offset calculation code, taken from here:

cpython/Parser/parsetok.c

Lines 422 to 428 in 14ab84b

err_ret->offset = col_offset != -1 ? col_offset + 1 : ((int)(tok->cur - tok->buf));
len = tok->inp - tok->buf;
err_ret->text = (char *) PyObject_MALLOC(len + 1);
if (err_ret->text != NULL) {
    if (len > 0)
        strncpy(err_ret->text, tok->buf, len);
    err_ret->text[len] = '\0';

is based on the file offset when it does tok->cur - tok->buf; consequently, the err_ret->text it uses is the entire file up to where the error occurred. If we want to provide SyntaxError with only the line the error occurs on, like we currently do:

if (p->start_rule == Py_file_input) {
    loc = PyErr_ProgramTextObject(p->tok->filename, p->tok->lineno);
}
if (!loc) {
    loc = get_error_line(p->tok->buf);
}

then the calculation needs to be adjusted so that it's relative to that line. Alternatively, we can pass the entire file up to the error to the SyntaxError to maintain consistency with what's been done so far.
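
A sketch of the first option (illustrative only; the helper name is made up and not a CPython internal): given the buffer the tokenizer has read and a file-relative offset, the line-relative column can be recovered by measuring from the last newline before the error:

def line_relative_offset(buf, file_offset):
    # rfind returns -1 when there is no preceding newline, so +1 yields 0,
    # i.e. the start of the buffer
    line_start = buf.rfind("\n", 0, file_offset) + 1
    return file_offset - line_start + 1   # 1-indexed, like SyntaxError.offset

buf = "def foo():\n    '''\n"
print(line_relative_offset(buf, 17))   # offset of the last ' -> column 7 on line 2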

@gvanrossum
Author

If the col offset is ever from the start of the file, that's a mistake.

@isidentical
Collaborator

Is there a special reason to use tokenizer_error_with_col_offset? I've made some changes to _PyPegen_raise_error and replaced all usages of tokenizer_error_with_col_offset calls with proper RAISE_SYNTAX_ERROR/RAISE_INDENTATION_ERROR and it works smoothly.

@lysnikolaou
Member

Is there a special reason to use tokenizer_error_with_col_offset? I've made some changes to _PyPegen_raise_error and replaced all usages of tokenizer_error_with_col_offset calls with proper RAISE_SYNTAX_ERROR/RAISE_INDENTATION_ERROR and it works smoothly.

No particular reason, except that last week everything needed to move fast, so I probably didn't think too hard about alternative ways to do things; I just did what seemed easier.

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

@isidentical
Collaborator

isidentical commented Apr 27, 2020

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6 / beta 1.

@lysnikolaou
Member

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6.

That'd be great!

@ammaraskar

ammaraskar commented Apr 27, 2020

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6.

Thanks for working on this! Just a tip: the existing column numbers are generally based on this rule.

cpython/Parser/parsetok.c

Lines 419 to 421 in 14ab84b

/* if we've managed to parse a token, point the offset to its start,
* else use the current reading position of the tokenizer
*/
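
Put differently (an illustrative Python paraphrase of that rule, not the actual parser code): if a token was parsed, the offset is its 1-indexed start column; otherwise the fallback is the tokenizer's reading position measured from the start of the buffer, which is exactly the file-relative offset discussed above:

def pick_offset(token_start_col, cur, buf_start):
    if token_start_col != -1:
        # a token was parsed: point at its start (1-indexed column)
        return token_start_col + 1
    # fallback: current reading position relative to the whole buffer
    return cur - buf_start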

@lysnikolaou lysnikolaou added the 3.9.0b1 Needs to be fixed/implemented until the 3.9.0b1 release label Apr 28, 2020
@lysnikolaou
Member

Fixed in python#19782.

@pablogsal pablogsal reopened this May 1, 2020
@pablogsal

Fixed in python#19782.

Not completely; we are still missing the offsets that come from the AST.

@lysnikolaou
Member

Are those related to invalid string prefixes?

@pablogsal

Are those related to invalid string prefixes?

No, they are just some incorrect column offsets. Are we tracking those in a different issue?

@pablogsal

Oh, we are: #65

Sorry, we can indeed close this issue :)
