Wrong column for invalid string prefix #121

Closed
gvanrossum opened this issue Apr 25, 2020 · 13 comments

@gvanrossum

Example input:

def foo():
    '''

def bar():
    pass

def baz():
    '''quux'''

Expected output:

  File "/Users/guido/v39/lib/python3.9/site-packages/pyflakes/test/t.py", line 8
    
def bar():
    pass

def baz():
    '''quux'''
    ^
SyntaxError: invalid string prefix

Actual (with new parser):

  File "/Users/guido/v39/lib/python3.9/site-packages/pyflakes/test/t.py", line 8
    
                                        ^
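
A minimal way to reproduce this outside of pyflakes (not part of the original report; the exact message and numbers depend on the interpreter build) is to compile the snippet and look at the SyntaxError attributes the traceback display is built from:

source = (
    "def foo():\n"
    "    '''\n"
    "\n"
    "def bar():\n"
    "    pass\n"
    "\n"
    "def baz():\n"
    "    '''quux'''\n"
)
try:
    compile(source, "t.py", "exec")
except SyntaxError as exc:
    # lineno, offset and text are what the "File ..., line N" display is rendered from
    print(exc.lineno, exc.offset)
    print(repr(exc.text))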
@ammaraskar

Seems to be a general problem with errors from the tokenizer.

Before:

>>> a = \!
  File "<stdin>", line 1
    a = \!
         ^
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
    😃 = 1
    ^
SyntaxError: invalid character in identifier

After:

>>> a = \!
  File "<stdin>", line 1
SyntaxError: unexpected character after line continuation character
>>> 😃 = 1
  File "<stdin>", line 1
SyntaxError: invalid character in identifier
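
For context, here is a simplified sketch (not the actual traceback-printing code) of how the caret line in these displays is derived from SyntaxError.offset and SyntaxError.text; with a file-relative offset the caret gets pushed far past the line, or the source line is lost entirely:

def render(text, offset):
    # roughly what the interpreter does when printing a SyntaxError
    print("    " + text.rstrip("\n"))
    print("    " + " " * (offset - 1) + "^")

render("a = \\!", 6)    # column offset: caret lands under the offending character
render("a = \\!", 40)   # file-relative offset: caret lands far past the line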

@ammaraskar

So the root cause of this is that the offset field in SyntaxError can be either a column offset or an offset from the start of the file. The offset calculation code, taken from here:

cpython/Parser/parsetok.c

Lines 422 to 428 in 14ab84b

err_ret->offset = col_offset != -1 ? col_offset + 1 : ((int)(tok->cur - tok->buf));
len = tok->inp - tok->buf;
err_ret->text = (char *) PyObject_MALLOC(len + 1);
if (err_ret->text != NULL) {
    if (len > 0)
        strncpy(err_ret->text, tok->buf, len);
    err_ret->text[len] = '\0';

is based on the file offset when it does tok->cur - tok->buf; consequently, the err_ret->text it uses is the entire file up to where the error occurred. If we want to provide SyntaxError with only the line the error occurs on, like we currently do:

if (p->start_rule == Py_file_input) {
    loc = PyErr_ProgramTextObject(p->tok->filename, p->tok->lineno);
}
if (!loc) {
    loc = get_error_line(p->tok->buf);
}

then the calculation needs to be adjusted so that it's relative to that line. Alternatively, we can pass the entire file up to the error to the SyntaxError to maintain consistency with what's been done so far.
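
A sketch of the first option (illustrative only; the helper name is made up and not a CPython internal): given the buffer the tokenizer has read and a file-relative offset, the line-relative column can be recovered by measuring from the last newline before the error:

def line_relative_offset(buf, file_offset):
    # rfind returns -1 when there is no preceding newline, so +1 yields 0,
    # i.e. the start of the buffer
    line_start = buf.rfind("\n", 0, file_offset) + 1
    return file_offset - line_start + 1   # 1-indexed, like SyntaxError.offset

buf = "def foo():\n    '''\n"
print(line_relative_offset(buf, 17))   # offset of the last ' -> column 7 on line 2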

@gvanrossum
Author

If the col offset is ever from the start of the file, that's a mistake.

@isidentical
Collaborator

Is there a special reason to use tokenizer_error_with_col_offset? I've made some changes to _PyPegen_raise_error and replaced all usages of tokenizer_error_with_col_offset calls with proper RAISE_SYNTAX_ERROR/RAISE_INDENTATION_ERROR and it works smoothly.

@lysnikolaou
Member

Is there a special reason to use tokenizer_error_with_col_offset? I've made some changes to _PyPegen_raise_error and replaced all usages of tokenizer_error_with_col_offset calls with proper RAISE_SYNTAX_ERROR/RAISE_INDENTATION_ERROR and it works smoothly.

No particular reason, except that last week everything needed to move fast, so I probably didn't think too hard about alternative ways to do things; I just did what seemed easier.

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

@isidentical
Collaborator

isidentical commented Apr 27, 2020

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6 / beta 1.

@lysnikolaou
Member

Thanks a lot for taking the time to look into this! 😃 Would you be able to open a PR with your code? This really sounds like a good improvement.

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6.

That'd be great!

@ammaraskar

ammaraskar commented Apr 27, 2020

Yes, but @pablogsal suggested that I should fix all errors in that syntax error test (in test_exceptions) at once, so I'm still working on it. I'll make it ready ASAP, before alpha 6.

Thanks for working on this! Just a tip: the existing column numbers are generally based on this rule.

cpython/Parser/parsetok.c

Lines 419 to 421 in 14ab84b

/* if we've managed to parse a token, point the offset to its start,
* else use the current reading position of the tokenizer
*/
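
Put differently (an illustrative Python paraphrase of that rule, not the actual parser code): if a token was parsed, the offset is its 1-indexed start column; otherwise the fallback is the tokenizer's reading position measured from the start of the buffer, which is exactly the file-relative offset discussed above:

def pick_offset(token_start_col, cur, buf_start):
    if token_start_col != -1:
        # a token was parsed: point at its start (1-indexed column)
        return token_start_col + 1
    # fallback: current reading position relative to the whole buffer
    return cur - buf_start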

@lysnikolaou lysnikolaou added the 3.9.0b1 Needs to be fixed/implemented until the 3.9.0b1 release label Apr 28, 2020
@lysnikolaou
Member

Fixed in python#19782.

@pablogsal pablogsal reopened this May 1, 2020
@pablogsal

Fixed in python#19782.

Not completely; we are still missing the offsets that come from the AST.

@lysnikolaou
Member

Are those related to invalid string prefixes?

@pablogsal

Are those related to invalid string prefixes?

No, they are just some incorrect column offsets. Are we tracking those in a different issue?

@pablogsal

Oh, we are: #65

Sorry, we can indeed close this issue :)
