bpo-46091: Correctly calculate indentation levels for whitespace lines with continuation characters #30130
Hopefully @lysnikolaou can review the fix to the C code? It's deleting an awful lot of code.
Lib/test/test_syntax.py
Outdated
def fib(n):
\
'''Print a Fibonacci series up to n.'''
\
a, b = 0, 1
\
while a < n:
\
print(a, end=' ')
\
a, b = b, a+b
\
print()
This example is both too long and too short.
It doesn't really test anything more than a shorter example, say just the first three lines.
Moreover, it doesn't test that the code is parsed the same as the equivalent version without backslashes.
And it doesn't test the stated property quoted in bpo-46091 from the docs: "Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation."
> Moreover, it doesn't test that the code is parsed the same as the equivalent version without backslashes.
We could check the AST, but you likely want to check that it is tokenized the same as the code without backslashes, which is a bit more complicated to test here as we don't have direct access to the C tokenizer.
> And it doesn't test the stated property quoted in bpo-46091 from the docs: "Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation."
Hmm, it is testing the second part at least, as it's checking that the whitespace up to the \ determines the indentation of the next line, as opposed to something like:
pass
    \

pass
where the \ and the \n will just produce an empty line.
Since we don't have access to the tokenizer, checking that the ASTs are the same is reasonable. In fact, you can probably just use ast.unparse() and compare the output with the equivalent without backslashes.
But this reminds me. Does tokenize.py have the same behavior?
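A minimal sketch of the AST comparison suggested above. It uses ast.dump() rather than ast.unparse(), which is a stricter structural check; the helper name same_ast and the two sample strings are illustrative only and are not taken from the PR's tests. The backslash sample is wrapped in try/except because an interpreter without this fix may reject it.

```python
import ast


def same_ast(source_a: str, source_b: str) -> bool:
    """Return True if both sources parse to structurally identical ASTs."""
    # ast.dump() omits line/column attributes by default, so a different
    # number of physical lines does not affect the comparison.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


# Hypothetical inputs modelled on the example in this thread.
WITH_BACKSLASHES = (
    "def fib(n):\n"
    "    \\\n"
    "'''Print a Fibonacci series up to n.'''\n"
    "    \\\n"
    "a, b = 0, 1\n"
)
WITHOUT_BACKSLASHES = (
    "def fib(n):\n"
    "    '''Print a Fibonacci series up to n.'''\n"
    "    a, b = 0, 1\n"
)

try:
    print("ASTs match:", same_ast(WITH_BACKSLASHES, WITHOUT_BACKSLASHES))
except SyntaxError as exc:
    # Interpreters that predate the fix may raise here instead.
    print("rejected by this interpreter:", exc)
```

In a unittest this would become an assertEqual on the two dumps, which gives a readable diff on failure.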
This is the tokenization of:
def fib(n):
    \
"""Print a Fibonacci series up to n."""
    \
a, b = 0, 1
with \:
0,0-0,0: ENCODING 'utf-8'
1,0-1,3: NAME 'def'
1,4-1,7: NAME 'fib'
1,7-1,8: OP '('
1,8-1,9: NAME 'n'
1,9-1,10: OP ')'
1,10-1,11: OP ':'
1,11-1,12: NEWLINE '\n'
2,0-2,4: INDENT '    '
3,0-3,39: STRING '"""Print a Fibonacci series up to n."""'
3,39-3,40: NEWLINE '\n'
5,0-5,1: NAME 'a'
5,1-5,2: OP ','
5,3-5,4: NAME 'b'
5,5-5,6: OP '='
5,7-5,8: NUMBER '0'
5,8-5,9: OP ','
5,10-5,11: NUMBER '1'
5,11-5,12: NEWLINE '\n'
6,0-6,0: DEDENT ''
6,0-6,0: ENDMARKER ''
and without:
0,0-0,0: ENCODING 'utf-8'
1,0-1,3: NAME 'def'
1,4-1,7: NAME 'fib'
1,7-1,8: OP '('
1,8-1,9: NAME 'n'
1,9-1,10: OP ')'
1,10-1,11: OP ':'
1,11-1,12: NEWLINE '\n'
2,0-2,4: INDENT '    '
2,4-2,43: STRING '"""Print a Fibonacci series up to n."""'
2,43-2,44: NEWLINE '\n'
3,4-3,5: NAME 'a'
3,5-3,6: OP ','
3,7-3,8: NAME 'b'
3,9-3,10: OP '='
3,11-3,12: NUMBER '0'
3,12-3,13: OP ','
3,14-3,15: NUMBER '1'
3,15-3,16: NEWLINE '\n'
4,0-4,0: DEDENT ''
4,0-4,0: ENDMARKER ''
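For anyone who wants to reproduce this comparison, here is a rough sketch using the public tokenize module. The helper token_kinds and the sample strings are made up for illustration. Positions are dropped on purpose, since the two listings above differ only in line and column numbers. Whether the streams match can depend on the Python version, so the script prints the result instead of asserting it.

```python
import io
import tokenize


def token_kinds(source: str) -> list:
    """Return (token name, token string) pairs, ignoring positions."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Hypothetical inputs modelled on the example in this thread.
WITH_BACKSLASHES = (
    "def fib(n):\n"
    "    \\\n"
    '"""Print a Fibonacci series up to n."""\n'
    "    \\\n"
    "a, b = 0, 1\n"
)
WITHOUT_BACKSLASHES = (
    "def fib(n):\n"
    '    """Print a Fibonacci series up to n."""\n'
    "    a, b = 0, 1\n"
)

try:
    match = token_kinds(WITH_BACKSLASHES) == token_kinds(WITHOUT_BACKSLASHES)
    print("token streams match:", match)
except (tokenize.TokenError, SyntaxError) as exc:
    # Some versions may reject the backslash variant outright.
    print("tokenizer rejected the input:", exc)
```

Note that generate_tokens() works on str input and does not emit the ENCODING token shown in the listings above; that token only appears when tokenizing bytes.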
But unfortunately the tokenization for:
pass
    \

pass
is wrong:
0,0-0,0: ENCODING 'utf-8'
1,0-1,4: NAME 'pass'
1,4-1,5: NEWLINE '\n'
2,0-2,4: INDENT '    '
3,0-3,1: NEWLINE '\n'
4,0-4,0: DEDENT ''
4,0-4,4: NAME 'pass'
4,4-4,5: NEWLINE '\n'
5,0-5,0: ENDMARKER ''
but that should be fixed separately from this PR.
It is really just moving it into a function and calling that function in a new place (in addition to the old place). The important part is this:
which is basically skipping over pairs of
I have pushed a bunch of tests for the C tokenizer (I forgot I had actually exposed an API for testing it long ago 😅) and some extra corrections. Turns out this is very tricky to get right, because "the backslash and the newline just need to eliminate themselves" and "the indentation is marked by the first backslash that we see" are complicated to keep in sync.
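To make the two rules concrete, here is a toy Python model of them. It is only an illustration of the intended behavior as described in this thread, not the C implementation, and the function name collapse_whitespace_continuations is hypothetical. It handles only lines that consist of whitespace followed by a single backslash, and it ignores strings, comments, and every other tokenizer concern.

```python
import re

# A physical line that is only whitespace followed by one backslash.
_WS_CONTINUATION = re.compile(r"^([ \t]*)\\$")


def collapse_whitespace_continuations(source: str) -> str:
    """Toy model: rewrite whitespace-only continuation lines away."""
    out = []
    pending_indent = None  # whitespace before the first backslash of a run
    for line in source.split("\n"):
        m = _WS_CONTINUATION.match(line)
        if m:
            # Rule 2: only the first backslash of a run fixes the indentation.
            if pending_indent is None:
                pending_indent = m.group(1)
            # Rule 1: the backslash and its newline eliminate themselves.
            continue
        if pending_indent is not None:
            content = line.lstrip(" \t")
            # With nothing after the run, all that is left is an empty line.
            line = pending_indent + content if content else ""
            pending_indent = None
        out.append(line)
    return "\n".join(out)


print(collapse_whitespace_continuations(
    "def fib(n):\n    \\\n'''doc'''\n    \\\na, b = 0, 1\n"
))
```

Under this model the fib example becomes an ordinary indented function body, and the pass example collapses to two pass statements separated by an empty line, with no INDENT in between.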
This PR is stale because it has been open for 30 days with no activity.
Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10.
Sorry, @pablogsal, I could not cleanly backport this to
Sorry @pablogsal, I had trouble checking out the
…ce lines with continuation characters (pythonGH-30130). (cherry picked from commit a0efc0c) Co-authored-by: Pablo Galindo Salgado <[email protected]>
GH-30898 is a backport of this pull request to the 3.10 branch. |
…ce lines with continuation characters (GH-30130). (GH-30898) (cherry picked from commit a0efc0c) Co-authored-by: Pablo Galindo Salgado <[email protected]>
…e lines with continuation characters (pythonGH-30130). (pythonGH-30898) (cherry picked from commit a0efc0c) Co-authored-by: Pablo Galindo Salgado <[email protected]>. (cherry picked from commit 3fc8b74) Co-authored-by: Pablo Galindo Salgado <[email protected]>
https://bugs.python.org/issue46091