Make errors due to full parser stack identifiable #55552
Comments
Currently, if the parser's internal stack is full (as it happens when […]).

In my opinion, a SyntaxError would be great because that way one can […].

I have attached two patches: the first still raises a […]
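A minimal sketch (not part of the original report) of why this error is hard to identify: the caller has to catch broadly and inspect whatever comes back. The nesting depth of 500 and the exception types caught are illustrative assumptions; on the old parser the stack-full condition reportedly surfaced as a MemoryError, while current CPython raises a SyntaxError.

```python
# Illustrative only: trigger the parser/tokenizer nesting limit and see how it surfaces.
source = "(" * 500 + ")" * 500   # deeply nested, but legal Python syntax

try:
    compile(source, "<test>", "eval")
except (SyntaxError, MemoryError) as exc:
    # Callers have to catch both types and read the message to distinguish
    # "implementation limit reached" from a genuine out-of-memory condition.
    print(type(exc).__name__, exc)
```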
I agree that compile-time stack exhaustion is different from runtime object-heap exhaustion and could/should have a different error. I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof.

I believe IndentationError was added since 2000. Its doc is "Base class for syntax errors related to incorrect indentation." and it has TabError as a subclass, so its use for too many indents (not really 'incorrect') is not obvious. But having the position marked (if it would be) would be a plus.

I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?
On Fri, Mar 4, 2011 at 9:30 PM, Terry J. Reedy <[email protected]> wrote:

I added a new […]

Yes, I meant a read-eval-print-loop. A new error would help and if […]
FWIW, this also affects […]
This looks like a rather good idea, although I'm not sure it deserves a new global exception type.

Just make it a SyntaxError.

Oh, I see Martin disagrees. SyntaxError is raised for anything which Python won't compile. For example, too many arguments gets a SyntaxError.
+1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly".

Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError". Possible docs phrasing:

exception:: SyntaxLimitError

Raised when the compilation and code generation process encounters an error due to internal limitations of the current implementation (e.g. the CPython parser has an upper limit on the number of nested sets of parentheses it can handle in a single expression, but no such limitation exists in the Python language reference).
I like SyntaxLimitError much better than ParserError.
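For concreteness, here is a rough sketch of what the proposed exception could have looked like. The name SyntaxLimitError comes from the comments above; the class itself is purely hypothetical, since nothing like it was ever added to CPython.

```python
# Hypothetical sketch, never implemented: subclassing SyntaxError would keep
# existing "except SyntaxError" handlers working while letting callers single
# out implementation limits explicitly.
class SyntaxLimitError(SyntaxError):
    """Raised when compilation fails because of a limit in the current
    implementation (e.g. the parser's cap on nested parentheses), rather
    than because the source is illegal Python."""
```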
2011/7/16 Nick Coghlan <[email protected]>:

What is the point of a new exception? When would you ever want to […]? Moreover, all Python implementations are going to have to place some […]. The Python language doesn't make any mention of limited memory, but no […]
It's important to remember that other implementations treat CPython as […]. Some kinds of errors are inherently implementation specific […]
2011/7/16 Nick Coghlan <[email protected]>:
Implementations are quite good at figuring this stuff out themselves.
It also makes it clear to users whether they've just run up against a […]
2011/7/16 Nick Coghlan <[email protected]>:
Why would it matter to users what variety of limitation they're […]
It matters, because Python "users" are programmers, and most […].

Similarly, the eventual fix to the recursive crash in the compiler […]
2011/7/16 Nick Coghlan <[email protected]>:
I contend that in normal code, it is so extremely rare as to be […]
Agreed with Benjamin. Also, "why something is wrong" can simply be told […]
This is a purity versus practicality argument. I agree with both sides :)

After further reflection, I (eventually) came to the same conclusion as Benjamin and Antoine - using SyntaxError for this is fine. However, the exception *message* should make it clear that this is an implementation limit rather than a language limit.
Oh, I thought we never rely on exception "message" for anything […]
I think you're missing the point. People usually don't catch SyntaxError […]
Yeah, at the level of *code* the origin doesn't really matter, so re-using the exception type is actually OK. However, for *people* seeing the stack trace, it may be useful to know what's genuinely illegal syntax and what's a limitation of the current implementation. The exception message is a good place for that additional detail.
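Since the type stays SyntaxError, code that does care about the distinction is left checking the message text. A small sketch, assuming the "too many nested parentheses" wording produced by CPython 3.10's parser (quoted later in this thread); the wording is implementation-specific and not guaranteed to be stable.

```python
# Sketch: telling an implementation-limit SyntaxError apart by its message.
def is_parser_limit_error(exc: SyntaxError) -> bool:
    # "too many nested parentheses" is the CPython 3.10 wording; adjust per version.
    return "too many nested parentheses" in (exc.msg or "")

try:
    compile("(" * 1000 + ")" * 1000, "<test>", "eval")
except SyntaxError as exc:
    if is_parser_limit_error(exc):
        print("hit an implementation limit, not genuinely invalid syntax")
    else:
        raise
```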
The patch relates to the old parser. With the new parser the 100*"(" + 100*")" example works. If we go to 1000 instead of 100 we get "SyntaxError: too many nested parentheses".

From the discussion it seems that the idea of a new exception type for implementation-limits vs "real" SyntaxErrors was rejected as not useful. Is there anything left to do on this issue?
I retract my original comment as I now agree with SyntaxError + distinct message. Given Irit's report, I think the remaining question is whether

A. The implication that 'too many nested parentheses' is compiler specific is strong enough, anything further should be left to 3rd parties to explain, and this issue should be closed as 'out of date'.

B. We should be explicit and add something like 'for this compiler'.

Since Pablo has been working on improving error messages, I would give his opinion high weight.

To find the actual limit, I did the following with Win 10, 3.10.0b3, IDLE:

>>> def f(n):
...     print(eval('('*n + ')'*n))
...
And after some trials, found
>>> f(200)
()
>>> f(201)
Traceback (most recent call last):
File "<pyshell#56>", line 1, in <module>
f(201)
File "<pyshell#42>", line 2, in f
print(eval('('*n + ')'*n))
File "<string>", line 1
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
^
SyntaxError: too many nested parentheses

The error is detected in 'string' line 1, 1-based offset 201, at the final '('. There are 200 spaces before the caret but they are not obvious in this box, which does not wrap spaces properly. The end_lineno and end_offset are None, None rather than the 1,202 I expected.
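The location attributes Terry mentions can also be read programmatically; a small sketch using the documented SyntaxError fields (the values observed above, including the None end positions, may differ between CPython versions).

```python
# Sketch: inspect where the "too many nested parentheses" error is reported.
n = 201   # one past the deepest nesting accepted in the trial above
try:
    compile("(" * n + ")" * n, "<string>", "eval")
except SyntaxError as exc:
    print(exc.msg)
    print(exc.lineno, exc.offset)            # start position of the error
    print(exc.end_lineno, exc.end_offset)    # reported as None, None in the trial above
```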
For information: I created an actual .py file from the last example with 200 parens and got the following error message: […]

For consistency and greater clarity, I think that having messages such as […] and possibly others (too many arguments?), would be preferable, with […] if the exact cause cannot be identified at the parsing stage.
That error can only happen with the old parser, so you must be using Python 3.8 or 3.9 with the old parser. Can you confirm?
@pablo: Sincere apologies ... I tested this with the wrong virtual environment active locally. With 3.10, it is indeed "SyntaxError: too many nested parentheses".
As a piece of extra information, this limit has been there for a while:

Line 14 in 91a8f8c

The problem is that the old parser died before reaching it in many situations.
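Since the limit is a hard-coded constant, it can also be located empirically without reading the source. A sketch that bisects on nesting depth; the search range is an arbitrary assumption, and it assumes a parser that raises SyntaxError rather than crashing at these depths.

```python
# Sketch: bisect for the deepest parenthesis nesting the current parser accepts.
def compiles(depth: int) -> bool:
    try:
        compile("(" * depth + ")" * depth, "<probe>", "eval")
        return True
    except SyntaxError:
        return False

lo, hi = 1, 10_000            # assume the limit falls somewhere in this range
while lo + 1 < hi:
    mid = (lo + hi) // 2
    if compiles(mid):
        lo = mid              # mid still compiles: limit is higher
    else:
        hi = mid              # mid fails: limit is lower
print("deepest accepted nesting:", lo)   # 200 on the CPython 3.10 build tested above
```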
Regarding what to do, what I would love to optimize is that the message is clear. One of the challenges people may have with too specific messages is that they may not know what "a compiler" or "a parser" means (the first may be especially confusing since many people think Python is not "compiled"). On the other hand, it is more informative to say "too many nested parentheses" than "parser stack overflow", since many people don't know what the latter means, especially in Python. I […]
C stack overflows and thus crashes still happen with the new parser. oss-fuzz just handed me another example of this with a 49k input that is probably about half […].

If we had pgen generate code not relying on the C stack we could monitor resource usage. That's a more complicated change. Similarly but less exact and "good enough" for reasonable code: if we had […].

For reference, whenever that oss-fuzz issue becomes public (a few months?): https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=46699. I doubt its specific input is special; just pointing to where in the code it happens as done above should give the idea that just creating something along the lines of […]
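One defensive pattern for fuzzing-style workloads like this (not something proposed in the thread, just a sketch) is to compile untrusted input in a throwaway child process, so that a C-level stack overflow at any stage of the pipeline shows up as a bad exit status instead of taking down the main process.

```python
# Sketch: contain potential hard crashes (C stack overflow / segfault) in a subprocess.
import subprocess
import sys

def compiles_safely(source: str, timeout: float = 10.0) -> bool:
    proc = subprocess.run(
        [sys.executable, "-c",
         "import sys; compile(sys.stdin.read(), '<fuzz>', 'exec')"],
        input=source,
        text=True,
        capture_output=True,
        timeout=timeout,
    )
    # A nonzero (or negative, on a signal) return code means the input was rejected
    # or the child crashed; either way the parent survives.
    return proc.returncode == 0

print(compiles_safely("(" * 300 + ")" * 300))   # False: hits the parser limit, no crash
```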
Although I'm sure we can stress pegen to crash in similar ways, what you are reporting here is not technically the parser, but the AST transformation. The actual parser keeps track of how deep the stack is and has a hard limit, stopping before it segfaults on most systems. It is certainly possible to set the stack to something smaller and get it to segfault, but at that point it is behaving similarly to the Python stack limit. All of the segfaults you seem to refer to are later in the pipeline, and those are unchanged since the old parser.
I agree with Pablo in #55552 (comment) that "SyntaxError: too many nested parentheses" is fine, and better than "parser stack full". I think we can close this. Another issue can be opened for improvement of something else.