Make errors due to full parser stack identifiable #55552

Trundle · 2011-02-26T22:39:51Z

BPO	11343
Nosy	@loewis, @birkenfeld, @terryjreedy, @ncoghlan, @orsenthil, @marienz, @pitrou, @benjaminp, @aroberge, @bitdancer, @Trundle, @ericsnowcurrently, @pablogsal, @iritkatriel
Files	issue11343.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2011-02-26.22:39:50.959>
labels = ['interpreter-core', 'type-feature', '3.11']
title = 'Make errors due to full parser stack identifiable'
updated_at = <Date 2021-07-08.16:46:57.711>
user = 'https://github.com/Trundle'

bugs.python.org fields:

activity = <Date 2021-07-08.16:46:57.711>
actor = 'pablogsal'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2011-02-26.22:39:50.959>
creator = 'Trundle'
dependencies = []
files = ['21387']
hgrepos = []
issue_num = 11343
keywords = ['patch']
message_count = 29.0
messages = ['129604', '130076', '132063', '134778', '140456', '140469', '140470', '140479', '140480', '140496', '140498', '140499', '140500', '140501', '140502', '140504', '140506', '140507', '140517', '140631', '140635', '140646', '396883', '397125', '397133', '397151', '397152', '397153', '397154']
nosy_count = 15.0
nosy_names = ['loewis', 'georg.brandl', 'terry.reedy', 'ncoghlan', 'orsenthil', 'marienz', 'pitrou', 'benjamin.peterson', 'aroberge', 'Arfrever', 'r.david.murray', 'Trundle', 'eric.snow', 'pablogsal', 'iritkatriel']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue11343'
versions = ['Python 3.11']

Trundle · 2011-02-26T22:39:50Z

Currently, if the parser's internal stack is full (as it happens when
the parser tries to parse a deeply nested structure), the parser
writes an error message to stderr and a bare MemoryError is
raised. That way, it is really hard for REPLs do handle input that
causes the parser to exhaust its stack correctly (see e.g. python -c "print('(' * 100 + ')' * 100)" | python -c "import code; code.interact()").

In my opinion, a SyntaxError would be great because that way one can
easily see where in the source code the error is caused. One reason
against that (stated by MvL in msg1687) is that it's really
syntactically correct Python and only the parser can't parse it. But
then, the same is also true when there are too many indentation levels
in which case an IndentationError is raised (which is a subclass of
SyntaxError).

I have attached two patches: The first still raises a MemoryError
(for the sake of backward compatibility) but adds a message ("too many
opening parens") to it. The second raises a SyntaxError instead.

terryjreedy · 2011-03-04T21:30:10Z

I agree that compile-time stack exhaustion is different from runtime object-heap exhaustion and could/should have a different error.

I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof. I believe IndentationError was added since 2000. Its doc is "Base class for syntax errors related to incorrect indentation." and it has TabError as a subclass, so its use for too many indents (not really 'incorrect') is not obvious. But having the position marked (if it would be) would be a plus.

I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?

Trundle · 2011-03-25T01:05:27Z

On Fri, Mar 4, 2011 at 9:30 PM, Terry J. Reedy <[email protected]> wrote:

I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof.

I added a new ParserError that inherits from SyntaxError.

I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?

Yes, I meant a read-eval-print-loop. A new error would help and if
it's a subclass of SyntaxError, they most likely need no changes at
all to handle that new error, as they have to deal with SyntaxErrors
anyway.

Trundle · 2011-04-29T16:04:17Z

FWIW, this also affects ast.literal_eval().

pitrou · 2011-07-15T18:53:12Z

This looks like a rather good idea, although I'm not sure it deserves a new global exception type.

benjaminp · 2011-07-15T19:47:06Z

Just make it a SyntaxError.

benjaminp · 2011-07-15T19:48:27Z

Oh, I see Martin disagrees. SyntaxError is raised for anything which Python won't compile. For example, too many arguments gets a SyntaxError.

ncoghlan · 2011-07-16T07:56:41Z

+1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly"

Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError".

Possible docs phrasing:

exception:: SyntaxLimitError

Raised when the compilation and code generation process encounters an error due to internal limitations of the current implementation (e.g. the CPython parser has an upper limit on the number of nested sets of parentheses it can handle in a single expression, but no such limitation exists in the Python language reference).
This is a subclass of :exc:`SyntaxError`.

birkenfeld · 2011-07-16T08:24:37Z

I like SyntaxLimitError much better than ParserError.

benjaminp · 2011-07-16T14:43:34Z

2011/7/16 Nick Coghlan <[email protected]>:

Nick Coghlan <[email protected]> added the comment:

+1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly"

Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError".

What is the point of a new exception? When would you ever want to
catch it as opposed to a regular SyntaxError? You're going to have to
change the code either way.

Moreover, all Python implementations are going to have to place some
limits on stack depth and recursion, so it's not really an
implementation detail.

The Python language doesn't make an mention of limited memory, but no
one is going to suggest a MemoryLimitError, which is an
"implementation detail".

ncoghlan · 2011-07-16T15:10:41Z

It's important to remember that other implementations treat CPython as
the "gold standard" for compatibility purposes. If we declare
something to be an ordinary SyntaxError, then that carries strong
implications for what other implementations should do.

Some kinds of errors are inherently implementation specific
(MemoryError and SystemError spring to mind). No sane implementor is
going to try to match CPython like-for-like when it comes to those.
SyntaxError, however, is typically defined by the language definition,
not the CPython implementation of it. By introducing a new exception
type, we're explicitly telling other implementations "Look, this is
our problem that we don't plan to fix as it doesn't typically arise in
real code, but please don't deliberately cripple your own
implementation just to match this behaviour". I'm strongly with MvL on
this one - SyntaxError itself should never be raised for legal Python
code just because the CPython bytecode generation toolchain isn't able
to handle it properly.

benjaminp · 2011-07-16T15:16:05Z

2011/7/16 Nick Coghlan <[email protected]>:

Nick Coghlan <[email protected]> added the comment:

It's important to remember that other implementations treat CPython as
the "gold standard" for compatibility purposes. If we declare
something to be an ordinary SyntaxError, then that carries strong
implications for what other implementations should do.

Some kinds of errors are inherently implementation specific
(MemoryError and SystemError spring to mind). No sane implementor is
going to try to match CPython like-for-like when it comes to those.
SyntaxError, however, is typically defined by the language definition,
not the CPython implementation of it. By introducing a new exception
type, we're explicitly telling other implementations "Look, this is
our problem that we don't plan to fix as it doesn't typically arise in
real code, but please don't deliberately cripple your own
implementation just to match this behaviour". I'm strongly with MvL on
this one - SyntaxError itself should never be raised for legal Python
code just because the CPython bytecode generation toolchain isn't able
to handle it properly.

Implementations are quite good at figuring this stuff out themselves.
We don't need a whole new exception (new docs, one more thing in the
builtin namespace) to "send a message" to other implementations. If
some implementation is not part of the language, then its test just
just be marked cpython_only. It shouldn't be brought into public API
and the language.

ncoghlan · 2011-07-16T15:22:46Z

It also makes it clear to users whether they've just run up against a
limitation of the implementation they're using or whether what they've
written is genuinely illegal code. They are NOT the same thing.
Attempting to conflate them into one exception for the mere sake of
not adding a new exception type is a false economy. Take it to
python-dev if you want, but I will strongly oppose this check going in
with it raising an ordinary SyntaxError instead of a new subclass.

benjaminp · 2011-07-16T15:33:26Z

2011/7/16 Nick Coghlan <[email protected]>:

Nick Coghlan <[email protected]> added the comment:

It also makes it clear to users whether they've just run up against a
limitation of the implementation they're using or whether what they've
written is genuinely illegal code. They are NOT the same thing.
Attempting to conflate them into one exception for the mere sake of
not adding a new exception type is a false economy. Take it to
python-dev if you want, but I will strongly oppose this check going in
with it raising an ordinary SyntaxError instead of a new subclass.

Why would it matter to users what variety of limitation they're
running up against? As I said, they're still going to have to change
their code.

ncoghlan · 2011-07-16T15:53:45Z

It matters, because Python "users" are programmers, and most
programmers want to know *why* they're being told something is wrong.
Raising MemoryError is technically incorrect at this point, but at
least gives the right flavour of the user not doing anything
specifically wrong, the program just ran out of resources trying to do
as they asked. I see SyntaxError as significantly worse, however, as
instead of telling the developer "sorry, that's a reasonable request,
but I can't handle it", the interpreter is now telling them "you did
that wrong, change it". The semantics of that are all wrong. There's
nothing wrong with the syntax of the user's code, it's just that the
compiler can't handle it.

Similarly, the eventual fix to the recursive crash in the compiler
itself is likely to be an arbitrary limit on expression nesting.
Again, nothing wrong with their syntax, but they'll need to adjust
their (legal!) code to work around implementation limitations.

benjaminp · 2011-07-16T16:50:35Z

2011/7/16 Nick Coghlan <[email protected]>:

Nick Coghlan <[email protected]> added the comment:

It matters, because Python "users" are programmers, and most
programmers want to know *why* they're being told something is wrong.
Raising MemoryError is technically incorrect at this point, but at
least gives the right flavour of the user not doing anything
specifically wrong, the program just ran out of resources trying to do
as they asked. I see SyntaxError as significantly worse, however, as
instead of telling the developer "sorry, that's a reasonable request,
but I can't handle it", the interpreter is now telling them "you did
that wrong, change it". The semantics of that are all wrong. There's
nothing wrong with the syntax of the user's code, it's just that the
compiler can't handle it.

I content that in normal code, it is so extremely rare as to be
unheard of, to get exceptions about the parser stack overflowing or
segfault the compiler by too deep nesting. People who are doing this
(generally to prove the point about limitations of the compiler) are
smart enough to know exactly what "parser stack overflowed" means and
separate that from Python as the language. Adding a new exception is
solving a non-existent problem.

pitrou · 2011-07-16T22:28:55Z

I content that in normal code, it is so extremely rare as to be
unheard of, to get exceptions about the parser stack overflowing or
segfault the compiler by too deep nesting. People who are doing this
(generally to prove the point about limitations of the compiler) are
smart enough to know exactly what "parser stack overflowed" means and
separate that from Python as the language. Adding a new exception is
solving a non-existent problem.

Agreed with Benjamin. Also, "why something is wrong" can simply be told
in the exception message.

bitdancer · 2011-07-16T23:13:20Z

This is a purity versus practicality argument. I agree with both sides :)

ncoghlan · 2011-07-17T06:32:24Z

After further reflection, I (eventually) came to the same conclusion as Benjamin and Antoine - using SyntaxError for this is fine.

However, the exception *message* should make it clear that this is an implementation limit rather than a language limit.

orsenthil · 2011-07-18T23:28:02Z

Oh, I thought we never rely on exception "message" for anything
important. However this seems to be an exception for that exception.
:-)

pitrou · 2011-07-18T23:34:38Z

Oh, I thought we never rely on exception "message" for anything
important. However this seems to be an exception for that exception.
:-)

I think you're missing the point. People usually don't catch SyntaxError
and run fallback code in this case. And even if they do, they are
unlikely to care about whether the ErreurDeSyntaxe is due to a genuinely
faulty piece of code, or a parser limitation ;)

ncoghlan · 2011-07-19T02:11:21Z

Yeah, at the level of *code* the origin doesn't really matter, so re-using the exception type is actually OK. However, for *people* seeing the stack trace, it may be useful to know what's genuinely illegal syntax and what's a limitation of the current implementation. The exception message is a good place for that additional detail.

iritkatriel · 2021-07-02T23:10:59Z

The patch relates to the old parser.

With the new parser the 100*(+100*) example works. If we go to 1000 instead of 100 we get "SyntaxError: too many nested parentheses".

From the discussion it seems that the idea of a new exception type for implementation-limits vs "real" SyntaxErrors was rejected as not useful.

Is there anything left to do on this issue?

terryjreedy · 2021-07-08T03:41:02Z

I retract my original comment as I now agree with SyntaxError + distinct message.

Given Irit's report, I think the remaining question is whether

A. The implication that 'too many nested parentheses' is compiler specific is strong enough, and any thing further should be left to 3rd parties to explain, and this issue should be closed as 'out of date'.

B. We should be explicit and add something like 'for this compiler'.

Since Pablo has been working on improving error messages, I would give his opinion high weight.

To find the actual limit, I did the following with Win 10, 3.10.0b3, IDLE:

>>> def f(n):
...     print(eval('('*n + ')'*n"))
... 
And after some trials, found
>>> f(200)
()
>>> f(201)
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    f(201)
  File "<pyshell#42>", line 2, in f
    print(eval('('*n + ')'*n))
  File "<string>", line 1
    ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
                                                                                                                                                                                                                                                                                                                                                                                                                        ^
SyntaxError: too many nested parentheses

The error is detected in 'string' line 1, 1-based offset 201, at the final '('. There are 200 spaces before the caret but they are not obvious in this box, which does not wrap spaces properly. The end_lineno and end_offset are None, None rather than the 1,202 I expected.

aroberge · 2021-07-08T09:45:21Z

For information: I created an actual .py file from the last example with 200 parens and got the following error message:

python example.py
s_push: parser stack overflow
MemoryError

For consistency and greater clarity, I think that having messages such as

SyntaxError: parser stack overflow (too many nested parenthesis)
SyntaxError: parser stack overflow (too many statistically nested blocks)

and possibly others (too many arguments?), would be preferable, with
the possibility of simply having

SyntaxError: parser stack overflow

if the exact cause cannot be identified at the parsing stage.

pablogsal · 2021-07-08T16:38:49Z

> python example.py
s_push: parser stack overflow
MemoryError

That error can only happen with the old parser, so you must be using Python3.8 or 3.9 with the old parser. Can you confirm?

aroberge · 2021-07-08T16:43:04Z

@pablo: Sincere apologies ... I tested this with the wrong virtual environment active locally. With 3.10, it is indeed SyntaxError: too many nested parentheses

pablogsal · 2021-07-08T16:43:39Z

As a piece of extra information, this limit has been there for a while:

cpython/Parser/tokenizer.h

Line 14 in 91a8f8c

#define MAXLEVEL 200 /* Max parentheses level */

the problem is that the old parser died before reaching it in many situations.

pablogsal · 2021-07-08T16:46:58Z

Regarding what to do, what I would love to optimize is that the message is clear. One of the challenges people may have with too specific messages is that they may not know what "a compiler" or "a parser" means (the first may be especially confusing since many people think Python is not "compiled".

On the other hand, is more informative to say "too many nested parentheses" than "parser stack overflow" since many people don't know what that means, especially in Python.

I

gpshead · 2022-04-14T19:58:50Z

C stack overflows and thus crashes still happen with the new parser. oss-fuzz just handed me another example of this with a 49k input that is probably about half + signs. Basically it causes ast2obj_expr() recursion within case BinOp_kind: exhausting the C stack. But there are all sorts of ways to trigger this kind of thing as the generated Python/Python-ast.c code uses C stack based recursion. It doesn't fail on my own simple pydebug build (presumably linux default 8mb stack) but did on oss-fuzz which was built with sanitizers enabled and possibly a smaller default stack.

If we had pgen generate code not relying on the C stack we could monitor resource usage. That's a more complicated change. Similarly but less exact and "good enough" for reasonable code: If we had Include/internal/pycore_ast_state.h add a recursion depth counter in struct ast_state and require all calls slinging state around to increment and check that we could bail out nicely before getting anywhere close a likely C stack limit.

for reference whenever that oss-fuzz issue becomes public (a few months?) https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=46699. I doubt it's specific input is special, just pointing to where in the code it happens as done above should give the idea that just creating something along the lines of {a+a+a+a+a+a+a ...and on and on and on... +a+a+a+a} can trigger a C stack overflow in ast.literal_eval()

pablogsal · 2022-04-15T14:04:15Z

C stack overflows and thus crashes still happen with the new parser.

Although I'm sure we can stress pegen to crash in similar ways, what you are reporting here is not technically the parser, but the AST transformation.

The actual parser keeps track of how deep the stack is and has a hard limit, stopping before it segfaults on most systems. Is certainly possible to set the stack to something smaller and get it to segfault but at this point is behaving similarly as the python stack limit.

All of the segfaults you seem to refer to are layer in the pipeline, and those are unchanged since the old parser.

terryjreedy · 2023-11-03T06:42:52Z

I agree with Pablo in #55552 (comment) that "SyntaxError: too many nested parentheses" is fine, and better than "parser stack full". I think we can close this. Another issue can be opened for improvement of something else.

Trundle mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Feb 26, 2011

terryjreedy added the 3.11 only security fixes label Jul 8, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

gpshead mentioned this issue Apr 14, 2022

Ideas for making ast.literal_eval() usable #83340

Open

pablogsal closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2023

Uh oh!

Make errors due to full parser stack identifiable #55552

Make errors due to full parser stack identifiable #55552

Comments

Trundle mannequin commented Feb 26, 2011

Trundle mannequin commented Feb 26, 2011 • edited by iritkatriel Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

terryjreedy commented Mar 4, 2011

Uh oh!

Trundle mannequin commented Mar 25, 2011

Uh oh!

Trundle mannequin commented Apr 29, 2011

Uh oh!

pitrou commented Jul 15, 2011

Uh oh!

benjaminp commented Jul 15, 2011

Uh oh!

benjaminp commented Jul 15, 2011

Uh oh!

ncoghlan commented Jul 16, 2011

Uh oh!

birkenfeld commented Jul 16, 2011

Uh oh!

benjaminp commented Jul 16, 2011

Uh oh!

ncoghlan commented Jul 16, 2011

Uh oh!

benjaminp commented Jul 16, 2011

Uh oh!

ncoghlan commented Jul 16, 2011

Uh oh!

benjaminp commented Jul 16, 2011

Uh oh!

ncoghlan commented Jul 16, 2011

Uh oh!

benjaminp commented Jul 16, 2011

Uh oh!

pitrou commented Jul 16, 2011

Uh oh!

bitdancer commented Jul 16, 2011

Uh oh!

ncoghlan commented Jul 17, 2011

Uh oh!

orsenthil commented Jul 18, 2011

Uh oh!

pitrou commented Jul 18, 2011

Uh oh!

ncoghlan commented Jul 19, 2011

Uh oh!

iritkatriel commented Jul 2, 2021

Uh oh!

terryjreedy commented Jul 8, 2021

Uh oh!

aroberge mannequin commented Jul 8, 2021

Uh oh!

pablogsal commented Jul 8, 2021

Uh oh!

aroberge mannequin commented Jul 8, 2021

Uh oh!

pablogsal commented Jul 8, 2021

Uh oh!

pablogsal commented Jul 8, 2021

Uh oh!

gpshead commented Apr 14, 2022

Uh oh!

pablogsal commented Apr 15, 2022

Uh oh!

terryjreedy commented Nov 3, 2023

Uh oh!

Trundle mannequin commented Feb 26, 2011 •

edited by iritkatriel

Loading