Skip to content

Make errors due to full parser stack identifiable #55552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Trundle mannequin opened this issue Feb 26, 2011 · 32 comments
Closed

Make errors due to full parser stack identifiable #55552

Trundle mannequin opened this issue Feb 26, 2011 · 32 comments
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@Trundle
Copy link
Mannequin

Trundle mannequin commented Feb 26, 2011

BPO 11343
Nosy @loewis, @birkenfeld, @terryjreedy, @ncoghlan, @orsenthil, @marienz, @pitrou, @benjaminp, @aroberge, @bitdancer, @Trundle, @ericsnowcurrently, @pablogsal, @iritkatriel
Files
  • issue11343.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-02-26.22:39:50.959>
    labels = ['interpreter-core', 'type-feature', '3.11']
    title = 'Make errors due to full parser stack identifiable'
    updated_at = <Date 2021-07-08.16:46:57.711>
    user = 'https://github.com/Trundle'

    bugs.python.org fields:

    activity = <Date 2021-07-08.16:46:57.711>
    actor = 'pablogsal'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2011-02-26.22:39:50.959>
    creator = 'Trundle'
    dependencies = []
    files = ['21387']
    hgrepos = []
    issue_num = 11343
    keywords = ['patch']
    message_count = 29.0
    messages = ['129604', '130076', '132063', '134778', '140456', '140469', '140470', '140479', '140480', '140496', '140498', '140499', '140500', '140501', '140502', '140504', '140506', '140507', '140517', '140631', '140635', '140646', '396883', '397125', '397133', '397151', '397152', '397153', '397154']
    nosy_count = 15.0
    nosy_names = ['loewis', 'georg.brandl', 'terry.reedy', 'ncoghlan', 'orsenthil', 'marienz', 'pitrou', 'benjamin.peterson', 'aroberge', 'Arfrever', 'r.david.murray', 'Trundle', 'eric.snow', 'pablogsal', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue11343'
    versions = ['Python 3.11']

    @Trundle
    Copy link
    Mannequin Author

    Trundle mannequin commented Feb 26, 2011

    Currently, if the parser's internal stack is full (as it happens when
    the parser tries to parse a deeply nested structure), the parser
    writes an error message to stderr and a bare MemoryError is
    raised. That way, it is really hard for REPLs do handle input that
    causes the parser to exhaust its stack correctly (see e.g. python -c "print('(' * 100 + ')' * 100)" | python -c "import code; code.interact()").

    In my opinion, a SyntaxError would be great because that way one can
    easily see where in the source code the error is caused. One reason
    against that (stated by MvL in msg1687) is that it's really
    syntactically correct Python and only the parser can't parse it. But
    then, the same is also true when there are too many indentation levels
    in which case an IndentationError is raised (which is a subclass of
    SyntaxError).

    I have attached two patches: The first still raises a MemoryError
    (for the sake of backward compatibility) but adds a message ("too many
    opening parens") to it. The second raises a SyntaxError instead.

    @Trundle Trundle mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Feb 26, 2011
    @terryjreedy
    Copy link
    Member

    I agree that compile-time stack exhaustion is different from runtime object-heap exhaustion and could/should have a different error.

    I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof. I believe IndentationError was added since 2000. Its doc is "Base class for syntax errors related to incorrect indentation." and it has TabError as a subclass, so its use for too many indents (not really 'incorrect') is not obvious. But having the position marked (if it would be) would be a plus.

    I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?

    @Trundle
    Copy link
    Mannequin Author

    Trundle mannequin commented Mar 25, 2011

    On Fri, Mar 4, 2011 at 9:30 PM, Terry J. Reedy <[email protected]> wrote:

    I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof.

    I added a new ParserError that inherits from SyntaxError.

    I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?

    Yes, I meant a read-eval-print-loop. A new error would help and if
    it's a subclass of SyntaxError, they most likely need no changes at
    all to handle that new error, as they have to deal with SyntaxErrors
    anyway.

    @Trundle
    Copy link
    Mannequin Author

    Trundle mannequin commented Apr 29, 2011

    FWIW, this also affects ast.literal_eval().

    @pitrou
    Copy link
    Member

    pitrou commented Jul 15, 2011

    This looks like a rather good idea, although I'm not sure it deserves a new global exception type.

    @benjaminp
    Copy link
    Contributor

    Just make it a SyntaxError.

    @benjaminp
    Copy link
    Contributor

    Oh, I see Martin disagrees. SyntaxError is raised for anything which Python won't compile. For example, too many arguments gets a SyntaxError.

    @ncoghlan
    Copy link
    Contributor

    +1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly"

    Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError".

    Possible docs phrasing:

    exception:: SyntaxLimitError

    Raised when the compilation and code generation process encounters an error due to internal limitations of the current implementation (e.g. the CPython parser has an upper limit on the number of nested sets of parentheses it can handle in a single expression, but no such limitation exists in the Python language reference).
    This is a subclass of :exc:`SyntaxError`.

    @birkenfeld
    Copy link
    Member

    I like SyntaxLimitError much better than ParserError.

    @benjaminp
    Copy link
    Contributor

    2011/7/16 Nick Coghlan <[email protected]>:

    Nick Coghlan <[email protected]> added the comment:

    +1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly"

    Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError".

    What is the point of a new exception? When would you ever want to
    catch it as opposed to a regular SyntaxError? You're going to have to
    change the code either way.

    Moreover, all Python implementations are going to have to place some
    limits on stack depth and recursion, so it's not really an
    implementation detail.

    The Python language doesn't make an mention of limited memory, but no
    one is going to suggest a MemoryLimitError, which is an
    "implementation detail".

    @ncoghlan
    Copy link
    Contributor

    It's important to remember that other implementations treat CPython as
    the "gold standard" for compatibility purposes. If we declare
    something to be an ordinary SyntaxError, then that carries strong
    implications for what other implementations should do.

    Some kinds of errors are inherently implementation specific
    (MemoryError and SystemError spring to mind). No sane implementor is
    going to try to match CPython like-for-like when it comes to those.
    SyntaxError, however, is typically defined by the language definition,
    not the CPython implementation of it. By introducing a new exception
    type, we're explicitly telling other implementations "Look, this is
    our problem that we don't plan to fix as it doesn't typically arise in
    real code, but please don't deliberately cripple your own
    implementation just to match this behaviour". I'm strongly with MvL on
    this one - SyntaxError itself should never be raised for legal Python
    code just because the CPython bytecode generation toolchain isn't able
    to handle it properly.

    @benjaminp
    Copy link
    Contributor

    2011/7/16 Nick Coghlan <[email protected]>:

    Nick Coghlan <[email protected]> added the comment:

    It's important to remember that other implementations treat CPython as
    the "gold standard" for compatibility purposes. If we declare
    something to be an ordinary SyntaxError, then that carries strong
    implications for what other implementations should do.

    Some kinds of errors are inherently implementation specific
    (MemoryError and SystemError spring to mind). No sane implementor is
    going to try to match CPython like-for-like when it comes to those.
    SyntaxError, however, is typically defined by the language definition,
    not the CPython implementation of it. By introducing a new exception
    type, we're explicitly telling other implementations "Look, this is
    our problem that we don't plan to fix as it doesn't typically arise in
    real code, but please don't deliberately cripple your own
    implementation just to match this behaviour". I'm strongly with MvL on
    this one - SyntaxError itself should never be raised for legal Python
    code just because the CPython bytecode generation toolchain isn't able
    to handle it properly.

    Implementations are quite good at figuring this stuff out themselves.
    We don't need a whole new exception (new docs, one more thing in the
    builtin namespace) to "send a message" to other implementations. If
    some implementation is not part of the language, then its test just
    just be marked cpython_only. It shouldn't be brought into public API
    and the language.

    @ncoghlan
    Copy link
    Contributor

    It also makes it clear to users whether they've just run up against a
    limitation of the implementation they're using or whether what they've
    written is genuinely illegal code. They are NOT the same thing.
    Attempting to conflate them into one exception for the mere sake of
    not adding a new exception type is a false economy. Take it to
    python-dev if you want, but I will strongly oppose this check going in
    with it raising an ordinary SyntaxError instead of a new subclass.

    @benjaminp
    Copy link
    Contributor

    2011/7/16 Nick Coghlan <[email protected]>:

    Nick Coghlan <[email protected]> added the comment:

    It also makes it clear to users whether they've just run up against a
    limitation of the implementation they're using or whether what they've
    written is genuinely illegal code. They are NOT the same thing.
    Attempting to conflate them into one exception for the mere sake of
    not adding a new exception type is a false economy. Take it to
    python-dev if you want, but I will strongly oppose this check going in
    with it raising an ordinary SyntaxError instead of a new subclass.

    Why would it matter to users what variety of limitation they're
    running up against? As I said, they're still going to have to change
    their code.

    @ncoghlan
    Copy link
    Contributor

    It matters, because Python "users" are programmers, and most
    programmers want to know *why* they're being told something is wrong.
    Raising MemoryError is technically incorrect at this point, but at
    least gives the right flavour of the user not doing anything
    specifically wrong, the program just ran out of resources trying to do
    as they asked. I see SyntaxError as significantly worse, however, as
    instead of telling the developer "sorry, that's a reasonable request,
    but I can't handle it", the interpreter is now telling them "you did
    that wrong, change it". The semantics of that are all wrong. There's
    nothing wrong with the syntax of the user's code, it's just that the
    compiler can't handle it.

    Similarly, the eventual fix to the recursive crash in the compiler
    itself is likely to be an arbitrary limit on expression nesting.
    Again, nothing wrong with their syntax, but they'll need to adjust
    their (legal!) code to work around implementation limitations.

    @benjaminp
    Copy link
    Contributor

    2011/7/16 Nick Coghlan <[email protected]>:

    Nick Coghlan <[email protected]> added the comment:

    It matters, because Python "users" are programmers, and most
    programmers want to know *why* they're being told something is wrong.
    Raising MemoryError is technically incorrect at this point, but at
    least gives the right flavour of the user not doing anything
    specifically wrong, the program just ran out of resources trying to do
    as they asked. I see SyntaxError as significantly worse, however, as
    instead of telling the developer "sorry, that's a reasonable request,
    but I can't handle it", the interpreter is now telling them "you did
    that wrong, change it". The semantics of that are all wrong. There's
    nothing wrong with the syntax of the user's code, it's just that the
    compiler can't handle it.

    I content that in normal code, it is so extremely rare as to be
    unheard of, to get exceptions about the parser stack overflowing or
    segfault the compiler by too deep nesting. People who are doing this
    (generally to prove the point about limitations of the compiler) are
    smart enough to know exactly what "parser stack overflowed" means and
    separate that from Python as the language. Adding a new exception is
    solving a non-existent problem.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 16, 2011

    I content that in normal code, it is so extremely rare as to be
    unheard of, to get exceptions about the parser stack overflowing or
    segfault the compiler by too deep nesting. People who are doing this
    (generally to prove the point about limitations of the compiler) are
    smart enough to know exactly what "parser stack overflowed" means and
    separate that from Python as the language. Adding a new exception is
    solving a non-existent problem.

    Agreed with Benjamin. Also, "why something is wrong" can simply be told
    in the exception message.

    @bitdancer
    Copy link
    Member

    This is a purity versus practicality argument. I agree with both sides :)

    @ncoghlan
    Copy link
    Contributor

    After further reflection, I (eventually) came to the same conclusion as Benjamin and Antoine - using SyntaxError for this is fine.

    However, the exception *message* should make it clear that this is an implementation limit rather than a language limit.

    @orsenthil
    Copy link
    Member

    Oh, I thought we never rely on exception "message" for anything
    important. However this seems to be an exception for that exception.
    :-)

    @pitrou
    Copy link
    Member

    pitrou commented Jul 18, 2011

    Oh, I thought we never rely on exception "message" for anything
    important. However this seems to be an exception for that exception.
    :-)

    I think you're missing the point. People usually don't catch SyntaxError
    and run fallback code in this case. And even if they do, they are
    unlikely to care about whether the ErreurDeSyntaxe is due to a genuinely
    faulty piece of code, or a parser limitation ;)

    @ncoghlan
    Copy link
    Contributor

    Yeah, at the level of *code* the origin doesn't really matter, so re-using the exception type is actually OK. However, for *people* seeing the stack trace, it may be useful to know what's genuinely illegal syntax and what's a limitation of the current implementation. The exception message is a good place for that additional detail.

    @iritkatriel
    Copy link
    Member

    The patch relates to the old parser.

    With the new parser the 100*(+100*) example works. If we go to 1000 instead of 100 we get "SyntaxError: too many nested parentheses".

    From the discussion it seems that the idea of a new exception type for implementation-limits vs "real" SyntaxErrors was rejected as not useful.

    Is there anything left to do on this issue?

    @terryjreedy
    Copy link
    Member

    I retract my original comment as I now agree with SyntaxError + distinct message.

    Given Irit's report, I think the remaining question is whether

    A. The implication that 'too many nested parentheses' is compiler specific is strong enough, and any thing further should be left to 3rd parties to explain, and this issue should be closed as 'out of date'.

    B. We should be explicit and add something like 'for this compiler'.

    Since Pablo has been working on improving error messages, I would give his opinion high weight.

    To find the actual limit, I did the following with Win 10, 3.10.0b3, IDLE:

    >>> def f(n):
    ...     print(eval('('*n + ')'*n"))
    ... 
    And after some trials, found
    >>> f(200)
    ()
    >>> f(201)
    Traceback (most recent call last):
      File "<pyshell#56>", line 1, in <module>
        f(201)
      File "<pyshell#42>", line 2, in f
        print(eval('('*n + ')'*n))
      File "<string>", line 1
        ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
                                                                                                                                                                                                                                                                                                                                                                                                                            ^
    SyntaxError: too many nested parentheses

    The error is detected in 'string' line 1, 1-based offset 201, at the final '('. There are 200 spaces before the caret but they are not obvious in this box, which does not wrap spaces properly. The end_lineno and end_offset are None, None rather than the 1,202 I expected.

    @terryjreedy terryjreedy added the 3.11 only security fixes label Jul 8, 2021
    @aroberge
    Copy link
    Mannequin

    aroberge mannequin commented Jul 8, 2021

    For information: I created an actual .py file from the last example with 200 parens and got the following error message:

    python example.py
    s_push: parser stack overflow
    MemoryError

    For consistency and greater clarity, I think that having messages such as

    • SyntaxError: parser stack overflow (too many nested parenthesis)
    • SyntaxError: parser stack overflow (too many statistically nested blocks)

    and possibly others (too many arguments?), would be preferable, with
    the possibility of simply having

    • SyntaxError: parser stack overflow

    if the exact cause cannot be identified at the parsing stage.

    @pablogsal
    Copy link
    Member

    > python example.py
    s_push: parser stack overflow
    MemoryError

    That error can only happen with the old parser, so you must be using Python3.8 or 3.9 with the old parser. Can you confirm?

    @aroberge
    Copy link
    Mannequin

    aroberge mannequin commented Jul 8, 2021

    @pablo: Sincere apologies ... I tested this with the wrong virtual environment active locally. With 3.10, it is indeed SyntaxError: too many nested parentheses

    @pablogsal
    Copy link
    Member

    As a piece of extra information, this limit has been there for a while:

    #define MAXLEVEL 200 /* Max parentheses level */

    the problem is that the old parser died before reaching it in many situations.

    @pablogsal
    Copy link
    Member

    Regarding what to do, what I would love to optimize is that the message is clear. One of the challenges people may have with too specific messages is that they may not know what "a compiler" or "a parser" means (the first may be especially confusing since many people think Python is not "compiled".

    On the other hand, is more informative to say "too many nested parentheses" than "parser stack overflow" since many people don't know what that means, especially in Python.

    I

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @gpshead
    Copy link
    Member

    gpshead commented Apr 14, 2022

    C stack overflows and thus crashes still happen with the new parser. oss-fuzz just handed me another example of this with a 49k input that is probably about half + signs. Basically it causes ast2obj_expr() recursion within case BinOp_kind: exhausting the C stack. But there are all sorts of ways to trigger this kind of thing as the generated Python/Python-ast.c code uses C stack based recursion. It doesn't fail on my own simple pydebug build (presumably linux default 8mb stack) but did on oss-fuzz which was built with sanitizers enabled and possibly a smaller default stack.

    If we had pgen generate code not relying on the C stack we could monitor resource usage. That's a more complicated change. Similarly but less exact and "good enough" for reasonable code: If we had Include/internal/pycore_ast_state.h add a recursion depth counter in struct ast_state and require all calls slinging state around to increment and check that we could bail out nicely before getting anywhere close a likely C stack limit.

    for reference whenever that oss-fuzz issue becomes public (a few months?) https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=46699. I doubt it's specific input is special, just pointing to where in the code it happens as done above should give the idea that just creating something along the lines of {a+a+a+a+a+a+a ...and on and on and on... +a+a+a+a} can trigger a C stack overflow in ast.literal_eval()

    @pablogsal
    Copy link
    Member

    C stack overflows and thus crashes still happen with the new parser. 

    Although I'm sure we can stress pegen to crash in similar ways, what you are reporting here is not technically the parser, but the AST transformation.

    The actual parser keeps track of how deep the stack is and has a hard limit, stopping before it segfaults on most systems. Is certainly possible to set the stack to something smaller and get it to segfault but at this point is behaving similarly as the python stack limit.

    All of the segfaults you seem to refer to are layer in the pipeline, and those are unchanged since the old parser.

    @terryjreedy
    Copy link
    Member

    I agree with Pablo in #55552 (comment) that "SyntaxError: too many nested parentheses" is fine, and better than "parser stack full". I think we can close this. Another issue can be opened for improvement of something else.

    @pablogsal pablogsal closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    10 participants