
Ideas for making ast.literal_eval() usable #83340


Open
rhettinger opened this issue Dec 29, 2019 · 7 comments
Labels
3.12 (only security fixes), 3.13 (bugs and security fixes), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@rhettinger
Contributor

BPO 39159
Nosy @rhettinger, @serhiy-storchaka, @eamanu, @isidentical
PRs
  • bpo-39159: Declare errors that might be raised from literal_eval #19899
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-12-29.22:22:49.960>
    labels = ['type-feature', 'library', '3.9']
    title = 'Ideas for making ast.literal_eval() usable'
    updated_at = <Date 2020-12-22.02:41:55.180>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2020-12-22.02:41:55.180>
    actor = 'rhettinger'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-12-29.22:22:49.960>
    creator = 'rhettinger'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39159
    keywords = ['patch']
    message_count = 6.0
    messages = ['359011', '361995', '361998', '368042', '383563', '383568']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'serhiy.storchaka', 'eamanu', 'BTaskaya']
    pr_nums = ['19899']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue39159'
    versions = ['Python 3.9']

    @rhettinger
    Contributor Author

    A primary goal for ast.literal_eval() is to "Safely evaluate an expression node or a string".

    In the context of a real application, several things need to be done before it can fulfill that design goal:

    1. We should document possible exceptions that need to be caught. So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

    2. Define a size limit guaranteed not to give a MemoryError. The smallest unsafe size I've found so far is 301 characters:

      from ast import literal_eval
      s = '(' * 100 + '0' + ',)' * 100
      literal_eval(s) # Raises a MemoryError

    3. Consider writing a standalone expression compiler that doesn't have the same small limits as our usual compile() function. This would make literal_eval() usable for evaluating tainted inputs with bigger datasets. (Imagine if the json module could only be safely used with inputs under 301 characters).

    4. Perhaps document an example of how we suggest that someone process tainted input:

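         from ast import literal_eval

         def error(msg):
             # Placeholder for the application's own error handling (hypothetical)
             raise SystemExit(msg)

         expr = input('Enter a dataset in Python format: ')
         if len(expr) > 300:
             error(f'Maximum supported size is 300, not {len(expr)}')
         try:
             data = literal_eval(expr)
         except (TypeError, MemoryError, SyntaxError, ValueError):
             error('Input cannot be evaluated')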

    @isidentical
    Member

    > 1. We should document possible exceptions that need to be caught. So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

    Maybe we should wrap all of these in something like a LiteralEvalError so they are easy to catch together. LiteralEvalError could subclass all four, but I guess this change might still break code in some cases.
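A minimal sketch of the wrapper idea, assuming a hypothetical `LiteralEvalError` (no such class exists in the `ast` module) and a hypothetical `safe_literal_eval()` helper. Instead of subclassing all four exceptions, it subclasses only `ValueError` and chains the original exception, so existing `except ValueError` handlers keep working, while code that catches `SyntaxError` or `TypeError` directly would need updating; that is the kind of breakage mentioned above. `RecursionError` is included because it comes up later in this thread.

```python
import ast


class LiteralEvalError(ValueError):
    """Hypothetical umbrella exception for the ways literal_eval can fail."""


# Exceptions reported in this issue (RecursionError is raised in a later comment).
_LITERAL_EVAL_ERRORS = (TypeError, MemoryError, SyntaxError, ValueError, RecursionError)


def safe_literal_eval(node_or_string):
    """Hypothetical helper: call ast.literal_eval, converting failures to one type."""
    try:
        return ast.literal_eval(node_or_string)
    except _LITERAL_EVAL_ERRORS as exc:
        # Keep the original exception available as __cause__ for debugging.
        raise LiteralEvalError(f"cannot evaluate literal: {exc!r}") from exc
```

Callers then need a single `except LiteralEvalError:` clause, and the original error remains reachable through `exc.__cause__`.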

    > 2. Define a size limit guaranteed not to give a MemoryError. The smallest unsafe size I've found so far is 301 characters:
    >>> s = "(" * 101 + ")" * 101
    >>> len(s)
    202
    >>> ast.literal_eval(s)
    s_push: parser stack overflow
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.9/ast.py", line 61, in literal_eval
        node_or_string = parse(node_or_string, mode='eval')
      File "/usr/local/lib/python3.9/ast.py", line 49, in parse
        return compile(source, filename, mode, flags,
    MemoryError
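Both failing inputs above (301 and 202 characters) nest roughly 100 levels deep, which suggests that nesting depth, not raw length, is what triggers this MemoryError. A minimal pre-check sketch, assuming a hypothetical `MAX_DEPTH` of 50 chosen only for illustration, measures bracket depth with the `tokenize` module before calling `ast.literal_eval()`. Working on tokens rather than counting characters means brackets inside string literals are ignored. `bracket_depth()` and `checked_literal_eval()` are hypothetical helpers.

```python
import ast
import io
import tokenize

MAX_DEPTH = 50  # hypothetical limit, well below the ~100 levels that failed above


def bracket_depth(text):
    """Return the deepest bracket nesting in text; brackets inside strings are ignored."""
    depth = deepest = 0
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type != tokenize.OP:
            continue
        if tok.string in ("(", "[", "{"):
            depth += 1
            deepest = max(deepest, depth)
        elif tok.string in (")", "]", "}"):
            depth -= 1
    return deepest


def checked_literal_eval(text, max_depth=MAX_DEPTH):
    """Reject deeply nested input before handing it to ast.literal_eval."""
    try:
        depth = bracket_depth(text)
    except (tokenize.TokenError, SyntaxError) as exc:
        raise ValueError(f"cannot tokenize input: {exc}") from None
    if depth > max_depth:
        raise ValueError(f"nesting depth {depth} exceeds limit of {max_depth}")
    return ast.literal_eval(text)
```

This is a heuristic, not a guarantee: it bounds only bracket nesting, and `ast.literal_eval()` can still raise the other exceptions listed in this issue.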
    > 3. Consider writing a standalone expression compiler that doesn't have the same small limits as our usual compile() function. This would make literal_eval() usable for evaluating tainted inputs with bigger datasets. (Imagine if the json module could only be safely used with inputs under 301 characters).

    Can you explain in a bit more detail how this standalone expression compiler should work?

    isidentical added the stdlib (Python modules in the Lib dir), 3.9 (only security fixes), and type-feature (A feature request or enhancement) labels on Feb 14, 2020
    @isidentical
    Member

    > 1. We should document possible exceptions that need to be caught. So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

    RecursionError should be added to these errors as well:
    >>> t = ast.Tuple(elts=[], ctx=ast.Load())
    >>> t.elts.append(t)
    >>> ast.literal_eval(t)
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.9/ast.py", line 101, in literal_eval
        return _convert(node_or_string)
      File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
        return tuple(map(_convert, node.elts))
      File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
        return tuple(map(_convert, node.elts))
      File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
        return tuple(map(_convert, node.elts))
      [Previous line repeated 496 more times]
    RecursionError: maximum recursion depth exceeded

    @serhiy-storchaka
    Member

    It can also crash.

        import ast
        ast.literal_eval('+0'*10**6)

    The cause is that all AST-handling C code (in particular, the code that converts the AST from C to Python) is recursive and can therefore overflow the C stack. Some of the recursive code has arbitrary limits that raise exceptions such as the MemoryError in the initial example, but not all of it has such checks.
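Because a C stack overflow terminates the interpreter instead of raising an exception, no `except` clause can guard against it. One common mitigation (not proposed in this thread) is to probe the input in a throwaway child process first. A minimal sketch, assuming the child runs the same interpreter (`sys.executable`) and that input which evaluates cleanly there will also evaluate cleanly in the parent; `literal_eval_isolated()` is a hypothetical helper:

```python
import ast
import subprocess
import sys

# Child script: evaluate stdin; an exception or a crash gives a non-zero exit status.
_PROBE = "import ast, sys; ast.literal_eval(sys.stdin.buffer.read().decode('utf-8'))"


def literal_eval_isolated(text, timeout=5.0):
    """Probe text in a child interpreter before evaluating it in this process."""
    try:
        probe = subprocess.run(
            [sys.executable, "-I", "-c", _PROBE],  # -I: isolated mode, ignores env and user site
            input=text.encode("utf-8"),
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        raise ValueError("literal evaluation timed out") from None
    if probe.returncode != 0:
        # On POSIX, a negative status means the child was killed by a signal (e.g. SIGSEGV).
        raise ValueError(f"input rejected by probe process (exit status {probe.returncode})")
    return ast.literal_eval(text)
```

Spawning a process per call is expensive, so this only suits occasional untrusted input. The assumption that the parent behaves like the child is also not exact, since the parent may already be deeper in its own stack when it calls `ast.literal_eval()`.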

    @rhettinger
    Contributor Author

    New changeset fbc7723 by Batuhan Taskaya in branch 'master':
    bpo-39159: Declare error that might be raised from literal_eval (GH-19899)

    @rhettinger
    Contributor Author

    > Can you explain it a bit more detailed, how does this standalone expression compiler should work?

    Aim for something like a JSON parser, but for the supported Python constant expressions, using the existing tokenize module to feed a hand-rolled non-recursive parser.
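A minimal sketch of that approach, assuming a reduced grammar: numbers with at most one leading sign, strings, `True`/`False`/`None`, and tuples, lists, dicts, and sets. It does not handle complex sums such as `1+2j` or implicit string concatenation, both of which `ast.literal_eval()` accepts. `parse_literal()` and `_Frame` are hypothetical names. Nesting lives on an explicit list of frames instead of the call stack, and individual NUMBER and STRING tokens are decoded with `ast.literal_eval()`, which stays shallow because a single token cannot nest.

```python
import ast
import io
import tokenize

_NAMES = {"True": True, "False": False, "None": None}  # constants literal_eval accepts
_MATCH = {")": "(", "]": "[", "}": "{"}
_SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,  # layout-only tokens
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


class _Frame:
    """One open bracket on the explicit stack that replaces recursion."""

    def __init__(self, opener):
        self.opener = opener    # "(", "[", "{", or "" for the top level
        self.items = []         # finished values, or (key, value) pairs in a dict
        self.want_value = True  # a value (not a separator) is expected next
        self.saw_comma = False  # distinguishes (1) from (1,)
        self.is_dict = None     # for "{": decided by the first ':' or ','
        self.key = None         # pending dict key
        self.has_key = False    # the pending key is waiting for its value
        self.last_pair = False  # the latest item was a key: value pair

    def add(self, value):
        if not self.want_value:
            raise ValueError("expected ',' between values")
        if self.has_key:
            self.items.append((self.key, value))
            self.has_key, self.last_pair = False, True
        else:
            self.items.append(value)
            self.last_pair = False
        self.want_value = False

    def end_entry(self):
        """Validate the entry just finished (on ',' or a closing bracket)."""
        if self.has_key:
            raise ValueError("dict key without a value")
        if self.opener == "{" and not self.want_value:
            if self.is_dict is None:
                self.is_dict = False  # first entry had no ':', so this is a set
            elif self.is_dict and not self.last_pair:
                raise ValueError("missing ':' in dict entry")

    def result(self):
        if self.opener == "[":
            return list(self.items)
        if self.opener == "{":
            return dict(self.items) if self.is_dict or not self.items else set(self.items)
        # "(" or the top level: a comma (or emptiness) makes a tuple, else it is grouping
        return tuple(self.items) if self.saw_comma or not self.items else self.items[0]


def parse_literal(text):
    """Evaluate a subset of Python literals using an explicit stack, not recursion."""
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
    except (tokenize.TokenError, SyntaxError) as exc:
        raise ValueError(f"cannot tokenize input: {exc}") from None
    stack, sign = [_Frame("")], None
    for tok in tokens:
        frame, kind, s = stack[-1], tok.type, tok.string
        if kind in _SKIP:
            continue
        if sign is not None and kind != tokenize.NUMBER:
            raise ValueError("a sign must be followed by a number")
        if kind == tokenize.NUMBER:
            value = ast.literal_eval(s)  # one token cannot nest, so this is shallow
            frame.add(-value if sign == "-" else value)
            sign = None
        elif kind == tokenize.STRING:
            frame.add(ast.literal_eval(s))
        elif kind == tokenize.NAME and s in _NAMES:
            frame.add(_NAMES[s])
        elif kind != tokenize.OP:
            raise ValueError(f"unsupported token {s!r}")
        elif s in ("+", "-") and frame.want_value:
            sign = s
        elif s in ("(", "[", "{") and frame.want_value:
            stack.append(_Frame(s))
        elif s == "," and not frame.want_value:
            frame.end_entry()
            frame.want_value = frame.saw_comma = True
        elif s == ":" and frame.opener == "{" and not (
                frame.want_value or frame.last_pair or frame.is_dict is False):
            frame.key = frame.items.pop()
            frame.has_key = frame.want_value = frame.is_dict = True
        elif s in _MATCH and frame.opener == _MATCH[s]:
            frame.end_entry()
            stack.pop()
            stack[-1].add(frame.result())
        else:
            raise ValueError(f"unexpected {s!r}")
    if sign is not None or len(stack) != 1 or not stack[0].items:
        raise ValueError("incomplete or empty input")
    return stack[0].result()
```

One caveat: on Python 3.12 and later, `tokenize` is backed by the C tokenizer, which, as far as I know, rejects very deep bracket nesting on its own (surfacing as `tokenize.TokenError`, converted to `ValueError` here). On those versions the accepted depth is still bounded by the tokenizer, but it fails with an exception rather than a crash. Unhashable dict keys or set members (e.g. `{[1]: 2}`) raise `TypeError`, as they do with `ast.literal_eval()`.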

    ezio-melotti transferred this issue from another repository on Apr 10, 2022
    @gpshead
    Member

    gpshead commented Apr 14, 2022

    Related: the old #55552 issue about producing better errors from the parser. Not crashing due to a C stack overflow counts as better. ;)

    5 participants