Ideas for making ast.literal_eval() usable #83340

rhettinger · 2019-12-29T22:22:50Z

BPO	39159
Nosy	@rhettinger, @serhiy-storchaka, @eamanu, @isidentical
PRs	bpo-39159: Declare errors that might be raised from literal_eval #19899

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-12-29.22:22:49.960>
labels = ['type-feature', 'library', '3.9']
title = 'Ideas for making ast.literal_eval() usable'
updated_at = <Date 2020-12-22.02:41:55.180>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2020-12-22.02:41:55.180>
actor = 'rhettinger'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2019-12-29.22:22:49.960>
creator = 'rhettinger'
dependencies = []
files = []
hgrepos = []
issue_num = 39159
keywords = ['patch']
message_count = 6.0
messages = ['359011', '361995', '361998', '368042', '383563', '383568']
nosy_count = 4.0
nosy_names = ['rhettinger', 'serhiy.storchaka', 'eamanu', 'BTaskaya']
pr_nums = ['19899']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue39159'
versions = ['Python 3.9']

rhettinger · 2019-12-29T22:22:49Z

A primary goal for ast.literal_eval() is to "Safely evaluate an expression node or a string".

In the context of a real application, we need to do several things to make it possible to fulfill its design goal:

1. We should document possible exceptions that need to be caught. So far, I've found TypeError, MemoryError, SyntaxError, ValueError.
2. Define a size limit guaranteed not to give a MemoryError. The smallest unsafe size I've found so far is 301 characters:

     from ast import literal_eval

     s = '(' * 100 + '0' + ',)' * 100
     literal_eval(s) # Raises a MemoryError

3. Consider writing a standalone expression compiler that doesn't have the same small limits as our usual compile() function. This would make literal_eval() usable for evaluating tainted inputs with bigger datasets. (Imagine if the json module could only be safely used with inputs under 301 characters).
4. Perhaps document an example of how we suggest that someone process tainted input:

     from ast import literal_eval

     # error() stands in for the application's own routine that reports and aborts
     expr = input('Enter a dataset in Python format: ')
     if len(expr) > 300:
         error(f'Maximum supported size is 300, not {len(expr)}')
     try:
         data = literal_eval(expr)
     except (TypeError, MemoryError, SyntaxError, ValueError):
         error('Input cannot be evaluated')

isidentical · 2020-02-14T19:37:52Z

> We should document possible exceptions that need to be caught. So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

Maybe we should wrap all of these in something like LiteralEvalError so they can be caught together. LiteralEvalError could be a subclass of all four, but I guess in some cases this change might break existing code.
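One way to picture this suggestion is a thin wrapper that converts the documented exceptions into a single type. This is only a sketch: `LiteralEvalError` and `safe_literal_eval` are hypothetical names that do not exist in the standard library, and the exception here derives from `ValueError` alone rather than from all four types. `RecursionError` is included as an extra precaution. The wrapper does nothing about hard interpreter crashes.

```python
import ast


class LiteralEvalError(ValueError):
    """Hypothetical umbrella exception; not part of the ast module."""


def safe_literal_eval(source):
    """Evaluate a literal, converting the known failure modes into one type.

    Hypothetical helper for illustration only.
    """
    try:
        return ast.literal_eval(source)
    except (TypeError, MemoryError, SyntaxError, ValueError, RecursionError) as exc:
        # Chain the original exception so the real cause stays visible.
        raise LiteralEvalError(f"cannot evaluate input: {exc}") from exc
```

Callers would then need a single `except LiteralEvalError:` clause, and the original exception stays available through `__cause__`.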

> Define a size limit guaranteed not to give a MemoryError. The smallest unsafe size I've found so far is 301 characters:

>>> import ast
>>> s = "(" * 101 + ")" * 101
>>> len(s)
202
>>> ast.literal_eval(s)
s_push: parser stack overflow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/ast.py", line 61, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/local/lib/python3.9/ast.py", line 49, in parse
    return compile(source, filename, mode, flags,
MemoryError

> Consider writing a standalone expression compiler that doesn't have the same small limits as our usual compile() function. This would make literal_eval() usable for evaluating tainted inputs with bigger datasets. (Imagine if the json module could only be safely used with inputs under 301 characters).

Can you explain in a bit more detail how this standalone expression compiler should work?

isidentical · 2020-02-14T21:13:47Z

> We should document possible exceptions that need to be caught. So far, I've found TypeError, MemoryError, SyntaxError, ValueError.

Also, RecursionError should be added to this list:

>>> import ast
>>> t = ast.Tuple(elts=[], ctx=ast.Load())
>>> t.elts.append(t)
>>> ast.literal_eval(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/ast.py", line 101, in literal_eval
    return _convert(node_or_string)
  File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
    return tuple(map(_convert, node.elts))
  File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
    return tuple(map(_convert, node.elts))
  File "/usr/local/lib/python3.9/ast.py", line 81, in _convert
    return tuple(map(_convert, node.elts))
  [Previous line repeated 496 more times]
RecursionError: maximum recursion depth exceeded
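A caller that accepts pre-built AST nodes could guard against this case by walking the tree with an explicit stack before converting it. The sketch below is an assumption-laden illustration: `check_ast_depth` is a hypothetical helper, and the limit of 100 is arbitrary. A self-referential node looks infinitely deep to the walk, so it is rejected once the limit is exceeded.

```python
import ast


def check_ast_depth(node, limit=100):
    """Raise ValueError if the AST under node nests deeper than limit.

    Hypothetical helper; uses an explicit stack instead of recursion.
    """
    stack = [(node, 1)]
    while stack:
        current, depth = stack.pop()
        if depth > limit:
            raise ValueError(f"AST nests deeper than {limit} levels")
        for child in ast.iter_child_nodes(current):
            stack.append((child, depth + 1))


# The self-referential tuple from the report above is rejected up front.
t = ast.Tuple(elts=[], ctx=ast.Load())
t.elts.append(t)
try:
    check_ast_depth(t)
except ValueError as exc:
    print(f"rejected: {exc}")
```

The walk visits a node once per path that reaches it, so a tree that shares subtrees heavily could still be slow to check.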

serhiy-storchaka · 2020-05-04T12:24:41Z

It can also crash.

    ast.literal_eval('+0'*10**6)

The cause is that all of the C code that handles the AST (in particular the code converting the AST from C to Python) is recursive, and therefore can overflow the C stack. Some recursive code has arbitrary limits that raise exceptions such as the MemoryError in the initial example, but not all code has such checks.

rhettinger · 2020-12-22T00:15:54Z

New changeset fbc7723 by Batuhan Taskaya in branch 'master':
bpo-39159: Declare error that might be raised from literal_eval (GH-19899)
fbc7723

rhettinger · 2020-12-22T02:41:55Z

> Can you explain in a bit more detail how this standalone expression compiler should work?

Aim for something like a JSON parser, but for the supported Python constant expressions, using the existing tokenize module to feed a hand-rolled non-recursive parser.
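As a rough illustration of that idea (not the design proposed for the standard library), the sketch below feeds tokens from `tokenize` into a parser that keeps its own stack of open containers instead of recursing. The function name `parse_literal` is hypothetical, and the supported subset is an assumption chosen for brevity: non-negative decimal integers, `None`, `True`, `False`, and nested lists. On some Python versions `tokenize` may itself enforce a bracket-nesting limit, so this shows the technique rather than a complete fix.

```python
import io
import tokenize

# Token types that carry no value for this tiny grammar.
_SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
_CONSTANTS = {"None": None, "True": True, "False": False}


def parse_literal(source):
    """Parse a small subset of Python literals without recursion.

    Hypothetical helper for illustration only.
    """
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, SyntaxError) as exc:
        raise ValueError(f"cannot tokenize input: {exc}") from exc

    root = []            # sentinel container that receives the top-level value
    stack = [root]       # explicit stack of open lists (replaces recursion)
    expect_value = True  # True while a value or '[' may come next
    for tok in tokens:
        if tok.type in _SKIP:
            continue
        text = tok.string
        if (expect_value and tok.type == tokenize.NUMBER
                and text.isascii() and text.isdigit()):
            stack[-1].append(int(text))
            expect_value = False
        elif expect_value and tok.type == tokenize.NAME and text in _CONSTANTS:
            stack[-1].append(_CONSTANTS[text])
            expect_value = False
        elif expect_value and text == "[":
            new_list = []
            stack[-1].append(new_list)
            stack.append(new_list)   # descend by pushing, not by calling
        elif text == "]" and len(stack) > 1:
            stack.pop()              # a trailing comma before ']' is allowed
            expect_value = False
        elif text == "," and not expect_value and len(stack) > 1:
            expect_value = True
        else:
            raise ValueError(f"unexpected token {text!r} at {tok.start}")
    if len(stack) != 1 or len(root) != 1:
        raise ValueError("incomplete or empty expression")
    return root[0]
```

For example, `parse_literal("[1, [2, [3, []]], None]")` returns the corresponding nested list, while any name other than the three constants, such as a function call, is rejected with `ValueError`. Tuples, strings, dicts, and signed numbers would each need extra states in the same loop.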

gpshead · 2022-04-14T20:10:43Z

related: the old #55552 issue about producing better errors from the parser. not crashing due to a C stack overflow counts as better. ;)

isidentical added the stdlib (Python modules in the Lib dir), 3.9 (only security fixes), and type-feature (A feature request or enhancement) labels on Feb 14, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Rik-de-Kort mentioned this issue Jul 11, 2022

ast.literal_eval does not execute addition correclty #94749

Closed

vstinner mentioned this issue Aug 3, 2022

Rephrase ast.literal_eval() to remove any security warranty #95588

Closed

gpshead added the 3.12 (only security fixes) label and removed the 3.9 (only security fixes) label on Aug 12, 2022

erlend-aasland added the 3.13 (bugs and security fixes) label on Jan 5, 2024

LinasKo mentioned this issue Aug 14, 2024

fix: raise UploadErrors for images and annotations roboflow/roboflow-python#266

Merged

2 tasks
