Skip to content

bpo-34087: Fix buffer overflow in int(s) and similar functions #8274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 14, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Lib/test/test_complex.py
Original file line number Diff line number Diff line change
Expand Up @@ -345,6 +345,9 @@ def split_zeros(x):
self.assertEqual(type(complex("1"*500)), complex)
# check whitespace processing
self.assertEqual(complex('\N{EM SPACE}(\N{EN SPACE}1+1j ) '), 1+1j)
# Invalid unicode string
# See bpo-34087
self.assertRaises(ValueError, complex, '\u3053\u3093\u306b\u3061\u306f')

class EvilExc(Exception):
pass
Expand Down
3 changes: 3 additions & 0 deletions Lib/test/test_float.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,9 @@ def test_float(self):
# extra long strings should not be a problem
float(b'.' + b'1'*1000)
float('.' + '1'*1000)
# Invalid unicode string
# See bpo-34087
self.assertRaises(ValueError, float, '\u3053\u3093\u306b\u3061\u306f')

def test_underscores(self):
for lit in VALID_UNDERSCORE_LITERALS:
Expand Down
4 changes: 4 additions & 0 deletions Lib/test/test_long.py
Original file line number Diff line number Diff line change
Expand Up @@ -373,6 +373,10 @@ def test_long(self):
for base in invalid_bases:
self.assertRaises(ValueError, int, '42', base)

# Invalid unicode string
# See bpo-34087
self.assertRaises(ValueError, int, '\u3053\u3093\u306b\u3061\u306f')


def test_conversion(self):

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix buffer overflow while converting unicode to numeric values.
2 changes: 2 additions & 0 deletions Objects/unicodeobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -9072,13 +9072,15 @@ _PyUnicode_TransformDecimalAndSpaceToASCII(PyObject *unicode)
int decimal = Py_UNICODE_TODECIMAL(ch);
if (decimal < 0) {
out[i] = '?';
out[i+1] = '\0';
_PyUnicode_LENGTH(result) = i + 1;
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To debug other kinds of bugs, you can call "assert(_PyUnicode_CheckConsistency(unicode, 1));" just before "return result;". This function detects if the last character is not a NUL character, for example ;-)

out[i] = '0' + decimal;
}
}

assert(_PyUnicode_CheckConsistency(result, 1));
return result;
}

Expand Down
2 changes: 2 additions & 0 deletions Python/pystrtod.c
Original file line number Diff line number Diff line change
Expand Up @@ -391,6 +391,8 @@ _Py_string_to_number_with_underscores(
char *dup, *end;
PyObject *result;

assert(s[orig_len] == '\0');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this assertion. For performance, it would make sense to call _Py_string_to_number_with_underscores() on a sub-string which doesn't end with a NUL character. If you want to add an assertion, please do it at the caller site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is fine to me. _Py_string_to_number_with_underscores() is always called with a substring that end with a NUL character (if it doesn't end, we make a copy and call this function with it). And _Py_string_to_number_with_underscores() calls innerfunc() which expects a NUL terminated string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the following lines calls strchr() which expects that the string ends with a NUL byte.

In that case, should we also test that strlen(s) == orig_len to block embedded NUL byte?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed. NUL byte in string is valid.

>>> len("\0\0\0")
3


if (strchr(s, '_') == NULL) {
return innerfunc(s, orig_len, arg);
}
Expand Down