Commit 9407631

gh-82052: Don't send partial UTF-8 sequences to the Windows API (GH-101103)
Don't send partial UTF-8 sequences to the Windows API

(cherry picked from commit f34176b)

Co-authored-by: Paul Moore <[email protected]>
1 parent 9345eea commit 9407631

2 files changed: +17 -1 lines changed

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Fixed an issue where writing more than 32K of Unicode output to the console screen in one go can result in mojibake.

Modules/_io/winconsoleio.c

Lines changed: 16 additions & 1 deletion
@@ -956,7 +956,7 @@ _io__WindowsConsoleIO_write_impl(winconsoleio *self, Py_buffer *b)
 {
     BOOL res = TRUE;
     wchar_t *wbuf;
-    DWORD len, wlen, n = 0;
+    DWORD len, wlen, orig_len, n = 0;
     HANDLE handle;

     if (self->fd == -1)
@@ -986,6 +986,21 @@ _io__WindowsConsoleIO_write_impl(winconsoleio *self, Py_buffer *b)
        have to reduce and recalculate. */
     while (wlen > 32766 / sizeof(wchar_t)) {
         len /= 2;
+        orig_len = len;
+        /* Reduce the length until we hit the final byte of a UTF-8 sequence
+         * (top bit is unset). Fix for github issue 82052.
+         */
+        while (len > 0 && (((char *)b->buf)[len-1] & 0x80) != 0)
+            --len;
+        /* If we hit a length of 0, something has gone wrong. This shouldn't
+         * be possible, as valid UTF-8 can have at most 3 non-final bytes
+         * before a final one, and our buffer is way longer than that.
+         * But to be on the safe side, if we hit this issue we just restore
+         * the original length and let the console API sort it out.
+         */
+        if (len == 0) {
+            len = orig_len;
+        }
         wlen = MultiByteToWideChar(CP_UTF8, 0, b->buf, len, NULL, 0);
     }
     Py_END_ALLOW_THREADS
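
To make the effect of the added loop easier to see outside the Windows build, here is a minimal portable sketch of the same trimming step. The helper name `trim_to_ascii_end` is hypothetical and does not appear in the commit; it only mirrors the `orig_len` / `len` logic in the hunk above and leaves out `MultiByteToWideChar` and the console call entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the commit): given a proposed cut
 * point `len` into a UTF-8 buffer, back up until the last byte kept has
 * its top bit clear, i.e. is a single-byte (ASCII) character. If no such
 * byte is found, fall back to the proposed length, as the patch does. */
static size_t
trim_to_ascii_end(const char *buf, size_t len)
{
    size_t orig_len = len;

    assert(buf != NULL);
    while (len > 0 && ((unsigned char)buf[len - 1] & 0x80) != 0) {
        --len;
    }
    return (len == 0) ? orig_len : len;
}
```

For the bytes of "abc€def" (where "€" is `E2 82 AC`), a proposed cut at 4, 5, or 6 bytes is reduced to 3 in each case, while a cut at 7 is left alone. Every byte of a multi-byte sequence has its top bit set, so this test also backs up over a complete trailing character, not just a partial one. That appears harmless here, since the bytes left over are simply not part of this write. A buffer with no ASCII bytes at all in the first half takes the `orig_len` fallback.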
