Skip to content

bpo-35194: cjkcodec: check the encoded value is not truncated #10432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 29, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 20 additions & 18 deletions Modules/cjkcodecs/cjkcodecs.h
Original file line number Diff line number Diff line change
Expand Up @@ -149,40 +149,42 @@ static const struct dbcs_map *mapping_list;
writer->pos += 2; \
} while (0)

#define OUTBYTE1(c) \
do { ((*outbuf)[0]) = (c); } while (0)
#define OUTBYTE2(c) \
do { ((*outbuf)[1]) = (c); } while (0)
#define OUTBYTE3(c) \
do { ((*outbuf)[2]) = (c); } while (0)
#define OUTBYTE4(c) \
do { ((*outbuf)[3]) = (c); } while (0)
#define OUTBYTEI(c, i) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try to convert this macro to a static inline function? I'm trying to slowly convert macros to static inline functions. It avoids uglyness of macros like do { ... } while (0). See https://bugs.python.org/issue35059

For example, a function would require a specific type for c :-) Maybe DBCHAR type?

Copy link
Contributor Author

@izbyshev izbyshev Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of macros for the purpose of catching conversions changing the mathematical value is that (unsigned char)(c) == (c) checks it for any integer type of c. Note that mathematical values of both operands are never changed in this comparison, at least if we consider C99-conforming compilers and don't take into account platforms where a byte is not 8-bit.

If we used a function instead, c would always be converted to its parameter type. No matter what type we choose, even the widest one, the mathematical value of the argument can be changed (if not because of truncation, than because of conversions between signed and unsigned types). So I don't see how static inline functions can help in this particular issue. Macros may be ugly, but they may also be a powerful tool for specific tasks.

do { \
assert((unsigned char)(c) == (c)); \
((*outbuf)[i]) = (c); \
} while (0)

#define OUTBYTE1(c) OUTBYTEI(c, 0)
#define OUTBYTE2(c) OUTBYTEI(c, 1)
#define OUTBYTE3(c) OUTBYTEI(c, 2)
#define OUTBYTE4(c) OUTBYTEI(c, 3)

#define WRITEBYTE1(c1) \
do { \
REQUIRE_OUTBUF(1); \
(*outbuf)[0] = (c1); \
OUTBYTE1(c1); \
} while (0)
#define WRITEBYTE2(c1, c2) \
do { \
REQUIRE_OUTBUF(2); \
(*outbuf)[0] = (c1); \
(*outbuf)[1] = (c2); \
OUTBYTE1(c1); \
OUTBYTE2(c2); \
} while (0)
#define WRITEBYTE3(c1, c2, c3) \
do { \
REQUIRE_OUTBUF(3); \
(*outbuf)[0] = (c1); \
(*outbuf)[1] = (c2); \
(*outbuf)[2] = (c3); \
OUTBYTE1(c1); \
OUTBYTE2(c2); \
OUTBYTE3(c3); \
} while (0)
#define WRITEBYTE4(c1, c2, c3, c4) \
do { \
REQUIRE_OUTBUF(4); \
(*outbuf)[0] = (c1); \
(*outbuf)[1] = (c2); \
(*outbuf)[2] = (c3); \
(*outbuf)[3] = (c4); \
OUTBYTE1(c1); \
OUTBYTE2(c2); \
OUTBYTE3(c3); \
OUTBYTE4(c4); \
} while (0)

#define _TRYMAP_ENC(m, assi, val) \
Expand Down