Skip to content

[bug] Segmentation Fault #588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MayfeelYang opened this issue Aug 24, 2017 · 12 comments
Closed

[bug] Segmentation Fault #588

MayfeelYang opened this issue Aug 24, 2017 · 12 comments
Labels
Milestone

Comments

@MayfeelYang
Copy link

Description

This crash occurs at attrs.c:1492. In function ISURLCodePoint, the variable c equals to 712380 (in this case). When function ISALNUM(c) is called, it can cause Segmentation Fault.

Version

5.5.31
2017.05.30

Address Sanitizer Output

==30336==ERROR: AddressSanitizer: SEGV on unknown address 0x7fb4406d2c5a (pc 0x000000524eb2 bp 0x7fffa8d564b0 sp 0x7fffa8d56360 T0)
==30336==The signal is caused by a READ memory access.
    #0 0x524eb1 in IsURLCodePoint /home/fuzzing/MyFuzzerTarget/tidy-html5/src/attrs.c:1492:12
    #1 0x51ff8a in prvTidyCheckUrl /home/fuzzing/MyFuzzerTarget/tidy-html5/src/attrs.c:1570:15
    #2 0x57e2c7 in ParseDocTypeDecl /home/fuzzing/MyFuzzerTarget/tidy-html5/src/lexer.c:4382:21
    #3 0x5744c1 in GetTokenFromStream /home/fuzzing/MyFuzzerTarget/tidy-html5/src/lexer.c:3107:32
    #4 0x56c4b0 in prvTidyGetToken /home/fuzzing/MyFuzzerTarget/tidy-html5/src/lexer.c:2591:12
    #5 0x552b61 in prvTidyParseDocument /home/fuzzing/MyFuzzerTarget/tidy-html5/src/parser.c:4759:20
    #6 0x50e29a in prvTidyDocParseStream /home/fuzzing/MyFuzzerTarget/tidy-html5/src/tidylib.c:1453:9
    #7 0x50ad7f in tidyDocParseFile /home/fuzzing/MyFuzzerTarget/tidy-html5/src/tidylib.c:1115:18
    #8 0x50a8cc in tidyParseFile /home/fuzzing/MyFuzzerTarget/tidy-html5/src/tidylib.c:1052:12
    #9 0x4fd270 in main /home/fuzzing/MyFuzzerTarget/tidy-html5/console/tidy.c:2302:22
    #10 0x7fb440430ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
    #11 0x42301b in _start (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x42301b)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/fuzzing/MyFuzzerTarget/tidy-html5/src/attrs.c:1492:12 in IsURLCodePoint

GDB Information

1492	    return ISALNUM( c ) ||
(gdb) print c
$12 = 712380

PoC

Contact me if you need Poc file at [email protected] or [email protected]

@balthisar
Copy link
Member

@MayfeelYang, can you provide an input file that causes this?

@MayfeelYang
Copy link
Author

@balthisar I have send an email to you.

@balthisar
Copy link
Member

@MayfeelYang, which platform are you running on? With ASan enabled on macOS (64-bit), the sample file isn't generating any runtime faults for me. I am using current next branch, but I don't see any commits going back to 5.5.30 that would have addressed this.

poc-tidy.zip

@MayfeelYang
Copy link
Author

@balthisar

Environment:

Operating System:

uname -a
Linux fuzzing-virtual-machine 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Reproducing: gcc

mkdir build
cd build
cmake ../
make

Exception: gcc

......
line 2 column 121 - Warning: replacing invalid UTF-8 bytes (char. code U+00AD)
line 2 column 122 - Warning: replacing invalid UTF-8 bytes (char. code U+00AD)
line 2 column 123 - Warning: replacing invalid UTF-8 bytes (char. code U+00AD)
line 2 column 124 - Warning: replacing invalid UTF-8 bytes (char. code U+00AD)
line 3 column 12 - Warning: replacing invalid UTF-8 bytes (char. code U+00FF)
Segmentation fault

Reproducing: clang

mkdir build
cd build
CC="clang -g -fsanitize=address -fsanitize-coverage=bb " CXX="clang++ -g -fsanitize=address -fsanitize-coverage=bb" cmake ../
make

Exception: clang

==11842==ERROR: AddressSanitizer: SEGV on unknown address 0x7f630db13c5a (pc 0x000000524eb2 bp 0x7fff15ac0a90 sp 0x7fff15ac0940 T0)
==11842==The signal is caused by a READ memory access.
    #0 0x524eb1  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x524eb1)
    #1 0x51ff8a  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x51ff8a)
    #2 0x57e2c7  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x57e2c7)
    #3 0x5744c1  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x5744c1)
    #4 0x56c4b0  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x56c4b0)
    #5 0x552b61  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x552b61)
    #6 0x50e29a  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x50e29a)
    #7 0x50ad7f  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x50ad7f)
    #8 0x50a8cc  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x50a8cc)
    #9 0x4fd270  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x4fd270)
    #10 0x7f630d871ec4  (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #11 0x42301b  (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x42301b)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/fuzzing/MyFuzzerTarget/tidy-html5/build/tidy+0x524eb1) 
==11842==ABORTING

@geoffmcl
Copy link
Contributor

geoffmcl commented Sep 1, 2017

@MayfeelYang thank you for reporting a bug that shows up when using the AddressSanitizer: on a file with very bad utf-8 sequences...

And I can certainly repeat it in linux, using the sample file you supplied... that is for sure a nasty file, but certainly tidy should not have a problem...

While not exactly the same topic, I made a WIN32 fix in issue #352 for this... specifically my comment-299720449... but then only for WIN32...

As stated there, the Windows MSN documentation is quite clear that the service isalnum(c) will trigger an assert if passed 0xfffd - ie the internal value after getting 0x80. From the MSN site "The behavior of isalnum and _isalnum_l is undefined if c is not EOF or in the range 0 through 0xFF, inclusive." Ugh!

Maybe it is the same or similar in some linux implementations... and @balthisar, maybe the macOS (64-bit) runtime libraries do not have this problem... are you building the Debug version? I only see it in windows if -DNDEBUG is not defined, and therefore assert is active...

The cmake command line I use in linux is -

$ cd build/cmake
$ cmake ../.. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_FLAGS=-fsanitize=address \
-DCMAKE_VERBOSE_MAKEFILE=ON
$ make

The adding of -DCMAKE_VERBOSE_MAKEFILE=ON, allows me to see the full compile lines, especially to see the -g added for the debug version, and of course -fsanitize=address, and certainly no -DNDEBUG, etc...

I am still on vacation until mid Sept, but just briefly back at my machines, so prepared the following patch -

diff --git a/src/attrs.c b/src/attrs.c
index a4cb379..bda6f01 100644
--- a/src/attrs.c
+++ b/src/attrs.c
@@ -1475,14 +1475,15 @@ static void CheckLowerCaseAttrValue( TidyDocImpl* doc, Node *node, AttVal *attva
 }
 
 /* methods for checking value of a specific attribute */
-#ifdef _WIN32
+/* Issue #588 - use simple macros only!
+   Seems 'isalnum(c)' is undefined and can
+   cause an assert or a SIGSEGV in some libraries
+   if 'c' is not EOF, or in the range 0 to 0xff,
+   so avoid using it. */
 #define ISUPPER(a) ((a >= 'A') && (a <= 'Z'))
 #define ISLOWER(a) ((a >= 'a') && (a <= 'z'))
 #define ISNUMERIC(a) ((a >= '0') && (a <= '9'))
 #define ISALNUM(a) (ISUPPER(a) || ISLOWER(a) || ISNUMERIC(a))
-#else
-#define ISALNUM(a)  isalnum(a)
-#endif
 
 static Bool IsURLCodePoint( ctmbstr p, uint *increment )
 {

As suggested back in #352, the use of these simple macros should work in all cases...

@MayfeelYang hope you get a chance to try this patch... it certainly fixes the problem in my linux machine... thanks...

@geoffmcl geoffmcl added the Bug label Sep 1, 2017
@geoffmcl geoffmcl added this to the 5.5 milestone Sep 1, 2017
@fgeek
Copy link

fgeek commented Sep 2, 2017

Please use CVE-2017-13692 for this issue.

@geoffmcl
Copy link
Contributor

Has anyone had a chance to test my patch... namely use only the macro in place of isalnum? Does this fix the problem? Thanks...

@fgeek thanks for the CVE-2017-13692 link, but I will leave others to deal with that...

@fgeek
Copy link

fgeek commented Sep 17, 2017

@geoffmcl What do you mean by deal with that? I just linked it here so that upstream is aware of the CVE identifier. I did not request it. Just cross-referencing from the CVE database.

@geoffmcl
Copy link
Contributor

@fgeek as stated, thanks for the cross-reference, but will leave others, probably the person who opened it, to deal with that...

What I need here is someone to test and confirm the patch is good so I can push it as a fix... thanks...

@MayfeelYang
Copy link
Author

@geoffmcl I have tested and confirmed the patch, it's good.

@balthisar
Copy link
Member

Added #622 with @geoffmcl's patch, which works nicely.

@balthisar
Copy link
Member

Merged, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants