-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Fix Small String Optimization misplaced 0 terminator #5890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,6 +129,7 @@ String::~String() { | |
// /*********************************************/ | ||
|
||
inline void String::init(void) { | ||
memset(sso_buf, 0, sizeof(sso_buf)); | ||
setSSO(false); | ||
setCapacity(0); | ||
setLen(0); | ||
|
@@ -161,11 +162,11 @@ unsigned char String::changeBuffer(unsigned int maxStrLen) { | |
return 1; | ||
} else { // if bufptr && !sso() | ||
// Using bufptr, need to shrink into sso_buff | ||
char temp[sizeof(sso_buf)]; | ||
char temp[sizeof(sso_buf)] = {0}; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about the following, which would be more efficient:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess copying less may improve it a bit theoretically, but all other calls to I don't know how much impact the zeroing has at the time of construction (on the stack). But since we're also setting it to all zeros in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems only setting a string to "" would have any different behavior in this patch as-is, vs. @devyte's which would explicitly write a 0 at maxstrlen. I think his is correct (but again, I'm having trouble seeing where this actually changes behavior...) |
||
memcpy(temp, buffer(), maxStrLen); | ||
free(wbuffer()); | ||
setSSO(true); | ||
memcpy(wbuffer(), temp, maxStrLen); | ||
memcpy(wbuffer(), temp, sizeof(sso_buf)); | ||
return 1; | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,6 +83,12 @@ TEST_CASE("String constructors", "[core][String]") | |
REQUIRE(ssh == "3.14159_abcd"); | ||
String flash = (F("hello from flash")); | ||
REQUIRE(flash == "hello from flash"); | ||
const char textarray[6] = {'h', 'e', 'l', 'l', 'o', 0}; | ||
String hello(textarray); | ||
REQUIRE(hello == "hello"); | ||
String hello2; | ||
hello2 = textarray; | ||
REQUIRE(hello2 == "hello"); | ||
} | ||
|
||
TEST_CASE("String concantenation", "[core][String]") | ||
|
@@ -360,4 +366,8 @@ TEST_CASE("String SSO works", "[core][String]") | |
REQUIRE(s == "0123456789abcdefghijklmnopqrstu"); | ||
REQUIRE(s.length() == 31); | ||
} | ||
s = "0123456789abcde"; | ||
s = s.substring(s.indexOf('a')); | ||
REQUIRE(s == "abcde"); | ||
REQUIRE(s.length() == 5); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These and the prior added test pass on the unpatched core. Were they supposed to fail? I was hoping there would be something that fails like your app code did, but passes once the patch is applied. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did not test the tests myself. These were merely added to see how often they fail, but I guess it also depends on how the data on the stack is being allocated. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking about this, but need @devyte's C++ internals help. Basically you want to get a RAM buffer of 16 bytes, fill it with 0xff, and then coerce a pointer to that buffer into a (String*). The bugger is you need to call the constructor on it, and I've no idea if that's possible explicitly in C++. -edit- |
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see why the initial zeroing in ::init is needed, honestly. Can you explain? Once you apply a string of >12 chars to the String, a pointer and 2 length fields will be written over the 0s you wrote in ::init. At some later point you assign a 1-char string the SSO better write 0 to sso_buff[1] or it'll go poorly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did test it without this memset and then the garbled texts re-appeared in the use case I showed in the issue.
The main reason I placed this memset in the init, is because that's the only point where I knew for sure that any data present would not be worth keeping.
Immediately after setting this buffer to 0's, the mode is switched to non-SSO.
So then all calls made in init are only on the _ptr struct.
N.B. this struct is then also set to contain 0's, but it does not overlap all the space occupied by
sso_buf
In the constructor where I had the issues, only the
copy
is called afterinit
.This calls
reserve
, which callschangeBuffer
.In
changeBuffer
we enter this part, since theptr.buff
is set to 0 ininit
:Arduino/cores/esp8266/WString.cpp
Lines 156 to 162 in db88c82
So only SSO is set and nothing else.
Since nothing has changed to the SSO buffer, but SSO is now active, a call to
len()
may return something other than 0 (it is callingstrlen(sso_buf)
which may return anything)So the first byte of the buffer is not set to 0 and thus the String object may contain garbage and it is up to the compiler how this union is aligned.
If the pointer of
_ptr
is at the first byte of thesso_buf
, then it should already be a zero, but apparently the Windows build is not locating the_prt
struct at the same address assso_buf
.By wiping as much as possible to 0 in this union struct, it is guaranteed it will contain zeroes at the start.
Also I guess it could differ between builds where this union is located and I don't know what all possible offsets are between the start of
_ptr
andsso_buf
.So that's why I think it is needed to have the
sso_buf
initialized to 0.Does that make any sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AMD64 compiles are going to align things somewhat differently than X86_32 ones do (the pointer, for example, is gonna be 2x the size...). I don't know if Windows supports it w/the mingw(?) toolchain, but there is a 32-bit compile mode in the makefile which should help get closer to on-board alignment.
I'm not sure I fully follow, but my point is in the general case the sequence you've got is not guaranteed to happen, and there's possibility for the early bytes of the union to contain non-0s after you've called ::init and when switching. 0-ing out at ::init is safe, but I think it's unneeded.
I think the main problem is you found that when a SSO string was copied into the struct it did not get explicitly terminated w/a \0. Your patch does fix that, later on in the SSO assignment changeBuffer. My gut says that dropping this memset but leaving everything else untouched won't change the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, leaving out this memset did not fix the issue (tested it).
I'm also not 100% convinced that whatever this init does fix will cover all situations, but at least it does something.
Maybe the suggestion I made yesterday does cover all and then this part in the init can be removed?
I don't understand what you tried to explain with the difference on 64-bit and 32 bit platforms.
Do the compilers generate different code even though they compile for the same 32 bit platform?
I know the
size_t
,ptrdiff_t
and pointers have different sizes when compiling for 64 bit or 32 bit platforms, but when they compile for the same 32 bit platform they should align structures in memory the same, right?Otherwise we might have a problem when serializing structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking about the host testing. It can run either 32-bit or 64-bit on your system, depending on the compilers available and
make
options. Thegcc-xtensa
binaries should produce the exact same code under either host OS, of course.