Wrong code with optimization on i386 FreeBSD #40569
Comments
Changed the "Importance" field to "normal".
A discussion of the bug can be found in the FreeBSD toolchain mailing list archive at https://lists.freebsd.org/pipermail/freebsd-toolchain/2019-March/004458.html
Andy Kaylor and I here at Intel spent a great deal of time experimenting with this today using both clang and gcc. I don't believe setting the precision to 53 bits is necessary to hit the issue; I was able to get the test to report errors on the normal Linux configuration.
As I think we mentioned on the FreeBSD mailing list, we are passing x and y to dp_csinh without rounding to float after the additions. We spill them around the call to csinhf using a 64-bit stack slot. This happens because the X87 codegen treats an fp_extend as nothing more than a copy from the float register class to the double register class, and the register coalescer rewrites some of the machine IR to use the double register class. As a result, the spill slot size is calculated as 64 bits. Using volatile circumvents this and forces a store as float instead of the spill, so the value is rounded to float before being extended to double.
gcc's codegen seems to be affected by -std=c11 vs -std=gnu11. I believe the real difference is -fexcess-precision=standard vs -fexcess-precision=fast. When -fexcess-precision=fast is in effect, gcc's maximum ULP gets worse if dp_csinh is called before csinhf, though it does not exceed the limit of 21. It looks as though -fexcess-precision=standard causes gcc to insert intermediate casts to long double in its intermediate representation, at least on Linux. That's what I observed from dumping the 004t.original output in godbolt. They might be casts to double on FreeBSD when the precision is 53 bits. I assume this has some effect on how instructions are generated later.
Another interesting note: clang does set FLT_EVAL_METHOD to 1 instead of 2 on some versions of NetBSD without SSE, but not for FreeBSD, and we don't seem to do anything with the information other than set the define.
Craig, thanks for taking a look at this issue. For FreeBSD on i386/387, see lines 186-208 of https://svnweb.freebsd.org/base/head/sys/x86/include/fpu.h?revision=314436view=markup
The comment there refers to GCC knowing about this setting; see lines 120-123 of https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/freebsd.h?revision=267494&view=markup
I've looked through the GCC sources, and know the above affects the --
@llvm/issue-subscribers-backend-x86
Extended Description
The attached testcase, a.c, demonstrates a code generation issue on FreeBSD running on i686-class hardware (i.e., 32-bit i386/387). FreeBSD sets the i387 FPU to 53-bit precision when the FPU is first accessed. clang or llvm seems to have no knowledge of this setting and unconditionally assumes 64-bit precision. This leads to wrong results for floating-point code that uses the 32-bit float type when optimization is enabled. Consider,
gcc8 (FreeBSD Ports Collection) 8.3.0
gcc8 -fno-builtin -O0 -o z a.c -lm && ./z
gcc8 -fno-builtin -O1 -o z a.c -lm && ./z
gcc8 -fno-builtin -O2 -o z a.c -lm && ./z
gcc8 -fno-builtin -O3 -o z a.c -lm && ./z
The above command lines yield
Maximum ULP: 2.297073
of ULP > 21: 0
This is the expected result.
gcc8 -fno-builtin -O0 -DKLUDGE -o z a.c -lm && ./z
gcc8 -fno-builtin -O1 -DKLUDGE -o z a.c -lm && ./z
gcc8 -fno-builtin -O2 -DKLUDGE -o z a.c -lm && ./z
gcc8 -fno-builtin -O3 -DKLUDGE -o z a.c -lm && ./z
The above command lines yield
Maximum ULP: 2.297073
of ULP > 21: 0
This is the expected result.
Now, consider
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250)
(based on LLVM 7.0.1)
Target: i386-unknown-freebsd13.0
Thread model: posix
/usr/bin/clang -fno-builtin -O0 -o z a.c -lm && ./z
The above command line yields
Maximum ULP: 2.297073
of ULP > 21: 0
This is the expected result.
/usr/bin/clang -fno-builtin -O1 -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O2 -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O3 -o z a.c -lm && ./z
The above command lines yield
Maximum ULP: 23.061242
of ULP > 21: 39
This is not the expected result. In fact, in my numerical test suite I have observed 6-digit max ULP estimates (i.e., only a single digit is correct).
/usr/bin/clang -fno-builtin -O0 -DKLUDGE -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O1 -DKLUDGE -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O2 -DKLUDGE -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O3 -DKLUDGE -o z a.c -lm && ./z
The above command lines yield
Maximum ULP: 2.297073
of ULP > 21: 0
which is again the expected result. The -DKLUDGE option causes
the source to use 'volatile float x, y' instead of just 'float x, y'.
AFAICT, from the generated asm (see attachments), the use of volatile
forces clang to spill/reload x, y (thus, using the correct precision
for the type).