Wrong code with optimization on i386 FreeBSD #40569

llvmbot · 2019-03-25T18:04:28Z


Bugzilla Link	41224
Version	7.0
OS	FreeBSD
Attachments	Program that demonstrates the issue, Assembly when a.c is compiled without -DKLUDGE, Assembly when a.c is compiled with -DKLUDGE
Reporter	LLVM Bugzilla Contributor
CC	@andykaylor,@topperc,@DimitryAndric,@emaste

Extended Description

The attached testcase, a.c, demonstrates a code generation issue on FreeBSD running on an i686 class hardware (i.e., 32-bit i386/387). FreeBSD sets the i387 FPU to a 53-bit precision when the FPU is first accessed. clang or llvm seems to have no knowledge of this setting and unconditionally assumes a 64-bit precision. This leads to wrong for floating point codes that use the 32-bit float type when optimization is used. Consider,

gcc8 (FreeBSD Ports Collection) 8.3.0

gcc8 -fno-builtin -O0 -o z a.c -lm && ./z
gcc8 -fno-builtin -O1 -o z a.c -lm && ./z
gcc8 -fno-builtin -O2 -o z a.c -lm && ./z
gcc8 -fno-builtin -O3 -o z a.c -lm && ./z

The above command lines yield

Maximum ULP: 2.297073

of ULP > 21: 0

This is the expected result.

gcc8 -fno-builtin -O0 -DKLUDGE -o z a.c -lm && ./z
gcc8 -fno-builtin -O1 -DKLUDGE -o z a.c -lm && ./z
gcc8 -fno-builtin -O2 -DKLUDGE -o z a.c -lm && ./z
gcc8 -fno-builtin -O3 -DKLUDGE -o z a.c -lm && ./z

The above command lines yield

Maximum ULP: 2.297073

of ULP > 21: 0

This is the expected result.

Now, consider

FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250)
(based on LLVM 7.0.1)
Target: i386-unknown-freebsd13.0
Thread model: posix

/usr/bin/clang -fno-builtin -O0 -o z a.c -lm && ./z

The above command line yields

Maximum ULP: 2.297073

of ULP > 21: 0

This is the expected result.

/usr/bin/clang -fno-builtin -O1 -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O2 -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O3 -o z a.c -lm && ./z

The above command lines yield

Maximum ULP: 23.061242

of ULP > 21: 39

This is not the expected result. In fact, in my numerical testsuite I have observed 6 digit Max ULP estimates (i.e., only a single digit is correct).

/usr/bin/clang -fno-builtin -O0 -DKLUDGE -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O1 -DKLUDGE -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O2 -DKLUDGE -o z a.c -lm && ./z
/usr/bin/clang -fno-builtin -O3 -DKLUDGE -o z a.c -lm && ./z

The above command lines yield

Maximum ULP: 2.297073

of ULP > 21: 0

which is again the expected results. The -DKLUDGE option causes
the source to use 'volatile float x, y' instead of just 'float x, y'.
AFAICT, from the generated asm (see attachments), the use of volatile
forces clang to spill/reload x, y (thus, using the correct precision
for the type).

llvmbot · 2019-03-25T18:07:21Z

Changed the "Importance" field to "normal".

llvmbot · 2019-03-25T18:10:49Z

A discussion of the bug can be found in the FreeBSD toolchain mailing list archive at

https://lists.freebsd.org/pipermail/freebsd-toolchain/2019-March/004458.html

topperc · 2019-03-26T01:28:26Z

Myself and Andy Kaylor here at Intel spent a great deal of time playing around with this today using both clang and gcc. I don't believe the precision being set to 53 is necessary to hit the issue. I was able to get the test to report errors on the normal linux configuration.

As I think we mentioned in the freebsd mailing list, we are passing x and y to dp_csinh without rounding to float after the additions. We spill them around the call to csinhf using a 64 bit stack slot. This is due to the X87 codegen treating an fp_extend as nothing more than a copy from float register class to double register class. And the register coalescer rewriting some of the machine IR to use the double register class. This causes spill slot size to be calculated as 64-bits.

Using volatile circumvents this and forces a store as float instead of the spill. This causes the value to be rounded to float before being extended to double.

gcc's codegen seems to be affected by -std=c11 vs -std=gnu11. I believe the difference is really -fexcess-precision=standard vs -fexcess-precision=fast. When -fexcess-precision=fast is in effect, gcc's maximum ULP gets worse if dp_csinh is called before csinhf. Though it does not exceed the limit of 21.

It looks as though -fexcess-precision=standard causes gcc to insert intermediate casts to long double in their intermediate representation. At least on linux. That's what I observed from dumping the output of the 004t.original in godbolt. They might be casts to double on freebsd when the precision is 53 bits. I assume this has some effect on how instructions are generated later.

Another interesting note, I noticed clang does set FLT_EVAL_METHOD to 1 instead of 2 on some versions of NetBSD without SSE. But not for FreeBSD and we don't seem to do anything with the information other than set the define.

llvmbot · 2019-03-26T18:00:52Z

Myself and Andy Kaylor here at Intel spent a great deal of time playing
around with this today using both clang and gcc. I don't believe the
precision being set to 53 is necessary to hit the issue. I was able to get
the test to report errors on the normal linux configuration.

Craig,

Thanks for taking a look at this issue. FreeBSD on i386/387
has been using 53-bit precision for likely more than 2 decades.
To save you some code spelunking, one finds (sorry about long url)

https://svnweb.freebsd.org/base/head/sys/x86/include/fpu.h?revision=314436view=markup

lines 186-208 are

/*
* The hardware default control word for i387's and later coprocessors is
* 0x37F, giving:
*
*      round to nearest
*      64-bit precision
*      all exceptions masked.
*
* FreeBSD/i386 uses 53 bit precision for things like fadd/fsub/fsqrt etc
* because of the difference between memory and fpu register stack arguments.
* If its using an intermediate fpu register, it has 80/64 bits to work
* with.  If it uses memory, it has 64/53 bits to work with.  However,
* gcc is aware of this and goes to a fair bit of trouble to make the
* best use of it.
*
* This is mostly academic for AMD64, because the ABI prefers the use
* SSE2 based math.  For FreeBSD/amd64, we go with the default settings.
*/
#define __INITIAL_FPUCW__       0x037F
#define __INITIAL_FPUCW_I386__  0x127F
#define __INITIAL_NPXCW__       __INITIAL_FPUCW_I386__
#define __INITIAL_MXCSR__       0x1F80
#define __INITIAL_MXCSR_MASK__  0xFFBF

The comment above that refers to GCC knowing about this setting can
be traced to

https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/freebsd.h?revision=267494&view=markup

lines 120-123 are

/* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
   compiler get the contents of <float.h> and std::numeric_limits correct. */
#undef TARGET_96_ROUND_53_LONG_DOUBLE
#define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)

I've looked through GCC sources, and know the above effects the
processing of gcc/config/i386/i386-modes.def, but that's as far
as I'm able to go at the moment.

--
steve

llvmbot · 2022-11-05T09:38:41Z

@llvm/issue-subscribers-backend-x86

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

thesamesam added the backend:X86 label Nov 5, 2022

efriedma-quic mentioned this issue Feb 18, 2023

numerical differences in x87 fpu code generated by clang-14 and clang-15 #60837

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wrong code with optimization on i386 FreeBSD #40569

Wrong code with optimization on i386 FreeBSD #40569

llvmbot commented Mar 25, 2019

llvmbot commented Mar 25, 2019

Uh oh!

llvmbot commented Mar 25, 2019

Uh oh!

topperc commented Mar 26, 2019

Uh oh!

llvmbot commented Mar 26, 2019 •

edited by RKSimon

Loading

Uh oh!

llvmbot commented Nov 5, 2022

Uh oh!

Wrong code with optimization on i386 FreeBSD #40569

Wrong code with optimization on i386 FreeBSD #40569

Comments

llvmbot commented Mar 25, 2019

Extended Description

of ULP > 21: 0

of ULP > 21: 0

of ULP > 21: 0

of ULP > 21: 39

of ULP > 21: 0

llvmbot commented Mar 25, 2019

Uh oh!

llvmbot commented Mar 25, 2019

Uh oh!

topperc commented Mar 26, 2019

Uh oh!

llvmbot commented Mar 26, 2019 • edited by RKSimon Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 5, 2022

Uh oh!

llvmbot commented Mar 26, 2019 •

edited by RKSimon

Loading