Concerns about integer vs floating-point instructions on x86 #125
Comments
FWIW, there's some past discussion about this here: #1 (comment)
I wrote about this last week on my blog here. I discuss real world use cases for this and performance measurements with/without. I only mention quaternion math related functions but usage of these instructions happens in lots of other code. ARM64 seems to suffer when using XOR with floating-point inputs. It isn't impossible that it has a similar penalty internally but no instruction to bypass it. Different chips have different performance here, you can see numbers from my Pixel 3 and an iPad if you follow the links in my post. Performance ranged from slightly worse to much worse. Perhaps someday we'll see a NEON extension that will add these instructions as well. I just can't seem to find good ARM internal documents to really shine light on this and I don't have time to measure myself.
Labeling this as pending data; the result of the discussion on 10/22/2019 was to gather some benchmarks to see how this affects usage in practice. (#121)
Following up, the notes have an action item for @penzn to see if there's any benchmarking data here to share. Is this still the plan? If not, given that we won't be adding separate integer/floating-point ops at this stage, I would suggest we close this issue.
Given the late phase, separate ops won't be implemented regardless of benchmarks, no? In which case, might as well close it. I think it'd still be good to have a few benchmark samples. I'm planning to use some of SIMD for matrix & quaternion math and model volumetrics in the next year for native-grade performance, but an implicit conversion penalty on every op or needing to waste cycles on testing types may derail my plans if the resulting overhead is too great. Assuming this is set in stone for now, could this be revisited for the next revision of wasm SIMD? |
Closing as per #396. |
In the SSE and AVX instruction sets on x86, many instructions have separate integer, single-precision, and double-precision forms, e.g. `MOVDQU`/`MOVUPS`/`MOVUPD`. On "big" Intel and AMD cores, there is an extra penalty if a register produced by an integer SIMD op is consumed by a floating-point SIMD op, and vice versa. However, WebAssembly SIMD doesn't make a distinction between e.g. integer and FP loads, and although this information can, in theory, be reconstructed from the instruction stream, such reconstruction requires expensive analysis passes, which streaming WebAssembly engines cannot afford.
Only a few classes of ops have separate integer/floating-point instructions on x86.
I think it is worth considering splitting the corresponding WebAssembly instructions into separate integer and floating-point variants in the SIMD spec. Initially both compilers and Wasm engines can treat the integer and floating-point variants the same, but at least it will make it possible to fix this properly in the future. Here is the list of instructions that would need two forms:
- `v128.const`
- `v8x16.shuffle`
- `v128.and`
- `v128.or`
- `v128.xor`
- `v128.not`
- `v128.andnot`
- `v128.bitselect` (decomposed into `AND`, `ANDNOT`, and `OR` on x86)
- `v128.load`
- `v8x16.load_splat`
- `v16x8.load_splat`
- `v32x4.load_splat`
- `v64x2.load_splat`
- `v128.store`
Note that the problem is specific to the distinction between integer and floating-point SIMD instructions on x86. ARM NEON doesn't distinguish between integer and floating-point variants at the ISA level, and as far as I know no x86 CPUs distinguish between "double-precision" (e.g. `ANDPD`) and "single-precision" (e.g. `ANDPS`) instructions.