This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Concerns about integer vs floating-point instructions on x86 #125

Closed
Maratyszcza opened this issue Oct 28, 2019 · 6 comments

Comments

@Maratyszcza
Contributor

Maratyszcza commented Oct 28, 2019

In SSE and AVX instruction sets on x86 many instructions have separate integer, single-precision, and double-precision forms, e.g. MOVDQU/MOVUPS/MOVUPD. On "big" Intel and AMD cores, there is an extra penalty if a register produced by an integer SIMD op is consumed by a floating-point SIMD op, and vice versa.

However, WebAssembly SIMD doesn't make a distinction between, e.g., integer and FP loads, and although this information can, in theory, be reconstructed from the instruction stream, such reconstruction requires expensive analysis passes, which streaming WebAssembly engines cannot afford.

Only a few classes of ops have separate integer / floating-point instructions on x86:

  1. Loads and stores
  2. Shuffles
  3. Broadcasts ("load-and-splat")
  4. Binary logic (AND, OR, XOR, ANDNOT)
  5. Blends

I think it is worth considering splitting the corresponding WebAssembly instructions into separate integer and floating-point variants in the SIMD spec. Initially, both compilers and Wasm engines can treat the integer and floating-point variants the same, but at least it will allow fixing this properly in the future. Here is the list of instructions that would need two forms:

  • v128.const
  • v8x16.shuffle
  • v128.and
  • v128.or
  • v128.xor
  • v128.not
  • v128.andnot
  • v128.bitselect (decomposed into AND, ANDNOT, and OR on x86)
  • v128.load
  • v8x16.load_splat
  • v16x8.load_splat
  • v32x4.load_splat
  • v64x2.load_splat
  • v128.store

Note that the problem is specific to the distinction between integer and floating-point SIMD instructions on x86. ARM NEON doesn't distinguish between integer/floating-point variants at ISA level, and as far as I know no x86 CPUs distinguish between "double-precision" (e.g. ANDPD) and "single-precision" (e.g. ANDPS) instructions.

@Maratyszcza Maratyszcza changed the title Concerns about difference integer & floating-point instructions on x86 Concerns about integer vs floating-point instructions on x86 Oct 28, 2019
@AndrewScheidecker
Contributor

FWIW, there's some past discussion about this here: #1 (comment)

@nfrechette

I wrote about this last week on my blog here. I discuss real world use cases for this and performance measurements with/without. I only mention quaternion math related functions but usage of these instructions happens in lots of other code.

ARM64 seems to suffer when using XOR with floating-point inputs. It may well have a similar penalty internally but no instruction to bypass it. Different chips perform differently here; you can see numbers from my Pixel 3 and an iPad if you follow the links in my post. Performance ranged from slightly worse to much worse. Perhaps someday we'll see a NEON extension that adds these instructions as well. I just can't find good ARM internal documentation to shed light on this, and I don't have time to measure it myself.

@dtig
Member

dtig commented Dec 19, 2019

Labeling this as pending data, as the result of the discussion on 10/22/2019 was to gather some benchmarks to see how this affects usage in practice. (#121)

@dtig
Member

dtig commented May 20, 2020

Following up, the notes have an AI for @penzn to see if there's any benchmarking data here to share. Is this still the plan? If not, given that we won't be adding separate integer/floating point ops at this stage, I would suggest we close this issue.

@midnight-dev

Given the late phase, separate ops won't be implemented regardless of benchmarks, no? In which case, might as well close it.

I think it'd still be good to have a few benchmark samples. I'm planning to use SIMD for matrix & quaternion math and model volumetrics in the next year, aiming for native-grade performance, but an implicit conversion penalty on every op, or needing to waste cycles testing types, may derail my plans if the resulting overhead is too great.

Assuming this is set in stone for now, could this be revisited for the next revision of wasm SIMD?

@dtig
Member

dtig commented Dec 11, 2020

Closing as per #396.

@dtig dtig closed this as completed Dec 11, 2020