Commit a6261ca

[fixup] Misc small fixes
1 parent 6d3b7ea commit a6261ca

3 files changed

+26
-20
lines changed

main/acle.md

Lines changed: 7 additions & 5 deletions
@@ -2354,7 +2354,7 @@ be found in [[BA]](#BA).
23542354
| [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 |
23552355
| [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 |
23562356
| [`__ARM_FEATURE_LDREX`](#ldrexstrex) *(Deprecated)* | Load/store exclusive instructions | 0x0F |
2357-
| [`__ARM_FEATURE_LUT`](#lookup-table-extensions) | Lookup table extensions | 1 |
2357+
| [`__ARM_FEATURE_LUT`](#lookup-table-extensions) | Lookup table extensions (FEAT_LUT) | 1 |
23582358
| [`__ARM_FEATURE_MATMUL_INT8`](#availability-of-armv8.6-a-integer-matrix-multiply-intrinsics) | Integer Matrix Multiply extension (Armv8.6-A, optional Armv8.2-A, Armv8.3-A, Armv8.4-A, Armv8.5-A) | 1 |
23592359
| [`__ARM_FEATURE_MEMORY_TAGGING`](#memory-tagging) | Memory Tagging (Armv8.5-A) | 1 |
23602360
| [`__ARM_FEATURE_MOPS`](#memcpy-family-of-memory-operations-standarization-instructions---mops) | `memcpy`, `memset`, and `memmove` family of operations standardization instructions | 1 |
@@ -2379,7 +2379,7 @@ be found in [[BA]](#BA).
23792379
| [`__ARM_FEATURE_SME_F64F64`](#double-precision-floating-point-outer-product-intrinsics) | Double precision floating-point outer product intrinsics (FEAT_SME_F64F64) | 1 |
23802380
| [`__ARM_FEATURE_SME_I16I64`](#16-bit-to-64-bit-integer-widening-outer-product-intrinsics) | 16-bit to 64-bit integer widening outer product intrinsics (FEAT_SME_I16I64) | 1 |
23812381
| [`__ARM_FEATURE_SME_LOCALLY_STREAMING`](#scalable-matrix-extension-sme) | Support for the `arm_locally_streaming` attribute | 1 |
2382-
| [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions | 1 |
2382+
| [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions (FEAT_SME_LUTv2) | 1 |
23832383
| [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
23842384
| [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
23852385
| [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
@@ -9067,6 +9067,7 @@ Floating-point absolute maximum (predicated).
90679067
svfloat16_t svamax[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
90689068
svfloat16_t svamax[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
90699069

9070+
// Variants are also available for: _f32 and _f64
90709071
svfloat16_t svamax[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
90719072
svfloat16_t svamax[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
90729073
svfloat16_t svamax[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
@@ -9081,6 +9082,7 @@ Floating-point absolute minimum (predicated).
90819082
svfloat16_t svamin[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
90829083
svfloat16_t svamin[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
90839084

9085+
// Variants are also available for: _f32 and _f64
90849086
svfloat16_t svamin[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
90859087
svfloat16_t svamin[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
90869088
svfloat16_t svamin[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
@@ -9113,7 +9115,7 @@ Lookup table read with 4-bit indices.
91139115

91149116
// Variants are also available for: _u16, _f16, _bf16
91159117
svint16_t svluti4_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
9116-
svint16_t svluti4_lane[_s16]_x2(svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
9118+
svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
91179119
```
91189120

91199121
# SME language extensions and intrinsics
@@ -12460,8 +12462,8 @@ Move vector register to ZT0.
1246012462

1246112463
Lookup table read with 4-bit indexes and 8-bit elements.
1246212464
``` c
12463-
// Variants are also available for: _s8
12464-
svuint8x4_t svluti4_zt_u8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0");
12465+
// Variants are also available for: _u8
12466+
svint8x4_t svluti4_zt_s8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0");
1246512467
```
1246612468

1246712469
# M-profile Vector Extension (MVE) intrinsics

neon_intrinsics/advsimd.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4547,21 +4547,21 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``.
45474547
| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
45484548
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|--------------------------------------------|--------------------|---------------------------|
45494549
| <code>uint8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_u8" target="_blank">vluti4q_lane_u8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4550-
| <code>int8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_s8" target="_blank">vluti4q_lane_s8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4551-
| <code>poly8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_p8" target="_blank">vluti4q_lane_p8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
45524550
| <code>uint8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_u8" target="_blank">vluti4q_laneq_u8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4551+
| <code>int8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_s8" target="_blank">vluti4q_lane_s8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
45534552
| <code>int8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_s8" target="_blank">vluti4q_laneq_s8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4553+
| <code>poly8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_p8" target="_blank">vluti4q_lane_p8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
45544554
| <code>poly8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_p8" target="_blank">vluti4q_laneq_p8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4555-
| <code>uint16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_u16_x2" target="_blank">vluti4q_laneq_u16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4556-
| <code>int16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_s16_x2" target="_blank">vluti4q_laneq_s16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4557-
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_f16_x2" target="_blank">vluti4q_laneq_f16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4558-
| <code>bfloat16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_bf16_x2" target="_blank">vluti4q_laneq_bf16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; bfloat16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4559-
| <code>poly16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_p16_x2" target="_blank">vluti4q_laneq_p16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45604555
| <code>uint16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_u16_x2" target="_blank">vluti4q_lane_u16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4556+
| <code>uint16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_u16_x2" target="_blank">vluti4q_laneq_u16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45614557
| <code>int16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_s16_x2" target="_blank">vluti4q_lane_s16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4558+
| <code>int16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_s16_x2" target="_blank">vluti4q_laneq_s16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45624559
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_f16_x2" target="_blank">vluti4q_lane_f16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4560+
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_f16_x2" target="_blank">vluti4q_laneq_f16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45634561
| <code>bfloat16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_bf16_x2" target="_blank">vluti4q_lane_bf16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; bfloat16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4562+
| <code>bfloat16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_bf16_x2" target="_blank">vluti4q_laneq_bf16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; bfloat16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45644563
| <code>poly16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_p16_x2" target="_blank">vluti4q_lane_p16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4564+
| <code>poly16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_p16_x2" target="_blank">vluti4q_laneq_p16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
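
The rows above can be read alongside a small sketch of how a caller might guard use of `vluti4q_lane_u8` with the `__ARM_FEATURE_LUT` macro. The helper names `luti4_u8_model` and `lookup16_u8` are hypothetical and not part of ACLE. The scalar fallback assumes that `LUTI4` with 8-bit elements reads sixteen 4-bit indices from the selected 64-bit segment of `vm`, low nibble first, and uses each as an index into the 16-entry byte table; treat that ordering as an assumption to check against the instruction description.

```c
#include <assert.h>
#include <stdint.h>

/* Use the intrinsic only when the compiler advertises both macros. */
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_LUT)
#include <arm_neon.h>
#define HAVE_LUT_INTRINSICS 1
#else
#define HAVE_LUT_INTRINSICS 0
#endif

/* Hypothetical scalar model of a 4-bit-index lookup with 8-bit elements.
 * `indices` holds packed 4-bit indices; `segment` selects which group of
 * 8 bytes (16 indices) is used, mirroring the `index` argument in the table.
 * Assumption: the low nibble of each byte is consumed before the high one. */
static void luti4_u8_model(const uint8_t table[16], const uint8_t *indices,
                           int segment, uint8_t out[16])
{
    assert(segment == 0 || segment == 1);
    const uint8_t *seg = indices + 8 * segment;
    for (int i = 0; i < 16; ++i) {
        uint8_t packed = seg[i / 2];
        uint8_t nibble = (i % 2 == 0) ? (uint8_t)(packed & 0x0F)
                                      : (uint8_t)(packed >> 4);
        out[i] = table[nibble];
    }
}

/* Hypothetical wrapper: 16 table lookups driven by 8 bytes of indices.
 * With a 64-bit `vm`, the table above allows only index 0. */
static void lookup16_u8(const uint8_t table[16], const uint8_t indices[8],
                        uint8_t out[16])
{
#if HAVE_LUT_INTRINSICS
    uint8x16_t vn = vld1q_u8(table);
    uint8x8_t vm = vld1_u8(indices);
    vst1q_u8(out, vluti4q_lane_u8(vn, vm, 0));
#else
    luti4_u8_model(table, indices, 0, out);
#endif
}
```

On a toolchain that does not define `__ARM_FEATURE_LUT`, only the scalar path is compiled, so the same source builds on other targets. The `laneq` forms take a 128-bit `vm`, which is why their rows allow `index` values of 0 and 1 for 8-bit elements.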
45654565

45664566
## Crypto
45674567
