Tracking Issue for stdarch_s390x #135681

Open · folkertdev opened this issue Jan 18, 2025 · 1 comment
Labels
  • C-tracking-issue: An issue tracking the progress of sth. like the implementation of an RFC
  • O-SystemZ: SystemZ processors (s390x)
  • T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

folkertdev (Contributor) commented Jan 18, 2025

Feature gate: #![feature(stdarch_s390x)]

This is a tracking issue for the s390x (aka SystemZ) intrinsics in core::arch::s390x.

Public API

Everything in core::arch::s390x.

Missing instructions

Based on the clang vecintrin.h (roughly similar to the GCC vecintrin.h).

Blocked on #137447: these are from nnp-assist, and current QEMU traps on them:

  • vec_extend_to_fp32_hi
  • vec_extend_to_fp32_lo
  • vec_round_from_fp32
  • vec_convert_to_fp16
  • vec_convert_from_fp16

Deprecated functions

  • vec_ctd
  • vec_ctd_s64
  • vec_ctd_u64
  • vec_ctsl
  • vec_ctul
  • vec_ld2f
  • vec_st2f
  • vec_xstd2
  • vec_xstw4
  • vec_xld2
  • vec_xlw4
  • vec_permi

Steps / History

  • Implementation: #...
  • Final comment period (FCP) [1]
  • Stabilization PR

Unresolved Questions

  • None yet.

@rustbot label O-SystemZ
General s390x vector/intrinsics progress is tracked at #130869.
cc @taiki-e

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

uweigand (Contributor) commented
The new machines IBM z17 and IBM LinuxONE Emperor 5 were recently announced. These machines implement the arch15 level of the z/Architecture. Support for this has been added to LLVM here: llvm/llvm-project@8424bf2

Support for the new architecture level also comes with a new revision of the vector intrinsics (implemented across GCC, LLVM, and the IBM compilers). It would be good to update the Rust implementation to match.

The new vecintrin.h file can be seen, e.g., here: https://github.com/llvm/llvm-project/blob/8424bf207efd89eacf2fe893b67be98d535e1db6/clang/lib/Headers/vecintrin.h. It implements the following set of changes compared to the previous version:

Generic cleanup

While reviewing the new changes, we noticed a number of inconsistencies and deficiencies in the existing intrinsics, which were cleaned up as part of the new revision. Specifically:

  • Added vec_and, vec_or, and vec_xor intrinsics. These are mostly redundant with the &, |, and ^ operators, but can also be used with floating-point vector arguments (a usage sketch follows these lists).
vec_and(__vector __bool char __a, __vector __bool char __b)
vec_and(__vector signed char __a, __vector signed char __b)
vec_and(__vector unsigned char __a, __vector unsigned char __b)
vec_and(__vector __bool short __a, __vector __bool short __b)
vec_and(__vector signed short __a, __vector signed short __b)
vec_and(__vector unsigned short __a, __vector unsigned short __b)
vec_and(__vector __bool int __a, __vector __bool int __b)
vec_and(__vector signed int __a, __vector signed int __b)
vec_and(__vector unsigned int __a, __vector unsigned int __b)
vec_and(__vector __bool long long __a, __vector __bool long long __b)
vec_and(__vector signed long long __a, __vector signed long long __b)
vec_and(__vector unsigned long long __a, __vector unsigned long long __b)
vec_and(__vector float __a, __vector float __b) [only with vector-enhancements-1]
vec_and(__vector double __a, __vector double __b)
vec_or(__vector __bool char __a, __vector __bool char __b)
vec_or(__vector signed char __a, __vector signed char __b)
vec_or(__vector unsigned char __a, __vector unsigned char __b)
vec_or(__vector __bool short __a, __vector __bool short __b)
vec_or(__vector signed short __a, __vector signed short __b)
vec_or(__vector unsigned short __a, __vector unsigned short __b)
vec_or(__vector __bool int __a, __vector __bool int __b)
vec_or(__vector signed int __a, __vector signed int __b)
vec_or(__vector unsigned int __a, __vector unsigned int __b)
vec_or(__vector __bool long long __a, __vector __bool long long __b)
vec_or(__vector signed long long __a, __vector signed long long __b)
vec_or(__vector unsigned long long __a, __vector unsigned long long __b)
vec_or(__vector float __a, __vector float __b) [only with vector-enhancements-1]
vec_or(__vector double __a, __vector double __b)
vec_xor(__vector __bool char __a, __vector __bool char __b)
vec_xor(__vector signed char __a, __vector signed char __b)
vec_xor(__vector unsigned char __a, __vector unsigned char __b)
vec_xor(__vector __bool short __a, __vector __bool short __b)
vec_xor(__vector signed short __a, __vector signed short __b)
vec_xor(__vector unsigned short __a, __vector unsigned short __b)
vec_xor(__vector __bool int __a, __vector __bool int __b)
vec_xor(__vector signed int __a, __vector signed int __b)
vec_xor(__vector unsigned int __a, __vector unsigned int __b)
vec_xor(__vector __bool long long __a, __vector __bool long long __b)
vec_xor(__vector signed long long __a, __vector signed long long __b)
vec_xor(__vector unsigned long long __a, __vector unsigned long long __b)
vec_xor(__vector float __a, __vector float __b) [only with vector-enhancements-1]
vec_xor(__vector double __a, __vector double __b)
  • The shift count operand for the full-vector shift intrinsics needs to have the count replicated across all 16 bytes according to the ISA. Therefore, the only type that makes sense for this operand is vector unsigned char. Also, the operand to be shifted should not be of any vector bool type, since the result may not necessarily be a valid bool vector value (also covered by the sketch after these lists). Added the following intrinsics:
vec_slb(__vector signed short __a, __vector unsigned char __b)
vec_slb(__vector unsigned short __a, __vector unsigned char __b)
vec_slb(__vector signed int __a, __vector unsigned char __b)
vec_slb(__vector unsigned int __a, __vector unsigned char __b)
vec_slb(__vector signed long long __a, __vector unsigned char __b)
vec_slb(__vector unsigned long long __a, __vector unsigned char __b)
vec_slb(__vector float __a, __vector unsigned char __b) [only with vector-enhancements-1]
vec_slb(__vector double __a, __vector unsigned char __b)
vec_srb(__vector signed short __a, __vector unsigned char __b)
vec_srb(__vector unsigned short __a, __vector unsigned char __b)
vec_srb(__vector signed int __a, __vector unsigned char __b)
vec_srb(__vector unsigned int __a, __vector unsigned char __b)
vec_srb(__vector signed long long __a, __vector unsigned char __b)
vec_srb(__vector unsigned long long __a, __vector unsigned char __b)
vec_srb(__vector float __a, __vector unsigned char __b) [only with vector-enhancements-1]
vec_srb(__vector double __a, __vector unsigned char __b)
vec_srab(__vector signed short __a, __vector unsigned char __b)
vec_srab(__vector unsigned short __a, __vector unsigned char __b)
vec_srab(__vector signed int __a, __vector unsigned char __b)
vec_srab(__vector unsigned int __a, __vector unsigned char __b)
vec_srab(__vector signed long long __a, __vector unsigned char __b)
vec_srab(__vector unsigned long long __a, __vector unsigned char __b)
vec_srab(__vector float __a, __vector unsigned char __b) [only with vector-enhancements-1]
vec_srab(__vector double __a, __vector unsigned char __b)

and deprecated those intrinsics:

vec_slb(__vector signed char __a, __vector signed char __b)
vec_slb(__vector unsigned char __a, __vector signed char __b)
vec_slb(__vector signed short __a, __vector signed short __b)
vec_slb(__vector signed short __a, __vector unsigned short __b)
vec_slb(__vector unsigned short __a, __vector signed short __b)
vec_slb(__vector unsigned short __a, __vector unsigned short __b)
vec_slb(__vector signed int __a, __vector signed int __b)
vec_slb(__vector signed int __a, __vector unsigned int __b)
vec_slb(__vector unsigned int __a, __vector signed int __b)
vec_slb(__vector unsigned int __a, __vector unsigned int __b)
vec_slb(__vector signed long long __a, __vector signed long long __b)
vec_slb(__vector signed long long __a, __vector unsigned long long __b)
vec_slb(__vector unsigned long long __a, __vector signed long long __b)
vec_slb(__vector unsigned long long __a, __vector unsigned long long __b)
vec_slb(__vector float __a, __vector signed int __b)
vec_slb(__vector float __a, __vector unsigned int __b)
vec_slb(__vector double __a, __vector signed long long __b)
vec_slb(__vector double __a, __vector unsigned long long __b)
vec_srb(__vector signed char __a, __vector signed char __b)
vec_srb(__vector unsigned char __a, __vector signed char __b)
vec_srb(__vector signed short __a, __vector signed short __b)
vec_srb(__vector signed short __a, __vector unsigned short __b)
vec_srb(__vector unsigned short __a, __vector signed short __b)
vec_srb(__vector unsigned short __a, __vector unsigned short __b)
vec_srb(__vector signed int __a, __vector signed int __b)
vec_srb(__vector signed int __a, __vector unsigned int __b)
vec_srb(__vector unsigned int __a, __vector signed int __b)
vec_srb(__vector unsigned int __a, __vector unsigned int __b)
vec_srb(__vector signed long long __a, __vector signed long long __b)
vec_srb(__vector signed long long __a, __vector unsigned long long __b)
vec_srb(__vector unsigned long long __a, __vector signed long long __b)
vec_srb(__vector unsigned long long __a, __vector unsigned long long __b)
vec_srb(__vector float __a, __vector signed int __b)
vec_srb(__vector float __a, __vector unsigned int __b)
vec_srb(__vector double __a, __vector signed long long __b)
vec_srb(__vector double __a, __vector unsigned long long __b)
vec_srab(__vector signed char __a, __vector signed char __b)
vec_srab(__vector unsigned char __a, __vector signed char __b)
vec_srab(__vector signed short __a, __vector signed short __b)
vec_srab(__vector signed short __a, __vector unsigned short __b)
vec_srab(__vector unsigned short __a, __vector signed short __b)
vec_srab(__vector unsigned short __a, __vector unsigned short __b)
vec_srab(__vector signed int __a, __vector signed int __b)
vec_srab(__vector signed int __a, __vector unsigned int __b)
vec_srab(__vector unsigned int __a, __vector signed int __b)
vec_srab(__vector unsigned int __a, __vector unsigned int __b)
vec_srab(__vector signed long long __a, __vector signed long long __b)
vec_srab(__vector signed long long __a, __vector unsigned long long __b)
vec_srab(__vector unsigned long long __a, __vector signed long long __b)
vec_srab(__vector unsigned long long __a, __vector unsigned long long __b)
vec_srab(__vector float __a, __vector signed int __b)
vec_srab(__vector float __a, __vector unsigned int __b)
vec_srab(__vector double __a, __vector signed long long __b)
vec_srab(__vector double __a, __vector unsigned long long __b)
vec_sld(__vector __bool char __a, __vector __bool char __b, int __c)
vec_sld(__vector __bool short __a, __vector __bool short __b, int __c)
vec_sld(__vector __bool int __a, __vector __bool int __b, int __c)
vec_sld(__vector __bool long long __a, __vector __bool long long __b, int __c)
  • Use the same set of supported operand types for vec_load_len/vec_store_len and vec_load_len_r/vec_store_len_r. Both now support only vector signed char and vector unsigned char. Added intrinsics:
vec_load_len_r(const signed char *__ptr, unsigned int __len)
vec_store_len_r(__vector signed char __vec, signed char *__ptr, unsigned int __len)

and deprecated the existing intrinsics:

vec_load_len(const signed short *__ptr, unsigned int __len)
vec_load_len(const unsigned short *__ptr, unsigned int __len)
vec_load_len(const signed int *__ptr, unsigned int __len)
vec_load_len(const unsigned int *__ptr, unsigned int __len)
vec_load_len(const signed long long *__ptr, unsigned int __len)
vec_load_len(const unsigned long long *__ptr, unsigned int __len)
vec_load_len(const float *__ptr, unsigned int __len)
vec_load_len(const double *__ptr, unsigned int __len)
vec_store_len(__vector signed short __vec, signed short *__ptr, unsigned int __len)
vec_store_len(__vector unsigned short __vec, unsigned short *__ptr, unsigned int __len)
vec_store_len(__vector signed int __vec, signed int *__ptr, unsigned int __len)
vec_store_len(__vector unsigned int __vec, unsigned int *__ptr, unsigned int __len)
vec_store_len(__vector signed long long __vec, signed long long *__ptr, unsigned int __len)
vec_store_len(__vector unsigned long long __vec, unsigned long long *__ptr, unsigned int __len)
vec_store_len(__vector float __vec, float *__ptr, unsigned int __len)
vec_store_len(__vector double __vec, double *__ptr, unsigned int __len)
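For illustration, a minimal C sketch of the cleaned-up intrinsics, assuming a compiler with the z/Architecture vector language extensions enabled (clang -fzvector or gcc -mzvector) and the new vecintrin.h; the function names are made up for the example:

#include <vecintrin.h>

/* vec_and now accepts floating-point vectors directly, so no casts
   through an integer vector type are needed. */
__vector double and_doubles(__vector double a, __vector double b) {
    return vec_and(a, b);
}

/* The ISA requires the full-vector shift count to be replicated across
   all 16 bytes of the count operand; vec_splats builds such a vector,
   and the new overloads fix its type as __vector unsigned char. */
__vector signed int shift_left(__vector signed int x, unsigned char count) {
    return vec_slb(x, vec_splats(count));
}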

Support for 128-bit integer vector types

One main feature of the arch15 ISA is support for a full set of arithmetic operations on 128-bit integer values held in vector registers. Much of this is exploited implicitly by the code-generator back end, but a number of operations require intrinsics to exploit fully. We decided to add the following new vector types to be used with those intrinsics:

vector signed __int128
vector unsigned __int128
vector bool __int128

Note that since the vector length is only 128 bits, these vector types only contain a single element. They are still useful as they use a different ABI (passed in vector registers and not in memory), and it seems cleaner to consistently use "vector" types with the vector intrinsics.
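For example, a minimal sketch of how the new types are used (assuming the arch15 vecintrin.h; to_vector is a made-up name):

#include <vecintrin.h>

/* One element per vector; unlike a plain __int128, the value is passed
   and returned in a vector register. */
__vector unsigned __int128 to_vector(unsigned __int128 x) {
    return vec_splats(x);
}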

Note that many operations on these types can actually be performed with prior versions of the ISA, so the types have been made available unconditionally. Many existing intrinsics have been extended to support the new types (see the sketch after this list):

vec_perm(__vector signed __int128 __a, __vector signed __int128 __b, __vector unsigned char __c)
vec_perm(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned char __c)
vec_perm(__vector __bool __int128 __a, __vector __bool __int128 __b, __vector unsigned char __c)
vec_revb(__vector signed __int128 __vec)
vec_revb(__vector unsigned __int128 __vec)
vec_sel(__vector signed __int128 __a, __vector signed __int128 __b, __vector unsigned __int128 __c)
vec_sel(__vector signed __int128 __a, __vector signed __int128 __b, __vector __bool __int128 __c)
vec_sel(__vector __bool __int128 __a, __vector __bool __int128 __b, __vector unsigned __int128 __c)
vec_sel(__vector __bool __int128 __a, __vector __bool __int128 __b, __vector __bool __int128 __c)
vec_sel(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_sel(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector __bool __int128 __c)
vec_xl(long __offset, const signed __int128 *__ptr)
vec_xl(long __offset, const unsigned __int128 *__ptr)
vec_xst(__vector signed __int128 __vec, long __offset, signed __int128 *__ptr)
vec_xst(__vector unsigned __int128 __vec, long __offset, unsigned __int128 *__ptr)
vec_load_bndry(const signed __int128 *__ptr, unsigned short __len)
vec_load_bndry(const unsigned __int128 *__ptr, unsigned short __len)
vec_splats(signed __int128 __scalar)
vec_splats(unsigned __int128 __scalar)
vec_pack(__vector signed __int128 __a, __vector signed __int128 __b)
vec_pack(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_pack(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_cmpeq(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_cmpeq(__vector signed __int128 __a, __vector signed __int128 __b)
vec_cmpeq(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_cmpge(__vector signed __int128 __a, __vector signed __int128 __b)
vec_cmpge(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_cmpgt(__vector signed __int128 __a, __vector signed __int128 __b)
vec_cmpgt(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_cmple(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_cmplt(__vector signed __int128 __a, __vector signed __int128 __b)
vec_and(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_and(__vector signed __int128 __a, __vector signed __int128 __b)
vec_and(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_or(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_or(__vector signed __int128 __a, __vector signed __int128 __b)
vec_or(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_xor(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_xor(__vector signed __int128 __a, __vector signed __int128 __b)
vec_xor(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_andc(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_andc(__vector signed __int128 __a, __vector signed __int128 __b)
vec_andc(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_nor(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_nor(__vector signed __int128 __a, __vector signed __int128 __b)
vec_nor(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_orc(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_orc(__vector signed __int128 __a, __vector signed __int128 __b)
vec_orc(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_nand(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_nand(__vector signed __int128 __a, __vector signed __int128 __b)
vec_nand(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_eqv(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_eqv(__vector signed __int128 __a, __vector signed __int128 __b)
vec_eqv(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_sll(__vector signed __int128 __a, __vector unsigned char __b)
vec_sll(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_slb(__vector signed __int128 __a, __vector unsigned char __b)
vec_slb(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_sld(__vector signed __int128 __a, __vector signed __int128 __b, int __c)
vec_sld(__vector unsigned __int128 __a, __vector unsigned __int128 __b, int __c)
vec_sldw(__vector signed __int128 __a, __vector signed __int128 __b, int __c)
vec_sldw(__vector unsigned __int128 __a, __vector unsigned __int128 __b, int __c)
vec_sldb(__vector signed __int128 __a, __vector signed __int128 __b, int __c)
vec_sldb(__vector unsigned __int128 __a, __vector unsigned __int128 __b, int __c)
vec_sral(__vector signed __int128 __a, __vector unsigned char __b)
vec_sral(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_srab(__vector signed __int128 __a, __vector unsigned char __b)
vec_srab(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_srl(__vector signed __int128 __a, __vector unsigned char __b)
vec_srl(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_srb(__vector signed __int128 __a, __vector unsigned char __b)
vec_srb(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_srdb(__vector signed __int128 __a, __vector signed __int128 __b, int __c)
vec_srdb(__vector unsigned __int128 __a, __vector unsigned __int128 __b, int __c)
vec_abs(__vector signed __int128 __a)
vec_max(__vector signed __int128 __a, __vector signed __int128 __b)
vec_max(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_min(__vector signed __int128 __a, __vector signed __int128 __b)
vec_min(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_test_mask(__vector signed __int128 __a, __vector unsigned __int128 __b)
vec_test_mask(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
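For instance, the extended comparison and select overloads compose in the usual way; a minimal sketch (max128 is a made-up name, and vec_max from the list above performs the same operation in one call):

#include <vecintrin.h>

__vector signed __int128 max128(__vector signed __int128 a,
                                __vector signed __int128 b) {
    /* vec_cmpgt yields a __vector __bool __int128 mask ... */
    __vector __bool __int128 gt = vec_cmpgt(a, b);
    /* ... which vec_sel consumes, taking bits from a where the mask
       is set and from b where it is clear. */
    return vec_sel(b, a, gt);
}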

Some other intrinsics also now support the new types, but only when the vector-enhancements-3 feature is present (see the sketch after this list):

vec_unpackh(__vector signed long long __a)
vec_unpackh(__vector __bool long long __a)
vec_unpackh(__vector unsigned long long __a)
vec_unpackl(__vector signed long long __a)
vec_unpackl(__vector __bool long long __a)
vec_unpackl(__vector unsigned long long __a)
vec_all_eq(__vector signed __int128 __a, __vector signed __int128 __b)
vec_all_eq(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_all_eq(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_all_ne(__vector signed __int128 __a, __vector signed __int128 __b)
vec_all_ne(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_all_ne(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_all_ge(__vector signed __int128 __a, __vector signed __int128 __b)
vec_all_ge(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_all_gt(__vector signed __int128 __a, __vector signed __int128 __b)
vec_all_gt(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_all_le(__vector signed __int128 __a, __vector signed __int128 __b)
vec_all_le(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_all_lt(__vector signed __int128 __a, __vector signed __int128 __b)
vec_all_lt(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_any_eq(__vector signed __int128 __a, __vector signed __int128 __b)
vec_any_eq(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_any_eq(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_any_ne(__vector signed __int128 __a, __vector signed __int128 __b)
vec_any_ne(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_any_ne(__vector __bool __int128 __a, __vector __bool __int128 __b)
vec_any_ge(__vector signed __int128 __a, __vector signed __int128 __b)
vec_any_ge(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_any_gt(__vector signed __int128 __a, __vector signed __int128 __b)
vec_any_gt(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_any_le(__vector signed __int128 __a, __vector signed __int128 __b)
vec_any_le(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_any_lt(__vector signed __int128 __a, __vector signed __int128 __b)
vec_any_lt(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_cntlz(__vector signed __int128 __a)
vec_cntlz(__vector unsigned __int128 __a)
vec_cnttz(__vector signed __int128 __a)
vec_cnttz(__vector unsigned __int128 __a)
vec_avg(__vector signed __int128 __a, __vector signed __int128 __b)
vec_avg(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
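A minimal sketch of one of these VE3-gated overloads, assuming a target with vector-enhancements-3 (e.g. -march=arch15); all_equal_u128 is a made-up name:

#include <vecintrin.h>

int all_equal_u128(__vector unsigned __int128 a,
                   __vector unsigned __int128 b) {
    /* Whole-vector predicate over the single 128-bit element. */
    return vec_all_eq(a, b);
}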

Finally, a number of existing intrinsics already operated on 128-bit integer types, but used vector unsigned char to represent those values in the absence of a better type. These have now all been deprecated:

vec_bperm_u128(__vector unsigned char __a, __vector unsigned char __b)
vec_add_u128(__vector unsigned char __a, __vector unsigned char __b)
vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b)
vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b, __vector unsigned char __c)
vec_addec_u128(__vector unsigned char __a, __vector unsigned char __b, __vector unsigned char __c)
vec_sub_u128(__vector unsigned char __a, __vector unsigned char __b)
vec_subc_u128(__vector unsigned char __a, __vector unsigned char __b)
vec_sube_u128(__vector unsigned char __a, __vector unsigned char __b, __vector unsigned char __c)
vec_subec_u128(__vector unsigned char __a, __vector unsigned char __b, __vector unsigned char __c)
vec_sum_u128(__vector unsigned int __a, __vector unsigned int __b)
vec_sum_u128(__vector unsigned long long __a, __vector unsigned long long __b)
vec_msum_u128(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned char __c, int __d)
vec_gfmsum_128(__vector unsigned long long __a, __vector unsigned long long __b)
vec_gfmsum_accum_128(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned char __c)

and replaced by the following new intrinsics:

vec_bperm(__vector unsigned __int128 __a, __vector unsigned char __b)
vec_adde(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_addec(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_sube(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_subec(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_sum(__vector unsigned int __a, __vector unsigned int __b)
vec_sum(__vector unsigned long long __a, __vector unsigned long long __b)
vec_msum(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned __int128 __c, int __d)

and new overloads of existing intrinsics (a carry-propagation sketch follows the list):

vec_addc(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_subc(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_gfmsum(__vector unsigned long long __a, __vector unsigned long long __b)
vec_gfmsum_accum(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned __int128 __c)
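As an illustration of the retyped carry chain, a sketch of 256-bit addition built from (hi, lo) pairs of the new vector type (add256 is a made-up name):

#include <vecintrin.h>

void add256(__vector unsigned __int128 a_hi, __vector unsigned __int128 a_lo,
            __vector unsigned __int128 b_hi, __vector unsigned __int128 b_lo,
            __vector unsigned __int128 *r_hi, __vector unsigned __int128 *r_lo) {
    __vector unsigned __int128 zero = vec_splats((unsigned __int128)0);
    /* vec_addc yields the carry out of the low-limb addition ... */
    __vector unsigned __int128 carry = vec_addc(a_lo, b_lo);
    /* ... and vec_adde adds with an explicit carry-in (zero for the
       low limb, the captured carry for the high limb). */
    *r_lo = vec_adde(a_lo, b_lo, zero);
    *r_hi = vec_adde(a_hi, b_hi, carry);
}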

Other new ISA capabilities

In addition to the above, the new ISA provides an extended set of new multiplication operations on 64-bit and 128-bit integers, including 64->128 and 128->256 widening multiplies. These have been added as new overloads of the following intrinsics, only available with vector-enhancements-3 (see the sketch after the list):

vec_mladd(__vector signed long long __a, __vector signed long long __b, __vector signed long long __c)
vec_mladd(__vector unsigned long long __a, __vector signed long long __b, __vector signed long long __c)
vec_mladd(__vector signed long long __a, __vector unsigned long long __b, __vector unsigned long long __c)
vec_mladd(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned long long __c)
vec_mladd(__vector signed __int128 __a, __vector signed __int128 __b, __vector signed __int128 __c)
vec_mladd(__vector unsigned __int128 __a, __vector signed __int128 __b, __vector signed __int128 __c)
vec_mladd(__vector signed __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_mladd(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_mhadd(__vector signed long long __a, __vector signed long long __b, __vector signed long long __c)
vec_mhadd(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned long long __c)
vec_mhadd(__vector signed __int128 __a, __vector signed __int128 __b, __vector signed __int128 __c)
vec_mhadd(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c)
vec_meadd(__vector signed long long __a, __vector signed long long __b, __vector signed __int128 __c)
vec_meadd(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned __int128 __c)
vec_moadd(__vector signed long long __a, __vector signed long long __b, __vector signed __int128 __c)
vec_moadd(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned __int128 __c)
vec_mulh(__vector signed long long __a, __vector signed long long __b)
vec_mulh(__vector unsigned long long __a, __vector unsigned long long __b)
vec_mulh(__vector signed __int128 __a, __vector signed __int128 __b)
vec_mulh(__vector unsigned __int128 __a, __vector unsigned __int128 __b)
vec_mule(__vector signed long long __a, __vector signed long long __b)
vec_mule(__vector unsigned long long __a, __vector unsigned long long __b)
vec_mulo(__vector signed long long __a, __vector signed long long __b)
vec_mulo(__vector unsigned long long __a, __vector unsigned long long __b)
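A sketch of the widening multiplies, assuming (by analogy with the existing 32->64 overloads) that the new vec_mule/vec_mulo overloads return __int128-element vectors; widen_mul is a made-up name:

#include <vecintrin.h>

void widen_mul(__vector unsigned long long a, __vector unsigned long long b,
               __vector unsigned __int128 *even, __vector unsigned __int128 *odd) {
    /* Full 128-bit products of the even- and odd-indexed 64-bit elements. */
    *even = vec_mule(a, b);
    *odd  = vec_mulo(a, b);
}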

Finally, there are a few completely new intrinsics to support new operations (with vector-enhancements-3); a usage sketch follows the list:

vec_gen_element_masks_8(__vector unsigned short __mask)
vec_gen_element_masks_16(__vector unsigned char __mask)
vec_gen_element_masks_32(__vector unsigned char __mask)
vec_gen_element_masks_64(__vector unsigned char __mask)
vec_gen_element_masks_128(__vector unsigned char __mask)
vec_blend(__vector signed char __a, __vector signed char __b, __vector signed char __c)
vec_blend(__vector __bool char __a, __vector __bool char __b, __vector signed char __c)
vec_blend(__vector unsigned char __a, __vector unsigned char __b, __vector signed char __c)
vec_blend(__vector signed short __a, __vector signed short __b, __vector signed short __c)
vec_blend(__vector __bool short __a, __vector __bool short __b, __vector signed short __c)
vec_blend(__vector unsigned short __a, __vector unsigned short __b, __vector signed short __c)
vec_blend(__vector signed int __a, __vector signed int __b, __vector signed int __c)
vec_blend(__vector __bool int __a, __vector __bool int __b, __vector signed int __c)
vec_blend(__vector unsigned int __a, __vector unsigned int __b, __vector signed int __c)
vec_blend(__vector signed long long __a, __vector signed long long __b, __vector signed long long __c)
vec_blend(__vector __bool long long __a, __vector __bool long long __b, __vector signed long long __c)
vec_blend(__vector unsigned long long __a, __vector unsigned long long __b, __vector signed long long __c)
vec_blend(__vector signed __int128 __a, __vector signed __int128 __b, __vector signed __int128 __c)
vec_blend(__vector __bool __int128 __a, __vector __bool __int128 __b, __vector signed __int128 __c)
vec_blend(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector signed __int128 __c)
vec_blend(__vector float __a, __vector float __b, __vector signed int __c)
vec_blend(__vector double __a, __vector double __b, __vector signed long long __c)
vec_evaluate(__vector signed char __a, __vector signed char __b, __vector signed char __c, unsigned char __d)
vec_evaluate(__vector unsigned char __a, __vector unsigned char __b, __vector unsigned char __c, unsigned char __d)
vec_evaluate(__vector __bool char __a, __vector __bool char __b, __vector __bool char __c, unsigned char __d)
vec_evaluate(__vector signed short __a, __vector signed short __b, __vector signed short __c, unsigned char __d)
vec_evaluate(__vector unsigned short __a, __vector unsigned short __b, __vector unsigned short __c, unsigned char __d)
vec_evaluate(__vector __bool short __a, __vector __bool short __b, __vector __bool short __c, unsigned char __d)
vec_evaluate(__vector signed int __a, __vector signed int __b, __vector signed int __c, unsigned char __d)
vec_evaluate(__vector unsigned int __a, __vector unsigned int __b, __vector unsigned int __c, unsigned char __d)
vec_evaluate(__vector __bool int __a, __vector __bool int __b, __vector __bool int __c, unsigned char __d)
vec_evaluate(__vector signed long long __a, __vector signed long long __b, __vector signed long long __c, unsigned char __d)
vec_evaluate(__vector unsigned long long __a, __vector unsigned long long __b, __vector unsigned long long __c, unsigned char __d)
vec_evaluate(__vector __bool long long __a, __vector __bool long long __b, __vector __bool long long __c, unsigned char __d)
vec_evaluate(__vector signed __int128 __a, __vector signed __int128 __b, __vector signed __int128 __c, unsigned char __d)
vec_evaluate(__vector unsigned __int128 __a, __vector unsigned __int128 __b, __vector unsigned __int128 __c, unsigned char __d)
vec_evaluate(__vector __bool __int128 __a, __vector __bool __int128 __b, __vector __bool __int128 __c, unsigned char __d)
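A sketch of the two workhorses here. vec_blend is read as a per-element select on the sign bit of the control vector, and vec_evaluate as bitwise ternary logic driven by an 8-bit truth table; both readings, and the 0x96 table for three-way XOR, are assumptions to be checked against the ISA documentation (function names are made up):

#include <vecintrin.h>

__vector signed int blend_by_sign(__vector signed int a, __vector signed int b,
                                  __vector signed int ctl) {
    /* Assumed semantics: picks the element of b where the sign bit of
       the corresponding ctl element is set, else the element of a. */
    return vec_blend(a, b, ctl);
}

__vector unsigned char xor3(__vector unsigned char a, __vector unsigned char b,
                            __vector unsigned char c) {
    /* 0x96 is the vpternlog-style truth table for a ^ b ^ c; the exact
       immediate encoding on z should be verified against the ISA. */
    return vec_evaluate(a, b, c, 0x96);
}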

FYI @folkertdev @taiki-e @cuviper @fneddy
