
[AArch64] FEAT_SPEv1p2 is optional in v8.7-A and v9.2-A #123336


Merged
merged 5 commits into from
Jan 21, 2025


ostannard
Collaborator

The FEAT_SPEv1p2 feature (known to LLVM as FeatureSPE_EEF and +spe-eef) was marked as a required feature of Armv8.7-A (and later). This is incorrect: the feature is optional, and some CPUs do not implement it. This patch moves it to the default features list, so that it is still enabled by -march=armv8.7-a, but can be configured individually for each processor.

For Cortex-A520 and Cortex-A520AE, I've checked that these do not have any of the FEAT_SPE* features, so I've updated the tests accordingly. All other Arm-designed v8.7A+ and v9.2A+ CPUs should continue to have it enabled. For ampere1b, apple-m4 and fujitsu-monaka, I haven't found any reference for whether these CPUs should have this feature, so I've added it to their definitions to avoid this being a functional change for them.

@ptomsich, @kinoshita-fj, @jroelofs: can you confirm whether this is correct for your CPUs, or if I should drop the features from those CPUs?
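
The difference between a required (implied) feature and a default extension can be sketched with a small model. This is an illustrative Python sketch, not LLVM code: the `Arch` class and the `march_features` / `mcpu_features` helpers are hypothetical names, and the feature sets are abbreviated (everything inherited from v8.6-A is omitted).

```python
# Illustrative model only. Names mirror the TableGen records touched by this
# patch, but this is NOT how LLVM resolves features internally.
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    name: str
    implied: frozenset   # features forced on whenever the arch feature is on
    defaults: frozenset  # extra extensions enabled by -march=<arch>


# Before the patch: spe-eef is implied by v8.7a, so no CPU can opt out.
V8_7A_BEFORE = Arch(
    "v8.7a",
    implied=frozenset({"xs", "wfxt", "hcx", "spe-eef"}),
    defaults=frozenset({"wfxt"}),
)

# After the patch: spe-eef is only a default extension.
V8_7A_AFTER = Arch(
    "v8.7a",
    implied=frozenset({"xs", "wfxt", "hcx"}),
    defaults=frozenset({"wfxt", "spe-eef"}),
)


def march_features(arch):
    """Features enabled by -march=<arch>: implied plus default extensions."""
    return arch.implied | arch.defaults


def mcpu_features(arch, cpu_extras):
    """Features enabled by -mcpu=<cpu>: implied set plus the CPU's own list.

    Assumption of this sketch: default extensions apply to -march only.
    """
    return arch.implied | frozenset(cpu_extras)


if __name__ == "__main__":
    # -march=armv8.7-a keeps spe-eef after the patch.
    print(sorted(march_features(V8_7A_AFTER)))
    # A CPU that does not list spe-eef no longer gets it implicitly.
    print(sorted(mcpu_features(V8_7A_AFTER, {"sve2"})))
```

Under this model, `-march` still picks up `spe-eef` through the defaults, while a CPU definition only gets it when it lists the feature explicitly, which is what allows Cortex-A520 to drop it.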

@llvmbot added the clang, backend:AArch64, clang:driver, and mc labels on Jan 17, 2025
@llvmbot
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-mc

@llvm/pr-subscribers-clang-driver

Author: Oliver Stannard (ostannard)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/123336.diff

6 Files Affected:

  • (modified) clang/test/CodeGen/AArch64/targetattr.c (+1-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520.c (-1)
  • (modified) clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520ae.c (-1)
  • (modified) llvm/lib/Target/AArch64/AArch64Features.td (+9-7)
  • (modified) llvm/lib/Target/AArch64/AArch64Processors.td (+4-3)
  • (modified) llvm/test/MC/AArch64/spe.s (+1-1)
diff --git a/clang/test/CodeGen/AArch64/targetattr.c b/clang/test/CodeGen/AArch64/targetattr.c
index f8d5f9912c0d71..cfe115bf97ed33 100644
--- a/clang/test/CodeGen/AArch64/targetattr.c
+++ b/clang/test/CodeGen/AArch64/targetattr.c
@@ -218,7 +218,7 @@ void applem4() {}
 // CHECK: attributes #[[ATTR15]] = { noinline nounwind optnone "branch-target-enforcement" "guarded-control-stack" "no-trapping-math"="true" "sign-return-address"="non-leaf" "sign-return-address-key"="a_key" "stack-protector-buffer-size"="8" "target-cpu"="neoverse-n1" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+spe,+ssbs,+sve,+sve2,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8a" "tune-cpu"="cortex-a710" }
 // CHECK: attributes #[[ATTR16]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 // CHECK: attributes #[[ATTR17]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="-v9.3a" }
-// CHECK: attributes #[[ATTR18]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m4" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fpac,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+sme,+sme-f64f64,+sme-i16i64,+sme2,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8.7a,+v8a,+wfxt" }
+// CHECK: attributes #[[ATTR18]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="apple-m4" "target-features"="+aes,+bf16,+bti,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fpac,+fullfp16,+i8mm,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+sme,+sme-f64f64,+sme-i16i64,+sme2,+spe-eef,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8.6a,+v8.7a,+v8a,+wfxt" }
 //.
 // CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
 // CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520.c b/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520.c
index b906074ce76590..6ddd52a4a7089c 100644
--- a/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520.c
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520.c
@@ -46,7 +46,6 @@
 // CHECK-NEXT:     FEAT_SB                                                Enable Armv8.5-A Speculation Barrier
 // CHECK-NEXT:     FEAT_SEL2                                              Enable Armv8.4-A Secure Exception Level 2 extension
 // CHECK-NEXT:     FEAT_SPECRES                                           Enable Armv8.5-A execution and data prediction invalidation instructions
-// CHECK-NEXT:     FEAT_SPEv1p2                                           Enable extra register in the Statistical Profiling Extension
 // CHECK-NEXT:     FEAT_SSBS, FEAT_SSBS2                                  Enable Speculative Store Bypass Safe bit
 // CHECK-NEXT:     FEAT_SVE                                               Enable Scalable Vector Extension (SVE) instructions
 // CHECK-NEXT:     FEAT_SVE2                                              Enable Scalable Vector Extension 2 (SVE2) instructions
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520ae.c b/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520ae.c
index 2e147732d5c688..35399a3c85c626 100644
--- a/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520ae.c
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-cortex-a520ae.c
@@ -46,7 +46,6 @@
 // CHECK-NEXT:     FEAT_SB                                                Enable Armv8.5-A Speculation Barrier
 // CHECK-NEXT:     FEAT_SEL2                                              Enable Armv8.4-A Secure Exception Level 2 extension
 // CHECK-NEXT:     FEAT_SPECRES                                           Enable Armv8.5-A execution and data prediction invalidation instructions
-// CHECK-NEXT:     FEAT_SPEv1p2                                           Enable extra register in the Statistical Profiling Extension
 // CHECK-NEXT:     FEAT_SSBS, FEAT_SSBS2                                  Enable Speculative Store Bypass Safe bit
 // CHECK-NEXT:     FEAT_SVE                                               Enable Scalable Vector Extension (SVE) instructions
 // CHECK-NEXT:     FEAT_SVE2                                              Enable Scalable Vector Extension 2 (SVE2) instructions
diff --git a/llvm/lib/Target/AArch64/AArch64Features.td b/llvm/lib/Target/AArch64/AArch64Features.td
index ffc2d27a57c93b..0a91edb4c1661b 100644
--- a/llvm/lib/Target/AArch64/AArch64Features.td
+++ b/llvm/lib/Target/AArch64/AArch64Features.td
@@ -859,8 +859,8 @@ def HasV8_6aOps : Architecture64<8, 6, "a", "v8.6a",
     FeatureEnhancedCounterVirtualization, FeatureMatMulInt8],
   !listconcat(HasV8_5aOps.DefaultExts, [FeatureBF16, FeatureMatMulInt8])>;
 def HasV8_7aOps : Architecture64<8, 7, "a", "v8.7a",
-  [HasV8_6aOps, FeatureXS, FeatureWFxT, FeatureHCX, FeatureSPE_EEF],
-  !listconcat(HasV8_6aOps.DefaultExts, [FeatureWFxT])>;
+  [HasV8_6aOps, FeatureXS, FeatureWFxT, FeatureHCX],
+  !listconcat(HasV8_6aOps.DefaultExts, [FeatureWFxT, FeatureSPE_EEF])>;
 def HasV8_8aOps : Architecture64<8, 8, "a", "v8.8a",
   [HasV8_7aOps, FeatureHBC, FeatureMOPS, FeatureNMI],
   !listconcat(HasV8_7aOps.DefaultExts, [FeatureMOPS, FeatureHBC])>;
@@ -875,17 +875,19 @@ def HasV9_0aOps : Architecture64<9, 0, "a", "v9a",
     FeatureSVE2])>;
 def HasV9_1aOps : Architecture64<9, 1, "a", "v9.1a",
   [HasV8_6aOps, HasV9_0aOps],
-  !listconcat(HasV9_0aOps.DefaultExts, [FeatureBF16, FeatureMatMulInt8, FeatureRME])>;
+  !listconcat(HasV9_0aOps.DefaultExts, HasV8_6aOps.DefaultExts,
+              [FeatureRME])>;
 def HasV9_2aOps : Architecture64<9, 2, "a", "v9.2a",
   [HasV8_7aOps, HasV9_1aOps],
-  !listconcat(HasV9_1aOps.DefaultExts, [FeatureMEC, FeatureWFxT])>;
+  !listconcat(HasV9_1aOps.DefaultExts, HasV8_7aOps.DefaultExts,
+              [FeatureMEC])>;
 def HasV9_3aOps : Architecture64<9, 3, "a", "v9.3a",
   [HasV8_8aOps, HasV9_2aOps],
-  !listconcat(HasV9_2aOps.DefaultExts, [FeatureMOPS, FeatureHBC])>;
+  !listconcat(HasV9_2aOps.DefaultExts, HasV8_8aOps.DefaultExts, [])>;
 def HasV9_4aOps : Architecture64<9, 4, "a", "v9.4a",
   [HasV8_9aOps, HasV9_3aOps],
-  !listconcat(HasV9_3aOps.DefaultExts, [FeatureSPECRES2, FeatureCSSC,
-    FeatureRASv2, FeatureSVE2p1])>;
+  !listconcat(HasV9_3aOps.DefaultExts, HasV8_9aOps.DefaultExts,
+              [FeatureSVE2p1])>;
 def HasV9_5aOps : Architecture64<9, 5, "a", "v9.5a",
   [HasV9_4aOps, FeatureCPA],
   !listconcat(HasV9_4aOps.DefaultExts, [FeatureCPA,  FeatureLUT, FeatureFAMINMAX])>;
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 364ab0d82bf888..68cb3b5d9da4c5 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -856,7 +856,7 @@ def ProcessorFeatures {
                                    FeatureSSBS, FeatureLS64, FeatureCLRBHB,
                                    FeatureSPECRES2, FeatureSVEAES, FeatureSVE2SM4,
                                    FeatureSVE2SHA3, FeatureSVE2, FeatureSVEBitPerm, FeatureETE,
-                                   FeatureMEC, FeatureFP8DOT2];
+                                   FeatureMEC, FeatureFP8DOT2, FeatureSPE_EEF];
   list<SubtargetFeature> Carmel   = [HasV8_2aOps, FeatureNEON, FeatureSHA2, FeatureAES,
                                      FeatureFullFP16, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM,
                                      FeatureFPARMv8];
@@ -923,7 +923,8 @@ def ProcessorFeatures {
                                     FeatureComplxNum, FeatureCRC, FeatureJS,
                                     FeatureLSE, FeaturePAuth, FeatureFPAC,
                                     FeatureRAS, FeatureRCPC, FeatureRDM,
-                                    FeatureDotProd, FeatureMatMulInt8];
+                                    FeatureDotProd, FeatureMatMulInt8,
+                                    FeatureSPE_EEF];
   list<SubtargetFeature> ExynosM3 = [HasV8_0aOps, FeatureCRC, FeatureSHA2, FeatureAES,
                                      FeaturePerfMon, FeatureNEON, FeatureFPARMv8];
   list<SubtargetFeature> ExynosM4 = [HasV8_2aOps, FeatureSHA2, FeatureAES, FeatureDotProd,
@@ -1046,7 +1047,7 @@ def ProcessorFeatures {
                                      FeatureCRC, FeatureDotProd, FeatureFPARMv8, FeatureMatMulInt8,
                                      FeatureJS, FeatureLSE, FeaturePAuth, FeatureRAS, FeatureRCPC,
                                      FeatureCCIDX,
-                                     FeatureRDM];
+                                     FeatureRDM, FeatureSPE_EEF];
 
   list<SubtargetFeature> Oryon = [HasV8_6aOps, FeatureNEON, FeaturePerfMon,
                                      FeatureRandGen,
diff --git a/llvm/test/MC/AArch64/spe.s b/llvm/test/MC/AArch64/spe.s
index a4b2a555621fef..570ce6704502b6 100644
--- a/llvm/test/MC/AArch64/spe.s
+++ b/llvm/test/MC/AArch64/spe.s
@@ -1,5 +1,5 @@
 // RUN: llvm-mc -triple aarch64 -mattr +spe-eef -show-encoding %s 2>%t | FileCheck %s
-// RUN: llvm-mc -triple aarch64 -mattr +v8.7a -show-encoding %s 2>%t | FileCheck %s
+// RUN: not llvm-mc -triple aarch64 -mattr +v8.7a %s 2>&1 | FileCheck --check-prefix=CHECK-NO-SPE-EEF-ERR %s
 // RUN: not llvm-mc -triple aarch64 < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-SPE-EEF-ERR %s
 
 msr PMSNEVFR_EL1, x0
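
The updated RUN lines in spe.s can be read as a feature-gating check on the system register: `msr PMSNEVFR_EL1, x0` assembles with `+spe-eef`, but no longer with `+v8.7a` alone. A minimal Python sketch of that behaviour follows. The `SYSREG_REQUIRES` and `IMPLIES` tables and both function names are hypothetical and abbreviated; they are not the assembler's real data structures.

```python
# Illustrative model of the spe.s expectations after this patch.
# Hypothetical table: system register name -> feature that gates it.
SYSREG_REQUIRES = {"PMSNEVFR_EL1": "spe-eef"}

# Simplified "implies" relation after the patch: v8.7a no longer implies
# spe-eef (features inherited from v8.6a are omitted for brevity).
IMPLIES = {"v8.7a": {"xs", "wfxt", "hcx"}}


def expand(mattr):
    """Expand a list of -mattr features with everything they imply."""
    enabled = set()
    for feat in mattr:
        enabled.add(feat)
        enabled |= IMPLIES.get(feat, set())
    return enabled


def accepts_msr(sysreg, mattr):
    """Return True if the register's gating feature is enabled."""
    return SYSREG_REQUIRES[sysreg] in expand(mattr)


if __name__ == "__main__":
    for mattr in (["spe-eef"], ["v8.7a"], []):
        print(mattr, accepts_msr("PMSNEVFR_EL1", mattr))
```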

Contributor

@tmatheson-arm tmatheson-arm left a comment

LGTM, assuming no objections to CPU changes.

@davemgreen
Collaborator

Is this a system-reg only extension?
It was enabled in #115296, which has an explanation why it was enabled.

I'm not sure how well we implement the idea that sys-reg-only extensions are always enabled, or whether the best way to handle that is to make them required features. But this seemed intentional.

@ostannard
Collaborator Author

This is a system-register-only feature, but not one which we have enabled unconditionally. If we're going to turn it on only for architectures/CPUs which (can) have it, then I think it's better to be precise, which is what this patch does. Since it's only a system register, we could go the other way and turn it on unconditionally, but as you say, the current behaviour seems to be intentional.

@davemgreen
Collaborator

I agree the current behaviour isn't very consistent. It would be good to come up with a single rule and stick to it, whatever it is. FeatureAMVS / FEAT_AMUv1p1 came up recently too, which is enabled for some CPUs that do not have it (like Neoverse V3).

It looks like GCC takes the "precise" option for -mcpu and enables the feature for 8.7+, which sounds OK to me if this patch now mirrors that.

@ptomsich
Contributor

FEAT_SPE is not supported on Ampere1B, so should be removed from the Ampere1B feature-set.

@kinoshita-fj
Contributor

Because FUJITSU-MONAKA doesn't support FEAT_SPEv1p2, please remove it.

@ostannard ostannard merged commit 84fa175 into llvm:main Jan 21, 2025
8 checks passed
@jroelofs
Contributor

Sorry I didn't have a chance to check on this last week. apple-m4 does not have FEAT_SPEv1p2. I'll put up a patch.

@jroelofs
Contributor

#123827
