Commit e31bfc0

[AMDGPU] Strengthen preload intrinsics to noundef and nonnull (#92801)
The various preloaded registers (workitem IDs, workgroup IDs, and various implicit pointers) always have a finite, invariant, well-defined value throughout a well-defined program.

If the compiler infers or the user declares that some implicit input will not be used (e.g. via amdgpu-no-workitem-id-y) and that input is used anyway, the behavior of the entire program is undefined, since the misdeclaration can cause arbitrary other preloaded-register intrinsics to access the wrong register. This case is not expected to arise in practice, but it could occur if the no-implicit-argument attributes were not cleared correctly in the presence of external functions, indirect calls, or other means of executing un-analyzable code. Failure to detect that case would be a bug in the attributor.

This commit updates the documentation to reflect this long-standing reality.

Then, on the basis that all implicit arguments are defined in all correct programs, the intrinsics that return those values are annotated with `noundef`. Some implicit pointer arguments also gain `nonnull`, but the kernel argument segment pointer and the implicit argument pointers don't necessarily have this property.

This will prevent spurious calls to `freeze` in front-end optimizations that destroy user-provided ranges on built-in IDs.

(While I'm here, this commit adds a test for `noundef` on kernel arguments, which is currently unimplemented.)
1 parent a4bc44a commit e31bfc0
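
A rough illustration of the `freeze` point in the commit message, written as a sketch rather than taken from the commit's tests. The kernel name @example and the range [0, 256) are hypothetical, and the `noundef` on the declaration assumes an LLVM build that includes this change.

```llvm
; Sketch only: @example and the !range bounds are made up for illustration.
declare noundef i32 @llvm.amdgcn.workitem.id.x()

define amdgpu_kernel void @example(ptr addrspace(1) %out) {
  ; A frontend can attach a known range to the workitem ID.
  %id = call i32 @llvm.amdgcn.workitem.id.x(), !range !0

  ; If the return value were not known to be noundef, a transform that
  ; needs %id to be one fixed value might have to introduce
  ;   %id.fr = freeze i32 %id
  ; and later uses of %id.fr would no longer see the !range on %id.
  ; With a noundef return, such a freeze is redundant.
  %idx = zext i32 %id to i64
  %gep = getelementptr inbounds i32, ptr addrspace(1) %out, i64 %idx
  store i32 %id, ptr addrspace(1) %gep
  ret void
}

!0 = !{i32 0, i32 256}
```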

6 files changed

+119
-87
lines changed

llvm/docs/AMDGPUUsage.rst

Lines changed: 5 additions & 2 deletions
@@ -1395,8 +1395,11 @@ The AMDGPU backend supports the following LLVM IR attributes.

 "amdgpu-no-workitem-id-x" Indicates the function does not depend on the value of the
                           llvm.amdgcn.workitem.id.x intrinsic. If a function is marked with this
-                          attribute, or reached through a call site marked with this attribute,
-                          the value returned by the intrinsic is undefined. The backend can
+                          attribute, or reached through a call site marked with this attribute, and
+                          that intrinsic is called, the behavior of the program is undefined. (Whole-program
+                          undefined behavior is used here because, for example, the absence of a required workitem
+                          ID in the preloaded register set can mean that all other preloaded registers
+                          are earlier than the compilation assumed they would be.) The backend can
                           generally infer this during code generation, so typically there is no
                           benefit to frontends marking functions with this.
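
The updated wording can be illustrated with a small sketch. The function names @bad and @fine are hypothetical and do not appear in the commit; the snippet only shows which pattern the documentation now classifies as whole-program undefined behavior.

```llvm
; Sketch only: @bad and @fine are hypothetical functions.
declare noundef i32 @llvm.amdgcn.workitem.id.x()
declare noundef i32 @llvm.amdgcn.workitem.id.y()

; Marked as not depending on the Y workitem ID, but calls the intrinsic
; anyway. Under the updated documentation this makes the behavior of the
; whole program undefined, not just the value of %y.
define i32 @bad() #0 {
  %y = call i32 @llvm.amdgcn.workitem.id.y()
  ret i32 %y
}

; Carries the same attribute but only reads the X workitem ID, which the
; attribute says nothing about, so the attribute is not violated here.
define i32 @fine() #0 {
  %x = call i32 @llvm.amdgcn.workitem.id.x()
  ret i32 %x
}

attributes #0 = { "amdgpu-no-workitem-id-y" }
```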

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

Lines changed: 28 additions & 17 deletions
@@ -12,11 +12,18 @@

 def global_ptr_ty : LLVMQualPointerType<1>;

+// The amdgpu-no-* attributes (ex amdgpu-no-workitem-id-z) typically inferred
+// by the backend cause whole-program undefined behavior when violated, such as
+// by causing all other preload register intrinsics to return arbitrarily incorrect
+// values. In non-entry-point functions, attempting to call a function that needs
+// some preloaded register from a function that is known to not need it is a violation
+// of the calling convention and also program-level UB. Outside of such IR-level UB,
+// these preloaded registers are always set to a well-defined value and are thus `noundef`.
 class AMDGPUReadPreloadRegisterIntrinsic
-  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 class AMDGPUReadPreloadRegisterIntrinsicNamed<string name>
-  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;
+  : DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>, ClangBuiltin<name>;

 // Used to tag image and resource intrinsics with information used to generate
 // mem operands.
@@ -56,7 +63,7 @@ def int_r600_group_barrier : ClangBuiltin<"__builtin_r600_group_barrier">,
 def int_r600_implicitarg_ptr :
   ClangBuiltin<"__builtin_r600_implicitarg_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<7>], [],
-                        [IntrNoMem, IntrSpeculatable]>;
+                        [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_r600_rat_store_typed :
   // 1st parameter: Data
@@ -144,39 +151,43 @@ defm int_amdgcn_workgroup_id : AMDGPUReadPreloadRegisterIntrinsic_xyz_named

 def int_amdgcn_dispatch_ptr :
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_queue_ptr :
   ClangBuiltin<"__builtin_amdgcn_queue_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, NonNull<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_kernarg_segment_ptr :
   ClangBuiltin<"__builtin_amdgcn_kernarg_segment_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_implicitarg_ptr :
   ClangBuiltin<"__builtin_amdgcn_implicitarg_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

+// Returns the amount of LDS statically allocated for this program.
+// This is no longer guaranteed to be a compile-time constant due to linking
+// support.
 def int_amdgcn_groupstaticsize :
   ClangBuiltin<"__builtin_amdgcn_groupstaticsize">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_dispatch_id :
   ClangBuiltin<"__builtin_amdgcn_dispatch_id">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 // For internal use. Coordinates LDS lowering between IR transform and backend.
 def int_amdgcn_lds_kernel_id :
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 def int_amdgcn_implicit_buffer_ptr :
   ClangBuiltin<"__builtin_amdgcn_implicit_buffer_ptr">,
   DefaultAttrsIntrinsic<[LLVMQualPointerType<4>], [],
-  [Align<RetIndex, 4>, IntrNoMem, IntrSpeculatable]>;
+  [Align<RetIndex, 4>, NoUndef<RetIndex>,
+   IntrNoMem, IntrSpeculatable]>;

 // Set EXEC to the 64-bit value given.
 // This is always moved to the beginning of the basic block.
@@ -199,7 +210,7 @@ def int_amdgcn_init_exec_from_input : Intrinsic<[],

 def int_amdgcn_wavefrontsize :
   ClangBuiltin<"__builtin_amdgcn_wavefrontsize">,
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 // Represent a relocation constant.
 def int_amdgcn_reloc_constant : DefaultAttrsIntrinsic<
@@ -1923,8 +1934,8 @@ def int_amdgcn_s_setreg :
 // s_getpc_b64 instruction returns a zero-extended value.
 def int_amdgcn_s_getpc :
   ClangBuiltin<"__builtin_amdgcn_s_getpc">,
-  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [IntrNoMem, IntrSpeculatable,
-                                            IntrWillReturn]>;
+  DefaultAttrsIntrinsic<[llvm_i64_ty], [], [NoUndef<RetIndex>, IntrNoMem,
+                                            IntrSpeculatable, IntrWillReturn]>;

 // __builtin_amdgcn_interp_mov <param>, <attr_chan>, <attr>, <m0>
 // param values: 0 = P10, 1 = P20, 2 = P0
@@ -2044,7 +2055,7 @@ def int_amdgcn_ps_live : DefaultAttrsIntrinsic <
 // Query currently live lanes.
 // Returns true if lane is live (and not a helper lane).
 def int_amdgcn_live_mask : DefaultAttrsIntrinsic <[llvm_i1_ty],
-  [], [IntrReadMem, IntrInaccessibleMemOnly]
+  [], [NoUndef<RetIndex>, IntrReadMem, IntrInaccessibleMemOnly]
 >;

 def int_amdgcn_mbcnt_lo :
@@ -2517,7 +2528,7 @@ def int_amdgcn_mov_dpp8 :
 def int_amdgcn_s_get_waveid_in_workgroup :
   ClangBuiltin<"__builtin_amdgcn_s_get_waveid_in_workgroup">,
   Intrinsic<[llvm_i32_ty], [],
-    [IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;
+    [NoUndef<RetIndex>, IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

 class AMDGPUAtomicRtn<LLVMType vt, LLVMType pt = llvm_anyptr_ty> : Intrinsic <
   [vt],
@@ -2751,7 +2762,7 @@ def int_amdgcn_global_load_tr_b128 : AMDGPULoadIntrinsic<global_ptr_ty>;

 // i32 @llvm.amdgcn.wave.id()
 def int_amdgcn_wave_id :
-  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [NoUndef<RetIndex>, IntrNoMem, IntrSpeculatable]>;

 //===----------------------------------------------------------------------===//
 // Deep learning intrinsics.

llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp

Lines changed: 0 additions & 1 deletion
@@ -120,7 +120,6 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
   CallInst *KernArgSegment =
       Builder.CreateIntrinsic(Intrinsic::amdgcn_kernarg_segment_ptr, {}, {},
                               nullptr, F.getName() + ".kernarg.segment");
-
   KernArgSegment->addRetAttr(Attribute::NonNull);
   KernArgSegment->addRetAttr(
       Attribute::getWithDereferenceableBytes(Ctx, TotalKernArgSize));

llvm/test/CodeGen/AMDGPU/lower-kernargs.ll

Lines changed: 24 additions & 5 deletions
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
 ; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -o - -passes=amdgpu-lower-kernel-arguments %s | FileCheck -check-prefixes=GCN,HSA %s
 ; RUN: opt -mtriple=amdgcn-- -S -o - -passes=amdgpu-lower-kernel-arguments %s | FileCheck -check-prefixes=GCN,MESA %s

@@ -1041,14 +1041,14 @@ define amdgpu_kernel void @kern_global_ptr_dereferencable_or_null(ptr addrspace(
 ; HSA-LABEL: @kern_global_ptr_dereferencable_or_null(
 ; HSA-NEXT: [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
 ; HSA-NEXT: [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT]], i64 0
-; HSA-NEXT: [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 16, !invariant.load [[META1]], !dereferenceable_or_null !3
+; HSA-NEXT: [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 16, !invariant.load [[META1]], !dereferenceable_or_null [[META3:![0-9]+]]
 ; HSA-NEXT: store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) undef, align 8
 ; HSA-NEXT: ret void
 ;
 ; MESA-LABEL: @kern_global_ptr_dereferencable_or_null(
 ; MESA-NEXT: [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
 ; MESA-NEXT: [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_GLOBAL_PTR_DEREFERENCABLE_OR_NULL_KERNARG_SEGMENT]], i64 36
-; MESA-NEXT: [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 4, !invariant.load [[META1]], !dereferenceable_or_null !3
+; MESA-NEXT: [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 4, !invariant.load [[META1]], !dereferenceable_or_null [[META3:![0-9]+]]
 ; MESA-NEXT: store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) undef, align 8
 ; MESA-NEXT: ret void
 ;
@@ -1116,6 +1116,25 @@ define amdgpu_kernel void @kern_noalias_global_ptr_x2(ptr addrspace(1) noalias %
   ret void
 }

+define amdgpu_kernel void @kern_noundef_global_ptr(ptr addrspace(1) noundef %ptr) #0 {
+; HSA-LABEL: @kern_noundef_global_ptr(
+; HSA-NEXT: [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
+; HSA-NEXT: [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT]], i64 0
+; HSA-NEXT: [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 16, !invariant.load [[META1]]
+; HSA-NEXT: store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) null, align 8
+; HSA-NEXT: ret void
+;
+; MESA-LABEL: @kern_noundef_global_ptr(
+; MESA-NEXT: [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT:%.*]] = call nonnull align 16 dereferenceable(264) ptr addrspace(4) @llvm.amdgcn.kernarg.segment.ptr()
+; MESA-NEXT: [[PTR_KERNARG_OFFSET:%.*]] = getelementptr inbounds i8, ptr addrspace(4) [[KERN_NOUNDEF_GLOBAL_PTR_KERNARG_SEGMENT]], i64 36
+; MESA-NEXT: [[PTR_LOAD:%.*]] = load ptr addrspace(1), ptr addrspace(4) [[PTR_KERNARG_OFFSET]], align 4, !invariant.load [[META1]]
+; MESA-NEXT: store volatile ptr addrspace(1) [[PTR_LOAD]], ptr addrspace(1) null, align 8
+; MESA-NEXT: ret void
+;
+  store volatile ptr addrspace(1) %ptr, ptr addrspace(1) null
+  ret void
+}
+
 define amdgpu_kernel void @struct_i8_i8_arg({i8, i8} %in) #0 {
 ; HSA-LABEL: @struct_i8_i8_arg(
 ; HSA-NEXT: entry:
@@ -1711,12 +1730,12 @@ attributes #2 = { nounwind "target-cpu"="tahiti" }
 ; HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ; HSA: [[META1]] = !{}
 ; HSA: [[META2]] = !{i64 42}
-; HSA: [[META3:![0-9]+]] = !{i64 128}
+; HSA: [[META3]] = !{i64 128}
 ; HSA: [[META4]] = !{i64 1024}
 ;.
 ; MESA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ; MESA: [[META1]] = !{}
 ; MESA: [[META2]] = !{i64 42}
-; MESA: [[META3:![0-9]+]] = !{i64 128}
+; MESA: [[META3]] = !{i64 128}
 ; MESA: [[META4]] = !{i64 1024}
 ;.

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ attributes #0 = { noinline }
 ; CHECK: declare void @llvm.donothing() #2

 ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
-; CHECK: declare i32 @llvm.amdgcn.lds.kernel.id() #3
+; CHECK: declare noundef i32 @llvm.amdgcn.lds.kernel.id() #3

 ; CHECK: attributes #0 = { noinline }
 ; CHECK: attributes #1 = { "amdgpu-lds-size"="4,4" }
