[AutoBump] Merge with 5a8fe9e9 (Jan 28) (22) #560

jorickert · 2025-05-21T11:56:07Z

No description provided.

llvm#116833) SVE2.2 introduces instructions with predicated forms that zero the inactive lanes. In some cases this saves a `movprfx` or a `mov` instruction when emitting code for the `_x` or `_z` variants of intrinsics. This patch adds support for emitting the zeroing forms of certain `FLOGB` instructions.

…llvm#123918) When the Guarded Control Stack (GCS) is enabled, returns cause the processor to validate that the address at the location pointed to by gcspr_el0 matches the one in the link register.

```
ret (lr=A) << pc

| GCS |
+=====+
|  A  |
|  B  | << gcspr_el0

Fault: tried to return to A when you should have returned to B.
```

Therefore when an expression wrapper function tries to return to the expression return address (usually `_start` if there is a libc), it would fault.

```
ret (lr=_start) << pc

| GCS        |
+============+
| user_func1 |
| user_func2 | << gcspr_el0

Fault: tried to return to _start when you should have returned to user_func2.
```

To fix this we must push that return address to the GCS in PrepareTrivialCall. This value is then consumed by the final return and the expression completes as expected.

If for some reason that fails, we will manually restore the value of gcspr_el0, because it turns out that PrepareTrivialCall does not restore registers if it fails at all. So for now I am handling gcspr_el0 specifically, but I have filed llvm#124269 to address the general problem. (the other things PrepareTrivialCall does are exceedingly likely to not fail, so we have never noticed this)

```
ret (lr=_start) << pc

| GCS        |
+============+
| user_func1 |
| user_func2 |
| _start     | << gcspr_el0

No fault, we return to _start as normal.
```

The gcspr_el0 register will be restored after expression evaluation so that the program can continue correctly.

However, due to restrictions in the Linux GCS ABI, we will not restore the enable bit of gcs_features_enabled. Re-enabling GCS via ptrace is not supported because it requires memory to be allocated by the kernel.

We could disable GCS if the expression enabled GCS, however this would use up that state transition that the program might later rely on. And generally it is cleaner to ignore the enable bit, rather than one state transition of it.

We will also not restore the GCS entry that was overwritten with the expression's return address. On the grounds that:

* This entry will never be used by the program. If the program branches, the entry will be overwritten. If the program returns, gcspr_el0 will point to the entry before the expression return address and that entry will instead be validated.
* Any expression that calls functions will overwrite even more entries, so the user needs to be aware of that anyway if they want to preserve the contents of the GCS for inspection.
* An expression could leave the program in a state where restoring the value makes the situation worse. Especially if we ever support this in bare metal debugging.

I will later document all this on https://lldb.llvm.org/use/aarch64-linux.html.

Tests have been added for:

* A function call that does not interact with GCS.
* A call that does, and disables it (we do not re-enable it).
* A call that does, and enables it (we do not disable it again).
* Failure to push an entry to the GCS stack.
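The return check described in this commit message can be pictured with a small toy model. This is only a sketch: `ToyGcs` and the two helper functions are hypothetical names invented for illustration, addresses are modelled as strings, and the real GCS lives in protected memory managed by the hardware and kernel rather than in a `std::vector`.

```cpp
#include <string>
#include <vector>

// Toy model only (hypothetical type): the back of the vector stands in for
// the entry that gcspr_el0 points to.
struct ToyGcs {
  std::vector<std::string> entries;

  // A call pushes its return address.
  void push(const std::string &return_address) {
    entries.push_back(return_address);
  }

  // A return compares the top entry with the link register. It returns true
  // and pops the entry when they match, and false when a fault would occur.
  bool ret(const std::string &lr) {
    if (entries.empty() || entries.back() != lr)
      return false;
    entries.pop_back();
    return true;
  }
};

// Returning to `_start` without preparing the GCS: the top entry is
// `user_func2`, so the return is rejected.
bool return_without_push() {
  ToyGcs gcs;
  gcs.push("user_func1");
  gcs.push("user_func2");
  return gcs.ret("_start");
}

// Pushing the expression return address first: the final return consumes it
// and the program's own entries are left as they were.
bool return_with_push() {
  ToyGcs gcs;
  gcs.push("user_func1");
  gcs.push("user_func2");
  gcs.push("_start");
  return gcs.ret("_start") && gcs.entries.back() == "user_func2";
}
```

In this model `return_without_push()` corresponds to the second diagram (fault) and `return_with_push()` to the third (no fault).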

Clang knows how to perform relational operations on OpenCL vectors, so we don't need to use the Clang builtins. The builtins we were using didn't support vector types, so we were previously scalarizing. This commit generates the same LLVM fcmp operations as before, just without the scalarization.

…llvm#122674) Extends rewriting of `loop` directives by supporting `bind` clause for standalone directives. This follows both the spec and the current state of clang as follows:

* No `bind` or `bind(thread)`: the `loop` is rewritten to `simd`.
* `bind(parallel)`: the `loop` is rewritten to `do`.
* `bind(teams)`: the `loop` is rewritten to `distribute`.

This is a follow-up PR for llvm#122632, only the latest commit in this PR is relevant to the PR.
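The mapping listed in this commit message can be written down as a small lookup. This is a sketch for illustration only: `BindKind` and `rewriteStandaloneLoop` are hypothetical names and do not reflect how Flang represents or rewrites these directives internally.

```cpp
#include <string>

// Hypothetical enum for illustration; not Flang's actual representation.
enum class BindKind { None, Thread, Parallel, Teams };

// Returns the name of the directive that a standalone `loop` is rewritten
// to, following the three cases listed in the commit message.
std::string rewriteStandaloneLoop(BindKind bind) {
  switch (bind) {
  case BindKind::None:
  case BindKind::Thread:
    return "simd";
  case BindKind::Parallel:
    return "do";
  case BindKind::Teams:
    return "distribute";
  }
  return "loop"; // Not reached for the enumerators above.
}
```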

getPtrStride returns 0 when the PtrScev is loop-invariant, and this is not an erroneous value: it returns std::nullopt to communicate that it was not able to find a valid pointer stride. In analyzeLoop, we call getPtrStride with a value_or(0) conflating the zero return value with std::nullopt. Fix this, handling loop-invariant loads correctly.
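The conflation described here is a general hazard of calling `value_or(0)` on an optional whose valid values include zero. The sketch below reproduces it with `std::optional`; `toyPtrStride`, `StrideKind` and the two classifiers are hypothetical stand-ins written for this example, not the LLVM functions named in the commit message.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a stride query: nullopt means "could not
// determine a stride", 0 means "the pointer is loop-invariant".
std::optional<int64_t> toyPtrStride(bool analyzable, bool loopInvariant,
                                    int64_t stride) {
  if (!analyzable)
    return std::nullopt;
  return loopInvariant ? 0 : stride;
}

enum class StrideKind { Unknown, Invariant, Strided };

// Problematic pattern: value_or(0) maps nullopt and a genuine zero stride to
// the same value, so the loop-invariant case is treated as unknown.
StrideKind classifyConflated(std::optional<int64_t> stride) {
  int64_t value = stride.value_or(0);
  return value == 0 ? StrideKind::Unknown : StrideKind::Strided;
}

// Keeping the optional intact lets the two cases be told apart.
StrideKind classify(std::optional<int64_t> stride) {
  if (!stride)
    return StrideKind::Unknown;
  return *stride == 0 ? StrideKind::Invariant : StrideKind::Strided;
}
```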

…nd above (llvm#117149) Since `__STDC_NO_THREADS__` is a reserved identifier,

- If `MSVC version < 17.9`
- C version < C11(201112L)
- When `<threads.h>` is unavailable: `!__has_include(<threads.h>)`, if `__has_include` is defined.

Closes llvm#115529

- The FP8 scalar type (`__mfp8`) was described as a vector type
- The FP8 vector types were described/assumed to have integer element type (the element type ought to be `__mfp8`)
- Add support for `m` type specifier (denoting `__mfp8`) in `DecodeTypeFromStr` and create builtin function prototypes using that specifier, instead of `int8_t`

llvm#124404) We record whether an expression is immediate escalating in the FunctionScope. However, that only happens when parsing or transforming an expression. This might not happen when transforming a non-dependent expression. This patch fixes that by considering a function immediate when instantiated from an immediate function. Fixes llvm#123405

…ext (llvm#124279) While sifting through this part of the code I noticed that when we parse C++ methods, `DWARFASTParserClang` creates two sets of `ParmVarDecls`, one in `ParseChildParameters` and one in `AddMethodToCXXRecordType`. The former is unused when we're dealing with methods. Moreover, the `ParmVarDecls` we created in `ParseChildParameters` were created with an incorrect `clang::DeclContext` (namely the DeclContext of the function, and not the function itself). In Clang, there's `ParmVarDecl::setOwningFunction` to adjust the DeclContext of a parameter if the parameter was created before the FunctionDecl. But we never used it. This patch removes the `ParmVarDecl` creation from `ParseChildParameters` and instead creates a `TypeSystemClang::CreateParameterDeclarations` that ensures we set the DeclContext correctly. Note there is one difference in how `ParmVarDecl`s would be created now: we won't set a ClangASTMetadata entry for any of the parameters. I don't think this was ever actually useful for parameter DIEs anyway. This wasn't causing any concrete issues (that I know of), but was quite surprising. And this way of setting the parameters seems easier to reason about (in my opinion).

…ter instead of copying it (llvm#124305) We used to copy the `SourceLocation` instead of importing it, which isn't correct since the `SourceManager`s of the source and target ASTContext might differ. Also adds a test that confirms that we import the explicit object parameter location for `ParmVarDecl`s. This is how Clang determines whether a parameter `isExplicitObjectParameter`. The LLDB expression evaluator relies on this for calling "explicit object member functions".

…pr (llvm#124533) We used to always transform the pattern declaration for SizeOfPackExpr to ensure the constraint expression's profile produced the desired result. However, this approach failed to handle pack expansions when the pack referred to function parameters. In such cases, the function parameters were formerly expanded to 1 to avoid building Subst* nodes (see e6974da). That workaround caused us to transform a pack without a proper ArgumentPackSubstitutionIndex, leading to crashes when transforming the pattern. It turns out that profiling the pattern for partially substituted SizeOfPackExprs is unnecessary because their transformed forms are also profiled within the partial arguments. Fixes llvm#124161

… for `tensor.expand_shape` op. (llvm#113501) The op carries the output-shape directly. This can be used directly. Also adds a method to get the shape as a `SmallVector<OpFoldResult>`. Signed-off-by: MaheshRavishankar <[email protected]>

A bulk commit of true16 support for v_cmpx_xx_f16 instructions including:

* v_cmpx_f_f16
* v_cmpx_le_f16
* v_cmpx_gt_f16
* v_cmpx_lg_f16
* v_cmpx_ge_f16
* v_cmpx_o_f16
* v_cmpx_u_f16
* v_cmpx_nge_f16
* v_cmpx_nlg_f16
* v_cmpx_ngt_f16
* v_cmpx_nle_f16
* v_cmpx_neq_f16
* v_cmpx_nlt_f16
* v_cmpx_t_f16

v_cmpx_eq_f16 is not in this patch and will be added in the following patch.

Patch created using the following command line:

```bash
codespell polly --skip="*.pdf,polly/lib/External/*" --write-changes \
  --ignore-words-list=couter,createor,distribues,doble,identty,indention,indx,olt,ore,padd,sais,te,theses
```

…es (llvm#124291) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.

…R` with series of `INSERT_VECTOR_ELT` (llvm#124420) If the operands to `INSERT_SUBVECTOR` can't be widened legally, just replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`. Closes llvm#124255 (and possibly llvm#102016)

Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.

…prototypes (llvm#123378) On lowering from `memref` to LLVM, `malloc` and other intrinsic functions from `libc` will be declared in the current module. A user's redefinition of these reserved functions will poison the internal analysis with the wrong prototype. This patch adds an assertion on the found function's type and reports if it does not match the intended type. Related to llvm#120950 --------- Co-authored-by: Luohao Wang <[email protected]>

…ge metadata type (llvm#121247) This is a fix for: llvm#97290 Please let me know if that is the right way to address the issue. Thank you! --------- Co-authored-by: Renat Idrisov <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>

…ses (llvm#111551) In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.

…PHI (llvm#124290) The RemoveDIs project [0] makes debug intrinsics obsolete and to support this instruction iterators carry an extra bit of debug information. To maintain debug information accuracy insertion needs to be performed with a BasicBlock::iterator rather than with Instruction pointers, otherwise the extra bit of debug information is lost. To that end, we're deprecating getFirstNonPHI and moveBefore for instruction pointers. They're replaced by getFirstNonPHIIt and an iterator-taking moveBefore: switching to the replacement is straightforward, and 99% of call-sites need only to unwrap the iterator with &* or call getIterator() on an Instruction pointer. The exception is when inserting instructions at the start of a block: if you call getFirstNonPHI() (or begin() or getFirstInsertionPt()) and then insert something at that position, you must pass the BasicBlock::iterator returned into the insertion method. Unwrapping with &* and then calling getIterator strips the debug-info bit we wish to preserve. Please do contact us about any use case that's confusing or unclear [1].

[0] https://llvm.org/docs/RemoveDIsDebugInfo.html
[1] https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578

…llvm#124575) This test:

```c++
extern Swim& trident; // expected-note {{declared here}}
constexpr auto& gallagher = typeid(trident); // expected-error {{constexpr variable 'gallagher' must be initialized by a constant expression}}
// expected-note@-1 {{initializer of 'trident' is not a constant expression}}
```

diagnosed the initializer of `trident` as not constant, but `trident` doesn't even have an initializer. Remove that diagnostic in this case.

…escalating expressions (reland) (llvm#124708) HandleImmediateInvocation can call MarkExpressionAsImmediateEscalating and should always be called before CheckImmediateEscalatingFunctionDefinition. However, we were not doing that in `ActFunctionBody`. Fixes llvm#119046

…lvm#124594) If -funroll-loops tests are not restricted to specific targets the tests may behave differently based on the host platform. This patch restricts the tests to aarch64 and x86_64, and removes the PowerPC XFAIL.

…vm#120623) The DwarfDebug.cpp implementation expects the epilogue instructions to have the source location of the last non-debug instruction after which the epilogue instructions are inserted. The epilogue_begin is set on the location of the first FrameDestroy instruction with source line information that has been seen in the epilogue basic block. In the trunk, the RISC-V backend sets the epilogue_begin after the epilogue has actually begun, i.e. after the callee-saved register reloads, and the source line information is not set on those reload instructions. This leads to llvm#120553 where, while debugging, breaking on or single-stepping to the epilogue_begin location causes variables to be read from the wrong place, as the FP has been restored to the parent frame's FP. To fix that, this patch sets FrameSetup/FrameDestroy flags on the callee-saved register spill/reload instructions, which is actually correct. Then RISCVInstrInfo::loadRegFromStackSlot uses the FrameDestroy flag to identify a reload of a callee-saved register in the epilogue and copies the source line information from the insert-position instruction to that reload instruction. Requires PR llvm#120622 Fixes llvm#120553

gcc and clang won't complain about calls to deprecated functions, if you're calling from a function that is deprecated too. However, MSVC does care, and expands into maaany deprecation warnings for getFirstNonPHI. Suppress this by converting the inlineable copy of getFirstNonPHI into a non-inline copy.

The builtins we were using to implement __clc_is(finite|inf|nan|normal) -- __builtin_isfinite, etc. -- don't take vector types so we were previously scalarizing. The __builtin_isfpclass builtin does take vector types and thus allows us to keep things in vectors. There is no change in codegen to the scalar versions of any of these builtins.

devnexen and others added 30 commits January 27, 2025 12:52

[compiler-rt][rtsan] socketpair interception. (llvm#124107)

e21b804

[AArch64] Avoid generating LDAPUR on certain cores (llvm#124274)

ef54e0b

On the CPUs listed below, we want to avoid LDAPUR for performance reasons. Add a tuning feature to disable them when using: -mcpu=neoverse-v2 -mcpu=neoverse-v3 -mcpu=cortex-x3 -mcpu=cortex-x4 -mcpu=cortex-x925

[lldb][AArch64][NFC] Move a comment in GCS tests

e9e06be

Got put in the wrong place during a rebase.

[Offload][NFC] Make sure the thread is not running already

e7592d8

[X86] huge-stack-offset.ll - add gnux32 test coverage

86705eb

This should match x86 for the basic implementation, but it's useful to check it actually runs correctly.

[flang] IEEE underflow control for Arm (llvm#124170)

3684ec4

Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and IEEE_SET_UNDERFLOW_MODE code for Arm.

[Offload] Fix server thread from being shut down if unused

f075058

[SLP][NFC]Do not check poison values for corresponding vectorized ent…

f1d5e70

…ries No need to check poison values if they have been vectorized and/or mark them as vectorized, it should work only for instructions.

[libclc] Optimize CLC vector is(un)ordered builtins (llvm#124546)

eaa5897

These are similar to 347fb20, but these builtins are expressed in terms of other builtins. The LLVM IR generated features the same fcmp ord/uno comparisons as before, but consistently in vector form.

Make index computation used divsi/remsi (llvm#124390)

1f5335c

The index computation is meant to be signed. Using unsigned could lead to subtle errors. Fix places where some index math was using unsigned operations. Signed-off-by: MaheshRavishankar <[email protected]>
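To see why the signedness matters, consider a negative intermediate index. The sketch below uses plain C++ integer arithmetic as an analogy for the signed (`divsi`/`remsi`) and unsigned forms of division; the helper names are hypothetical and the values are chosen only for illustration.

```cpp
#include <cstdint>

// Signed division and remainder: the result stays close to the intended
// index math (C++ truncates toward zero).
int32_t signedQuotient(int32_t index, int32_t divisor) {
  return index / divisor;
}

int32_t signedRemainder(int32_t index, int32_t divisor) {
  return index % divisor;
}

// The same bit pattern divided as unsigned: a negative index is
// reinterpreted as a very large positive number first.
uint32_t unsignedQuotient(int32_t index, int32_t divisor) {
  return static_cast<uint32_t>(index) / static_cast<uint32_t>(divisor);
}
```

For an index of -7 and a divisor of 4, the signed forms give -1 and -3, while the unsigned quotient is 1073741822, which is the kind of subtle error the commit message refers to.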

[InstCombine] Handle constant expression result in tryFactorization()

212f344

If IRBuilder folds the result to a constant expression, don't try to set nowrap flags on it. Fixes llvm#124526.

[GlobalMerge][NFC] Reland "Skip sorting by profitability when it is n…

5592875

…ot needed" Relands llvm#124146 but without changes to the sorting algorithm and the following reverse.

[flang] arm build fix (llvm#124562)

1eb4e9f

Revert "[flang] arm build fix" (llvm#124569)

20f72d1

Reverts llvm#124562

Revert "[flang] IEEE underflow control for Arm" (llvm#124570)

3322ba4

Reverts llvm#124170

abhishek-kaushik22 and others added 30 commits January 28, 2025 18:54

[SLP]Adjust NumberOfParts value for adjusted number of buildvector sc…

1d5fbe8

…alars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash

[lldb][AArch64] Fix GCS register field detection

0cf6714

Fixes c5840cc. On platforms where UL is 32 bit, like Windows or 32 bit Linux, this shift was not correct, so we assumed GCS was not present. Use ULL instead, to match the other HWCAP constants.
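The width problem can be illustrated without relying on any particular platform. In the sketch below the bit position 32 and all names are assumptions made for illustration, not the values used in LLDB. Shifting a 32-bit `1UL` by 32 is undefined behaviour in C++, so the 32-bit case is simulated by truncating the 64-bit mask.

```cpp
#include <cstdint>

// Assumption for illustration: the capability bit sits above bit 31.
constexpr unsigned kToyCapBit = 32;

// Mask built from a 64-bit constant, as `1ULL << n` would be.
constexpr uint64_t kMask64 = uint64_t{1} << kToyCapBit;

// What is left of the mask when it has to fit in 32 bits: nothing.
constexpr uint32_t kMask32 = static_cast<uint32_t>(kMask64);

bool hasCap64(uint64_t hwcap) { return (hwcap & kMask64) != 0; }

// With the truncated mask the capability can never be detected.
bool hasCap32(uint64_t hwcap) { return (hwcap & kMask32) != 0; }
```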

[gn] port b968fd9 (BuiltinsNVPTX.td)

37b595c

[SLP]Check the MainOp matches the requirements for the instructions

a1ab5b4

Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.

[gn] port 7e22180 (BuiltinsHexagon.td)

3a975d6

[libclc][NFC] Move key math headers to CLC (llvm#124739)

78b5bb7

[LoopUnroll] Add AArch64 tests for multi-exit loop unrolling.

3007f31

Test coverage to llvm#124751.

MachineVerifier: Move test into AMDGPU directory

ee1c6a6

Fixes failures for builds without AMDGPU enabled for test added in 11db7fb

libclc: clspv: add missing clc_isnan.cl dependency (llvm#124614)

9d8d538

clc_isnan.cl is needed since llvm#124097

[AMDGPU][GlobalISel] Fix assert on APInt creation. (llvm#124608)

68d90cf

Since 3494ee9, APInt no longer implicitly truncates values, so it asserts when a large signed value is implicitly converted to an unsigned APInt. This change explicitly marks the offset as a signed value.

[CostModel][X86] getShuffleCosts - convert all shuffle cost tables to…

7d172f9

… be CostKind compatible. NFC. (llvm#124753) No change in actual costs yet, but split the costs per cost kind to make it easier to tweak the numbers in future patches.

[gn build] Port de4bbbf

2abde54

[IR][SPIR-V] Replace of PointerType::get(Type) with opaque version (N…

d459784

…FC) (llvm#124755)

[MLIR] Define getArgument() for Toy tutorial passes

75622e3

This is important during debugging to be able to dump a pass pipeline. It is also what is used by `--mlir-print-ir-tree-dir` to compute filenames during dumps.

Revert "[clang] improve print / dump of anonymous declarations (llvm#…

e38f4f6

…124605)" This reverts commit f949f87. This commit introduces an llvm_unreachable call that is actually reachable. I posted a reproducer on the pull request discussion.

[libc] Revise the definition of posix_spawn. (llvm#124686)

8ce0d05

Closes llvm#124635. Some parameter types in the definition of `posix_spawn` currently do not match the standard. This patch resolves the issue. ref: https://man7.org/linux/man-pages/man3/posix_spawn.3.html

[libc] Revise the definition of {get, set}rlimit. (llvm#124701)

5a8fe9e

Closes llvm#124633. Some parameter types in the definition of `{get, set}rlimit` currently do not match the standard. This patch resolves the issue. ref: https://man7.org/linux/man-pages/man2/getrlimit.2.html

[AutoBump] Merge with 5a8fe9e (Jan 28)

4ed6347
