Skip to content

[AutoBump] Merge with 5a8fe9e9 (Jan 28) (22) #560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 431 commits into
base: bump_to_eb206e9e
Choose a base branch
from

Conversation

jorickert
Copy link

No description provided.

devnexen and others added 30 commits January 27, 2025 12:52
llvm#116833)

SVE2.2 introduces instructions with predicated forms with zeroing of
the inactive lanes. This allows in some cases to save a `movprfx` or
a `mov` instruction when emitting code for `_x` or `_z` variants of
intrinsics.

This patch adds support for emitting the zeroing forms of certain
`FLOGB` instructions.
…llvm#123918)

When the Guarded Control Stack (GCS) is enabled, returns cause the
processor to validate that the address at the location pointed to by
gcspr_el0 matches the one in the link register.

```
ret (lr=A) << pc

| GCS |
+=====+
|  A  |
|  B  | << gcspr_el0

Fault: tried to return to A when you should have returned to B.
```

Therefore when an expression wrapper function tries to return to the
expression return address (usually `_start` if there is a libc), it
would fault.

```
ret (lr=_start) << pc

| GCS        |
+============+
| user_func1 |
| user_func2 | << gcspr_el0

Fault: tried to return to _start when you should have returned to user_func2.
```

To fix this we must push that return address to the GCS in
PrepareTrivialCall. This value is then consumed by the final return and
the expression completes as expected.

If for some reason that fails, we will manually restore the value of
gcspr_el0, because it turns out that PrepareTrivialCall
does not restore registers if it fails at all. So for now I am handling
gcspr_el0 specifically, but I have filed
llvm#124269 to address the
general problem.

(the other things PrepareTrivialCall does are exceedingly likely to not
fail, so we have never noticed this)

```
ret (lr=_start) << pc

| GCS        |
+============+
| user_func1 |
| user_func2 |
| _start     | << gcspr_el0

No fault, we return to _start as normal.
```

The gcspr_el0 register will be restored after expression evaluation so
that the program can continue correctly.

However, due to restrictions in the Linux GCS ABI, we will not restore
the enable bit of gcs_features_enabled. Re-enabling GCS via ptrace is
not supported because it requires memory to be allocated by the kernel.

We could disable GCS if the expression enabled GCS, however this would
use up that state transition that the program might later rely on. And
generally it is cleaner to ignore the enable bit, rather than one state
transition of it.

We will also not restore the GCS entry that was overwritten with the
expression's return address. On the grounds that:
* This entry will never be used by the program. If the program branches,
the entry will be overwritten. If the program returns, gcspr_el0 will
point to the entry before the expression return address and that entry
will instead be validated.
* Any expression that calls functions will overwrite even more entries,
so the user needs to be aware of that anyway if they want to preserve
the contents of the GCS for inspection.
* An expression could leave the program in a state where restoring the
value makes the situation worse. Especially if we ever support this in
bare metal debugging.

I will later document all this on
https://lldb.llvm.org/use/aarch64-linux.html.

Tests have been added for:
* A function call that does not interact with GCS.
* A call that does, and disables it (we do not re-enable it).
* A call that does, and enables it (we do not disable it again).
* Failure to push an entry to the GCS stack.
On the CPUs listed below, we want to avoid LDAPUR for performance
reasons. Add a tuning feature to disable them when using:
 -mcpu=neoverse-v2
 -mcpu=neoverse-v3
 -mcpu=cortex-x3
 -mcpu=cortex-x4
 -mcpu=cortex-x925
Clang knows how to perform relational operations on OpenCL vectors, so
we don't need to use the Clang builtins. The builtins we were using
didn't support vector types, so we were previously scalarizing.

This commit generates the same LLVM fcmp operations as before, just
without the scalarization.
Got put in the wrong place during a rebase.
…llvm#122674)

Extends rewriting of `loop` directives by supporting `bind` clause for
standalone directives. This follows both the spec and the current state
of clang as follows:
* No `bind` or `bind(thread)`: the `loop` is rewritten to `simd`.
* `bind(parallel)`: the `loop` is rewritten to `do`.
* `bind(teams)`: the `loop` is rewritten to `distribute`.

This is a follow-up PR for
llvm#122632, only the latest commit
in this PR is relevant to the PR.
This should match x86 for the basic implementation, but its useful to check it actually runs correctly.
Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and
IEEE_SET_UNDERFLOW_MODE code for Arm.
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
…nd above (llvm#117149)

Since `__STDC_NO_THREADS__` is a reserved identifier,
- If `MSVC version < 17.9`
- C version < C11(201112L)
- When `<threads.h>` is unavailable `!__has_include(<threads.h>)` is
`__has_include` is defined.

Closes llvm#115529
- The FP8 scalar type (`__mfp8`) was described as a vector type
- The FP8 vector types were described/assumed to have integer element
type (the element type ought to be `__mfp8`)
- Add support for `m` type specifier (denoting `__mfp8`) in
`DecodeTypeFromStr` and create builtin function prototypes using that
specifier, instead of `int8_t`
…ries

No need to check poison values if they have been vectorized and/or mark
them as vectorized, it should work only for instructions.
These are similar to 347fb20, but these builtins are expressed in terms
of other builtins. The LLVM IR generated features the same fcmp ord/uno
comparisons as before, but consistently in vector form.
llvm#124404)

We record whether an expression is immediate escalating in the
FunctionScope.
However, that only happen when parsing or transforming an expression.
This might not happen when transforming a non dependent expression.

This patch fixes that by considering a function immediate when
instantiated from an immediate function.

Fixes llvm#123405
…ext (llvm#124279)

While sifting through this part of the code I noticed that when we parse
C++ methods, `DWARFASTParserClang` creates two sets of `ParmVarDecls`,
one in `ParseChildParameters` and once in `AddMethodToCXXRecordType`.
The former is unused when we're dealing with methods. Moreover, the
`ParmVarDecls` we created in `ParseChildParameters` were created with an
incorrect `clang::DeclContext` (namely the DeclContext of the function,
and not the function itself). In Clang, there's
`ParmVarDecl::setOwningFunction` to adjust the DeclContext of a
parameter if the parameter was created before the FunctionDecl. But we
never used it.

This patch removes the `ParmVarDecl` creation from
`ParseChildParameters` and instead creates a
`TypeSystemClang::CreateParameterDeclarations` that ensures we set the
DeclContext correctly.

Note there is one differences in how `ParmVarDecl`s would be created
now: we won't set a ClangASTMetadata entry for any of the parameters. I
don't think this was ever actually useful for parameter DIEs anyway.

This wasn't causing any concrete issues (that I know of), but was quite
surprising. And this way of setting the parameters seems easier to
reason about (in my opinion).
…ter instead of copying it (llvm#124305)

We used to copy the `SourceLocation` instead of importing it, which
isn't correct since the `SourceManager`'s of the source and target
ASTContext might differ.

Also adds test that confirms that we import the explicit object
parameter location for `ParmVarDecl`s. This is how Clang determines
whether a parameter `isExplicitObjectParamater`. The LLDB expression
evaluator relies on this for calling "explicit object member functions".
The index computation is meant to be signed. Using unsigned could lead
to subtle errors. Fix places where some index math was using unsigned
operations.

Signed-off-by: MaheshRavishankar <[email protected]>
…pr (llvm#124533)

We used to always transform the pattern declaration for SizeOfPackExpr
to ensure the constraint expression's profile produced the desired
result. However, this approach failed to handle pack expansions when the
pack referred to function parameters. In such cases, the function
parameters were formerly expanded to 1 to avoid building Subst* nodes
(see e6974da). That workaround caused us to transform a pack without a
proper ArgumentPackSubstitutionIndex, leading to crashes when
transforming the pattern.

It turns out that profiling the pattern for partially substituted
SizeOfPackExprs is unnecessary because their transformed forms are also
profiled within the partial arguments.

Fixes llvm#124161
… for `tensor.expand_shape` op. (llvm#113501)

The op carries the output-shape directly. This can be used directly.
Also adds a method to get the shape as a `SmallVector<OpFoldResult>`.

Signed-off-by: MaheshRavishankar <[email protected]>
A bulk commit of true16 support for v_cmpx_xx_f16 instructions
including:

v_cmpx_f_f16
v_cmpx_le_f16
v_cmpx_gt_f16
v_cmpx_lg_f16
v_cmpx_ge_f16
v_cmpx_o_f16
v_cmpx_u_f16
v_cmpx_nge_f16
v_cmpx_nlg_f16
v_cmpx_ngt_f16
v_cmpx_nle_f16
v_cmpx_neq_f16
v_cmpx_nlt_f16
v_cmpx_t_f16

v_cmpx_eq_f16 is not in this patch and will be added in the following
patch
Patch created using the following command line:
```bash
codespell polly --skip="*.pdf,polly/lib/External/*" --write-changes \
  --ignore-words-list=couter,createor,distribues,doble,identty,indention,indx,olt,ore,padd,sais,te,theses
```
…es (llvm#124291)

As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
If IRBuilder folds the result to a constant expression, don't try
to set nowrap flags on it.

Fixes llvm#124526.
…ot needed"

Relands llvm#124146 but without changes to the sorting algorithm and the following
reverse.
abhishek-kaushik22 and others added 30 commits January 28, 2025 18:54
…R` with series of `INSERT_VECTOR_ELT` (llvm#124420)

If the operands to `INSERT_SUBVECTOR` can't be widened legally, just
replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`.

Closes llvm#124255 (and possibly llvm#102016)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64-bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibilitry.
…prototypes (llvm#123378)

On lowering from `memref` to LLVM, `malloc` and other intrinsic
functions from `libc` will be declared in the current module. User's
redefinition of these reserved functions will poison the internal
analysis with wrong prototype. This patch adds assertion on the found
function's type and reports if it mismatch with the intended type.

Related to llvm#120950


---------

Co-authored-by: Luohao Wang <[email protected]>
…alars

Need to adjust NumParts value, when GatheredScalars scalars are adjusted
after extractelements analysis, to fix compiler crash
Fixes c5840cc.

On platforms where UL is 32 bit, like Windows or 32 bit Linux,
this shift was not correct, so we assumed GCS was not present.

Use ULL instead, to match the other HWCAP constants.
…ge metadata type (llvm#121247)

This is a fix for:
llvm#97290
Please let me know if that is the right way to address the issue. Thank
you!

---------

Co-authored-by: Renat Idrisov <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
…ses (llvm#111551)

In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are
only used by fake uses, as well as the fake use in question. There are
two existing errors with the pass however: it incorrectly examines every
operand of each FAKE_USE, when only the first is relevant (extra
operands will just be "killed" regs assigned by a previous pass), and it
ignores cases where the FAKE_USE register is not an exact match for the
loaded register, which is incorrect as regalloc may choose to load a
wider value than the FAKE_USE required pre-regalloc. This patch fixes
both of these cases.
Need to include MainOp into the analysis of the instructions in
getSameOpcode to be sure that it is checked for the requirements to
prevent crashes during further analysis.
…PHI (llvm#124290)

The RemoveDIs project [0] makes debug intrinsics obsolete and to support
this instruction iterators carry an extra bit of debug information. To
maintain debug information accuracy insertion needs to be performed with a
BasicBlock::iterator rather than with Instruction pointers, otherwise the
extra bit of debug information is lost.

To that end, we're deprecating getFirstNonPHI and moveBefore for
instruction pointers. They're replaced by getFirstNonPHIIt and an
iterator-taking moveBefore: switching to the replacement is
straightforwards, and 99% of call-sites need only to unwrap the iterator
with &* or call getIterator() on an Instruction pointer.

The exception is when inserting instructions at the start of a block: if
you call getFirstNonPHI() (or begin() or getFirstInsertionPt()) and then
insert something at that position, you must pass the BasicBlock::iterator
returned into the insertion method. Unwrapping with &* and then calling
getIterator strips the debug-info bit we wish to preserve. Please do
contact us about any use case that's confusing or unclear [1].

[0] https://llvm.org/docs/RemoveDIsDebugInfo.html
[1] https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
…llvm#124575)

This test:
```c++
extern Swim& trident; // expected-note {{declared here}}

constexpr auto& gallagher = typeid(trident);    // expected-error {{constexpr variable 'gallagher' must be initialized by a constant expression}}
                                                // expected-note@-1 {{initializer of 'trident' is not a constant expression}}
```
diagnosed the initializer of `trident` as not constant, but `trident`
doesn't even have an initializer. Remove that diagnostic in this case.
…escacalating expressions (reland) (llvm#124708)

HandleImmediateInvocation can call MarkExpressionAsImmediateEscalating 
and should always be called before
CheckImmediateEscalatingFunctionDefinition.
    
However, we were not doing that in `ActFunctionBody`.

Fixes llvm#119046
Fixes failures for builds without AMDGPU enabled for test
added in 11db7fb
…lvm#124594)

If -funroll-loops tests are not restricted to specific targets the tests
may behave differently based on the host platform. This patch restricts
the tests to aarch64 and x86_64, and removes the PowerPC XFAIL.
Since 3494ee9 APInt stopped to
implicitly truncate values, therefore it asserts on a big signed value
converted to (implicitly) unsigned APInt.

The change explicitly marks offset as a signed value.
… be CostKind compatible. NFC. (llvm#124753)

No change in actual costs yet, but split the costs per cost kind to make it easier to tweak the numbers in future patches.
…vm#120623)

The DwarfDebug.cpp implementation expects the epilogue instructions to
have source location of last non-debug instruction after which the epilogue
instructions are inserted. The epilogue_begin is set on location of the first
FrameDestroy instruction with source line information that has been seen in
the epilogue basic block.

In the trunk, the risc-v backend sets the epilogue_begin after the epilogue has
actually begun i.e. after callee saved register reloads and the source line
information is not set on those reload instructions. This is leading to llvm#120553
where, while debugging, breaking on or single stepping to the epilogue_begin
location will make accessing the variables from wrong place as the FP has been
restored to the parent frame's FP.

To fix that, this patch sets FrameSetup/FrameDestroy flags on the callee saved
register spill/reload instructions which is actually correct. Then the
RISCVInstrInfo::loadRegFromStackSlot uses FrameDestroy flag to identify a
reload of the callee saved register in the epilogue and copies the source
line information from insert position instruction to that reload instruction.

Requires PR llvm#120622

Fixes llvm#120553
This is important during debugging to be able to dump a pass pipeline.
It is also what is used by `--mlir-print-ir-tree-dir` to compute filenames
during dumps.
gcc and clang won't complain about calls to deprecated functions, if
you're calling from a function that is deprecated too. However, MSVC
does care, and expands into maaany deprecation warnings for
getFirstNonPHI.

Suppress this by converting the inlineable copy of getFirstNonPHI into a
non-inline copy.
…124605)"

This reverts commit f949f87.

This commit introduces an llvm_unreachable call that is actually
reachable. I posted a reproducer on the pull request discussion.
Closes llvm#124635.

Some parameter types in the definition of `posix_spawn` currently do not
match the standard. This patch resolves the issue.
ref: https://man7.org/linux/man-pages/man3/posix_spawn.3.html
The builtins we were using to implement __clc_is(finite|inf|nan|normal)
-- __builtin_isfinite, etc. -- don't take vector types so we were
previously scalarizing. The __builtin_isfpclass builtin does take vector
types and thus allows us to keep things in vectors.

There is no change in codegen to the scalar versions of any of these
builtins.
Closes llvm#124633.

Some parameter types in the definition of `{get, set}rlimit` currently
do not match the standard. This patch resolves the issue.
ref: https://man7.org/linux/man-pages/man2/getrlimit.2.html
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.