forked from llvm/llvm-project
[AutoBump] Merge with 5a8fe9e9 (Jan 28) (22) #560
Open: jorickert wants to merge 431 commits into bump_to_eb206e9e from bump_to_5a8fe9e9
llvm#116833) SVE2.2 introduces predicated forms of instructions that zero the inactive lanes. In some cases this saves a `movprfx` or a `mov` instruction when emitting code for the `_x` or `_z` variants of intrinsics. This patch adds support for emitting the zeroing forms of certain `FLOGB` instructions.
…llvm#123918)

When the Guarded Control Stack (GCS) is enabled, returns cause the processor to validate that the address at the location pointed to by gcspr_el0 matches the one in the link register.

```
ret (lr=A) << pc

| GCS |
+=====+
| A   |
| B   | << gcspr_el0

Fault: tried to return to A when you should have returned to B.
```

Therefore when an expression wrapper function tries to return to the expression return address (usually `_start` if there is a libc), it would fault.

```
ret (lr=_start) << pc

| GCS        |
+============+
| user_func1 |
| user_func2 | << gcspr_el0

Fault: tried to return to _start when you should have returned to user_func2.
```

To fix this we must push that return address to the GCS in PrepareTrivialCall. This value is then consumed by the final return and the expression completes as expected.

If for some reason that fails, we will manually restore the value of gcspr_el0, because it turns out that PrepareTrivialCall does not restore registers if it fails at all. So for now I am handling gcspr_el0 specifically, but I have filed llvm#124269 to address the general problem. (The other things PrepareTrivialCall does are exceedingly likely to not fail, so we have never noticed this.)

```
ret (lr=_start) << pc

| GCS        |
+============+
| user_func1 |
| user_func2 |
| _start     | << gcspr_el0

No fault, we return to _start as normal.
```

The gcspr_el0 register will be restored after expression evaluation so that the program can continue correctly.

However, due to restrictions in the Linux GCS ABI, we will not restore the enable bit of gcs_features_enabled. Re-enabling GCS via ptrace is not supported because it requires memory to be allocated by the kernel.

We could disable GCS if the expression enabled GCS, however this would use up that state transition that the program might later rely on. And generally it is cleaner to ignore the enable bit, rather than one state transition of it.

We will also not restore the GCS entry that was overwritten with the expression's return address, on the grounds that:
* This entry will never be used by the program. If the program branches, the entry will be overwritten. If the program returns, gcspr_el0 will point to the entry before the expression return address and that entry will instead be validated.
* Any expression that calls functions will overwrite even more entries, so the user needs to be aware of that anyway if they want to preserve the contents of the GCS for inspection.
* An expression could leave the program in a state where restoring the value makes the situation worse. Especially if we ever support this in bare metal debugging.

I will later document all this on https://lldb.llvm.org/use/aarch64-linux.html.

Tests have been added for:
* A function call that does not interact with GCS.
* A call that does, and disables it (we do not re-enable it).
* A call that does, and enables it (we do not disable it again).
* Failure to push an entry to the GCS stack.
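The return check and the extra push described above can be modelled with a toy stack. This is only a sketch of the idea, not LLDB or kernel code: `ToyGcs`, its members and the address constants are hypothetical, and the real GCS lives in memory addressed through gcspr_el0 rather than in a `std::vector`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical addresses, used only to label the entries.
constexpr uint64_t kUserFunc1 = 0x1000;
constexpr uint64_t kUserFunc2 = 0x2000;
constexpr uint64_t kStart = 0x3000;

// Toy model: entries.back() plays the role of the entry that
// gcspr_el0 points to.
struct ToyGcs {
  std::vector<uint64_t> entries;

  // Models pushing a return address before calling the expression.
  void push(uint64_t addr) { entries.push_back(addr); }

  // Models `ret`: allowed only if lr matches the current entry.
  // On success the entry is consumed; on mismatch the model reports
  // a fault by returning false and leaves the stack unchanged.
  bool ret(uint64_t lr) {
    if (entries.empty() || entries.back() != lr)
      return false;
    entries.pop_back();
    return true;
  }
};
```

In this model, returning to `kStart` with only `kUserFunc1` and `kUserFunc2` on the stack fails, while pushing `kStart` first lets the return succeed and leaves `kUserFunc2` as the current entry again.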
On the CPUs listed below, we want to avoid LDAPUR for performance reasons. Add a tuning feature to disable them when using:
* -mcpu=neoverse-v2
* -mcpu=neoverse-v3
* -mcpu=cortex-x3
* -mcpu=cortex-x4
* -mcpu=cortex-x925
Clang knows how to perform relational operations on OpenCL vectors, so we don't need to use the Clang builtins. The builtins we were using didn't support vector types, so we were previously scalarizing. This commit generates the same LLVM fcmp operations as before, just without the scalarization.
Got put in the wrong place during a rebase.
…llvm#122674) Extends rewriting of `loop` directives by supporting the `bind` clause for standalone directives. This follows both the spec and the current state of clang as follows:
* No `bind` or `bind(thread)`: the `loop` is rewritten to `simd`.
* `bind(parallel)`: the `loop` is rewritten to `do`.
* `bind(teams)`: the `loop` is rewritten to `distribute`.

This is a follow-up PR for llvm#122632; only the latest commit in this PR is relevant.
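The rewrite rules listed in this commit message amount to a small lookup. A minimal sketch follows, assuming a hypothetical `Bind` enum and `rewriteLoop` helper; neither name comes from flang, and the actual pass operates on parse-tree or MLIR constructs rather than strings.

```cpp
#include <string_view>

// Hypothetical model of the `bind` clause on a standalone `loop`.
enum class Bind { None, Thread, Parallel, Teams };

// Returns the directive that `loop` is rewritten to, following the
// three rules stated in the commit message.
constexpr std::string_view rewriteLoop(Bind bind) {
  switch (bind) {
  case Bind::None:
  case Bind::Thread:
    return "simd";
  case Bind::Parallel:
    return "do";
  case Bind::Teams:
    return "distribute";
  }
  return "simd"; // Not reached for valid enumerators.
}

static_assert(rewriteLoop(Bind::None) == "simd");
static_assert(rewriteLoop(Bind::Teams) == "distribute");
```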
This should match x86 for the basic implementation, but it's useful to check it actually runs correctly.
Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and IEEE_SET_UNDERFLOW_MODE code for Arm.
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is not an erroneous value: it returns std::nullopt to communicate that it was not able to find a valid pointer stride. In analyzeLoop, we call getPtrStride with a value_or(0) conflating the zero return value with std::nullopt. Fix this, handling loop-invariant loads correctly.
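The conflation described here is easy to reproduce with a plain `std::optional`. The sketch below is not the LLVM code: `queryStride`, `lossyStride`, `classify` and `StrideKind` are hypothetical stand-ins that only mirror the "0 means loop-invariant, `std::nullopt` means unknown" convention from the commit message.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a stride query. std::nullopt means
// "no valid stride found"; 0 means "pointer is loop-invariant".
constexpr std::optional<int64_t> queryStride(bool known, int64_t stride) {
  return known ? std::optional<int64_t>(stride) : std::nullopt;
}

enum class StrideKind { Unknown, Invariant, Strided };

// Lossy: value_or(0) maps both "unknown" and "invariant" to 0.
constexpr int64_t lossyStride(std::optional<int64_t> stride) {
  return stride.value_or(0);
}

// Keeps the three cases apart by testing for a value first.
constexpr StrideKind classify(std::optional<int64_t> stride) {
  if (!stride)
    return StrideKind::Unknown;
  return *stride == 0 ? StrideKind::Invariant : StrideKind::Strided;
}

// The lossy form cannot tell the two cases apart; classify can.
static_assert(lossyStride(queryStride(false, 0)) ==
              lossyStride(queryStride(true, 0)));
static_assert(classify(queryStride(false, 0)) !=
              classify(queryStride(true, 0)));
```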
…nd above (llvm#117149) Since `__STDC_NO_THREADS__` is a reserved identifier, - If `MSVC version < 17.9` - C version < C11(201112L) - When `<threads.h>` is unavailable `!__has_include(<threads.h>)` is `__has_include` is defined. Closes llvm#115529
- The FP8 scalar type (`__mfp8`) was described as a vector type
- The FP8 vector types were described/assumed to have integer element type (the element type ought to be `__mfp8`)
- Add support for `m` type specifier (denoting `__mfp8`) in `DecodeTypeFromStr` and create builtin function prototypes using that specifier, instead of `int8_t`
…ries No need to check poison values for being vectorized and/or mark them as vectorized; this should apply only to instructions.
These are similar to 347fb20, but these builtins are expressed in terms of other builtins. The LLVM IR generated features the same fcmp ord/uno comparisons as before, but consistently in vector form.
llvm#124404) We record whether an expression is immediate escalating in the FunctionScope. However, that only happens when parsing or transforming an expression. This might not happen when transforming a non-dependent expression. This patch fixes that by considering a function immediate when instantiated from an immediate function. Fixes llvm#123405
…ext (llvm#124279) While sifting through this part of the code I noticed that when we parse C++ methods, `DWARFASTParserClang` creates two sets of `ParmVarDecls`, one in `ParseChildParameters` and one in `AddMethodToCXXRecordType`. The former is unused when we're dealing with methods. Moreover, the `ParmVarDecls` we created in `ParseChildParameters` were created with an incorrect `clang::DeclContext` (namely the DeclContext of the function, and not the function itself). In Clang, there's `ParmVarDecl::setOwningFunction` to adjust the DeclContext of a parameter if the parameter was created before the FunctionDecl. But we never used it. This patch removes the `ParmVarDecl` creation from `ParseChildParameters` and instead creates a `TypeSystemClang::CreateParameterDeclarations` that ensures we set the DeclContext correctly. Note there is one difference in how `ParmVarDecl`s would be created now: we won't set a ClangASTMetadata entry for any of the parameters. I don't think this was ever actually useful for parameter DIEs anyway. This wasn't causing any concrete issues (that I know of), but was quite surprising. And this way of setting the parameters seems easier to reason about (in my opinion).
…ter instead of copying it (llvm#124305) We used to copy the `SourceLocation` instead of importing it, which isn't correct since the `SourceManager`s of the source and target ASTContext might differ. Also adds a test that confirms that we import the explicit object parameter location for `ParmVarDecl`s. This is how Clang determines whether a parameter `isExplicitObjectParameter`. The LLDB expression evaluator relies on this for calling "explicit object member functions".
The index computation is meant to be signed. Using unsigned could lead to subtle errors. Fix places where some index math was using unsigned operations. Signed-off-by: MaheshRavishankar <[email protected]>
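A short illustration of the kind of subtle error meant here: with unsigned arithmetic an index that should go negative wraps to a huge value instead, so a later bounds comparison sees something very different. The helper names are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical index helpers: the same subtraction done in signed
// and in unsigned arithmetic.
constexpr int64_t signedOffset(int64_t index, int64_t shift) {
  return index - shift;
}
constexpr uint64_t unsignedOffset(uint64_t index, uint64_t shift) {
  return index - shift; // Wraps modulo 2^64 when shift > index.
}

// Signed bounds check: a negative index is rejected explicitly.
constexpr bool inBounds(int64_t index, int64_t size) {
  return index >= 0 && index < size;
}

// Signed math yields -3, which the bounds check can reject.
static_assert(signedOffset(2, 5) == -3);
static_assert(!inBounds(signedOffset(2, 5), 10));

// Unsigned math yields 2^64 - 3 instead of a negative value.
static_assert(unsignedOffset(2, 5) == UINT64_MAX - 2);
```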
…pr (llvm#124533) We used to always transform the pattern declaration for SizeOfPackExpr to ensure the constraint expression's profile produced the desired result. However, this approach failed to handle pack expansions when the pack referred to function parameters. In such cases, the function parameters were formerly expanded to 1 to avoid building Subst* nodes (see e6974da). That workaround caused us to transform a pack without a proper ArgumentPackSubstitutionIndex, leading to crashes when transforming the pattern. It turns out that profiling the pattern for partially substituted SizeOfPackExprs is unnecessary because their transformed forms are also profiled within the partial arguments. Fixes llvm#124161
… for `tensor.expand_shape` op. (llvm#113501) The op carries the output-shape directly. This can be used directly. Also adds a method to get the shape as a `SmallVector<OpFoldResult>`. Signed-off-by: MaheshRavishankar <[email protected]>
A bulk commit of true16 support for v_cmpx_xx_f16 instructions including:
* v_cmpx_f_f16
* v_cmpx_le_f16
* v_cmpx_gt_f16
* v_cmpx_lg_f16
* v_cmpx_ge_f16
* v_cmpx_o_f16
* v_cmpx_u_f16
* v_cmpx_nge_f16
* v_cmpx_nlg_f16
* v_cmpx_ngt_f16
* v_cmpx_nle_f16
* v_cmpx_neq_f16
* v_cmpx_nlt_f16
* v_cmpx_t_f16

v_cmpx_eq_f16 is not in this patch and will be added in the following patch.
Patch created using the following command line:

```bash
codespell polly --skip="*.pdf,polly/lib/External/*" --write-changes \
  --ignore-words-list=couter,createor,distribues,doble,identty,indention,indx,olt,ore,padd,sais,te,theses
```
…es (llvm#124291) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.
If IRBuilder folds the result to a constant expression, don't try to set nowrap flags on it. Fixes llvm#124526.
…ot needed" Relands llvm#124146 but without changes to the sorting algorithm and the following reverse.
…R` with series of `INSERT_VECTOR_ELT` (llvm#124420) If the operands to `INSERT_SUBVECTOR` can't be widened legally, just replace the `INSERT_SUBVECTOR` with a series of `INSERT_VECTOR_ELT`. Closes llvm#124255 (and possibly llvm#102016)
Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.
…prototypes (llvm#123378) On lowering from `memref` to LLVM, `malloc` and other intrinsic functions from `libc` will be declared in the current module. A user's redefinition of these reserved functions will poison the internal analysis with the wrong prototype. This patch adds an assertion on the found function's type and reports if it does not match the intended type. Related to llvm#120950 --------- Co-authored-by: Luohao Wang <[email protected]>
…alars Need to adjust NumParts value, when GatheredScalars scalars are adjusted after extractelements analysis, to fix compiler crash
Fixes c5840cc. On platforms where UL is 32 bit, like Windows or 32 bit Linux, this shift was not correct, so we assumed GCS was not present. Use ULL instead, to match the other HWCAP constants.
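A minimal sketch of why the literal suffix matters. The bit position below is hypothetical, chosen only because it is at least 32, and `hasExampleCap` is an illustrative helper rather than LLDB code. `unsigned long` is 32 bits wide on Windows and 32-bit Linux, so shifting `1UL` by 32 or more is not valid there, whereas `unsigned long long` is at least 64 bits on every conforming implementation.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical capability bit at position 32 or above.
constexpr unsigned kExampleBit = 32;

// ULL guarantees a type at least 64 bits wide, so this shift is
// well defined on every platform. Writing 1UL here would be
// undefined behaviour wherever unsigned long is 32 bits wide.
constexpr uint64_t kExampleCap = 1ULL << kExampleBit;

// Tests whether the capability bit is set in a 64-bit HWCAP value.
constexpr bool hasExampleCap(uint64_t hwcap) {
  return (hwcap & kExampleCap) != 0;
}

static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64);
static_assert(hasExampleCap(uint64_t{1} << 32));
static_assert(!hasExampleCap(0));
```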
…ge metadata type (llvm#121247) This is a fix for: llvm#97290 Please let me know if that is the right way to address the issue. Thank you! --------- Co-authored-by: Renat Idrisov <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>
…ses (llvm#111551) In the RemoveLoadsIntoFakeUses pass, we try to remove loads that are only used by fake uses, as well as the fake use in question. There are two existing errors with the pass however: it incorrectly examines every operand of each FAKE_USE, when only the first is relevant (extra operands will just be "killed" regs assigned by a previous pass), and it ignores cases where the FAKE_USE register is not an exact match for the loaded register, which is incorrect as regalloc may choose to load a wider value than the FAKE_USE required pre-regalloc. This patch fixes both of these cases.
Need to include MainOp into the analysis of the instructions in getSameOpcode to be sure that it is checked for the requirements to prevent crashes during further analysis.
…PHI (llvm#124290) The RemoveDIs project [0] makes debug intrinsics obsolete and to support this instruction iterators carry an extra bit of debug information. To maintain debug information accuracy insertion needs to be performed with a BasicBlock::iterator rather than with Instruction pointers, otherwise the extra bit of debug information is lost. To that end, we're deprecating getFirstNonPHI and moveBefore for instruction pointers. They're replaced by getFirstNonPHIIt and an iterator-taking moveBefore: switching to the replacement is straightforward, and 99% of call-sites need only to unwrap the iterator with &* or call getIterator() on an Instruction pointer. The exception is when inserting instructions at the start of a block: if you call getFirstNonPHI() (or begin() or getFirstInsertionPt()) and then insert something at that position, you must pass the BasicBlock::iterator returned into the insertion method. Unwrapping with &* and then calling getIterator strips the debug-info bit we wish to preserve. Please do contact us about any use case that's confusing or unclear [1]. [0] https://llvm.org/docs/RemoveDIsDebugInfo.html [1] https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
…llvm#124575) This test:

```c++
extern Swim& trident; // expected-note {{declared here}}
constexpr auto& gallagher = typeid(trident); // expected-error {{constexpr variable 'gallagher' must be initialized by a constant expression}}
// expected-note@-1 {{initializer of 'trident' is not a constant expression}}
```

diagnosed the initializer of `trident` as not constant, but `trident` doesn't even have an initializer. Remove that diagnostic in this case.
…escalating expressions (reland) (llvm#124708) HandleImmediateInvocation can call MarkExpressionAsImmediateEscalating and should always be called before CheckImmediateEscalatingFunctionDefinition. However, we were not doing that in `ActFunctionBody`. Fixes llvm#119046
Test coverage for llvm#124751.
Fixes failures for builds without AMDGPU enabled for test added in 11db7fb
…lvm#124594) If -funroll-loops tests are not restricted to specific targets the tests may behave differently based on the host platform. This patch restricts the tests to aarch64 and x86_64, and removes the PowerPC XFAIL.
clc_isnan.cl is needed since llvm#124097
Since 3494ee9, APInt no longer implicitly truncates values, so it asserts when a big signed value is (implicitly) converted to an unsigned APInt. The change explicitly marks the offset as a signed value.
… be CostKind compatible. NFC. (llvm#124753) No change in actual costs yet, but split the costs per cost kind to make it easier to tweak the numbers in future patches.
…vm#120623) The DwarfDebug.cpp implementation expects the epilogue instructions to have the source location of the last non-debug instruction after which the epilogue instructions are inserted. The epilogue_begin is set on the location of the first FrameDestroy instruction with source line information that has been seen in the epilogue basic block. On trunk, the RISC-V backend sets the epilogue_begin after the epilogue has actually begun, i.e. after the callee saved register reloads, and the source line information is not set on those reload instructions. This leads to llvm#120553 where, while debugging, breaking on or single stepping to the epilogue_begin location causes variables to be accessed from the wrong place, because the FP has already been restored to the parent frame's FP. To fix that, this patch sets FrameSetup/FrameDestroy flags on the callee saved register spill/reload instructions, which is actually correct. Then RISCVInstrInfo::loadRegFromStackSlot uses the FrameDestroy flag to identify a reload of the callee saved register in the epilogue and copies the source line information from the insert position instruction to that reload instruction. Requires PR llvm#120622 Fixes llvm#120553
This is important during debugging to be able to dump a pass pipeline. It is also what is used by `--mlir-print-ir-tree-dir` to compute filenames during dumps.
gcc and clang won't complain about calls to deprecated functions, if you're calling from a function that is deprecated too. However, MSVC does care, and expands into maaany deprecation warnings for getFirstNonPHI. Suppress this by converting the inlineable copy of getFirstNonPHI into a non-inline copy.
Closes llvm#124635. Some parameter types in the definition of `posix_spawn` currently do not match the standard. This patch resolves the issue. ref: https://man7.org/linux/man-pages/man3/posix_spawn.3.html
The builtins we were using to implement __clc_is(finite|inf|nan|normal) -- __builtin_isfinite, etc. -- don't take vector types so we were previously scalarizing. The __builtin_isfpclass builtin does take vector types and thus allows us to keep things in vectors. There is no change in codegen to the scalar versions of any of these builtins.
Closes llvm#124633. Some parameter types in the definition of `{get, set}rlimit` currently do not match the standard. This patch resolves the issue. ref: https://man7.org/linux/man-pages/man2/getrlimit.2.html
No description provided.