forked from llvm/llvm-project
[AutoBump] Merge with 2a8c12b2 (Jan 21) (11) #549
Open: jorickert wants to merge 99 commits into bump_to_977d744b from bump_to_2a8c12b2
) In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155, the asan annotations for `std::vector` were modified to unpoison freed backing memory on destruction, instead of leaving it poisoned. However, calling `__clear()` instead of `clear()` skips informing the asan runtime of this decrease in the accessible container size. This breaks the invariant that the value of `old_mid` should match the value of `new_mid` from the previous call to `__sanitizer_annotate_contiguous_container`. If the sanity checks are enabled, the mismatch can trip the checks for the partial poison between [d1, d2) and the container redzone between [d2, c). To fix this, ensure that `clear()` is called instead, as is already done by `__vdeallocate()`. Also remove `__clear()`, since it is no longer called.
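The `old_mid`/`new_mid` invariant can be modelled with a toy tracker. This is a minimal sketch, not libc++ or the asan runtime: `AnnotationTracker`, `destroy_vector`, `run`, and the integer offsets are hypothetical, and only the (beg, end, old_mid, new_mid) shape mirrors `__sanitizer_annotate_contiguous_container`.

```python
class AnnotationTracker:
    """Toy stand-in for the runtime's view of one contiguous container."""

    def __init__(self, beg, end):
        self.beg = beg
        self.end = end
        self.mid = end  # assumption: fresh storage starts fully accessible

    def annotate(self, old_mid, new_mid):
        # The sanity check: old_mid must equal the previous call's new_mid.
        if old_mid != self.mid:
            raise ValueError(f"old_mid {old_mid} != recorded mid {self.mid}")
        if not (self.beg <= new_mid <= self.end):
            raise ValueError("new_mid outside [beg, end]")
        self.mid = new_mid


def destroy_vector(tracker, size, notify_on_clear):
    """Simulate destroying a vector that currently holds `size` elements."""
    if notify_on_clear:
        # clear(): tell the runtime the accessible size shrinks to zero.
        tracker.annotate(size, 0)
    # Destruction unpoisons the whole backing store, assuming size is now 0.
    tracker.annotate(0, tracker.end)


def run(notify_on_clear):
    """Return True if destruction passes the tracker's sanity check."""
    tracker = AnnotationTracker(beg=0, end=8)
    tracker.annotate(8, 5)  # capacity 8, five live elements
    try:
        destroy_vector(tracker, 5, notify_on_clear)
    except ValueError:
        return False
    return True
```

In this model `run(True)` corresponds to calling `clear()` and `run(False)` to the `__clear()` path that skips the notification.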
The function getPartialReductionCost is already quite large and is likely to grow in size as we add support for more cases in future. Therefore, I think it's best to move this into the cpp file.
llvm::FixedPoint is not trivially copyable.
…lvm#123611) Summary: We used this globally scoped `ext_no_call_asm` as a sort of hack around the compiler that allowed the attributor to optimize out inline assembly calls to PTX instructions. Quite some time ago I got rid of every inline assembly call and replaced it with a builtin, so this can just be deleted. Furthermore, I use the `[[omp::assume]]` attribute directly for the aligned barrier usage. This prints an unknown assumption warning (even though it isn't) so I'm just silencing that for now until I fix it later. --------- Co-authored-by: Michael Kruse
After a30e50f, AMDGPUAAResult is being called in more situations where BasicAA isn't sure. This exposed some regressions where NoAlias is being incorrectly returned for two identical pointers. The fix is to check the underlying objects for equality before returning NoAlias.
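The ordering of the checks can be sketched as follows. This is an illustration only, not the AMDGPUAAResult code: `AliasResult`, `target_alias`, and the boolean `rule_says_disjoint` (standing in for whatever target rule would otherwise report NoAlias) are hypothetical names.

```python
from enum import Enum


class AliasResult(Enum):
    NO_ALIAS = "NoAlias"
    MAY_ALIAS = "MayAlias"


def target_alias(obj_a, obj_b, rule_says_disjoint):
    """Answer an alias query given the underlying objects of two pointers."""
    # Identical underlying objects can overlap, so bail out before any
    # rule gets the chance to claim NoAlias.
    if obj_a == obj_b:
        return AliasResult.MAY_ALIAS
    if rule_says_disjoint:
        return AliasResult.NO_ALIAS
    return AliasResult.MAY_ALIAS
```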
… option private-headers (llvm#121226) [llvm-objdump] Print out xcoff load section of xcoff object file with option private-headers
…#123076)
1. Remove `%c0 = arith.constant 0 : index` from test functions. This extra Op is not needed (the index can be passed as an argument), so this is just noise.
2. Replaced `%cst_0` with `%pad` to communicate what the underlying SSA value is intended for.
3. Unified some comments.
Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so now the test works.
) For the following variable `DenseMap<const Instruction *, std::pair<PartialReductionChain, unsigned>> ScaledReductionExitInstrs;` we never actually need the `PartialReductionChain` when using the map. I've cleaned this up so that this now becomes `DenseMap<const Instruction *, unsigned> ScaledReductionMap;`
The indices of SGPR register pairs need to be 2-aligned and SGPR quadruplets need to be 4-aligned. With this patch, we report an error when inline asm register constraints specify a misaligned register index, instead of silently dropping the specified index. Fixes llvm#123208 --------- Co-authored-by: Matt Arsenault
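The alignment rule itself is simple enough to state as code. A minimal sketch, assuming hypothetical helper names (`SGPR_TUPLE_ALIGNMENT`, `check_sgpr_tuple`); it covers only the pair and quadruplet cases named above and is not the backend's constraint parser.

```python
# Required alignment of the first register index, keyed by tuple width.
# Only the two widths mentioned in the commit message are modelled.
SGPR_TUPLE_ALIGNMENT = {2: 2, 4: 4}


def check_sgpr_tuple(first_index, num_regs):
    """Raise ValueError for a misaligned tuple instead of dropping the index."""
    required = SGPR_TUPLE_ALIGNMENT.get(num_regs)
    if required is None:
        raise ValueError(f"tuple width {num_regs} is outside this sketch")
    if first_index % required != 0:
        raise ValueError(
            f"s[{first_index}:{first_index + num_regs - 1}] must start at a "
            f"multiple of {required}"
        )
    return True
```

For example, `check_sgpr_tuple(4, 4)` passes, while `check_sgpr_tuple(6, 4)` raises because index 6 is not 4-aligned.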
…es (llvm#121079) This is a follow-up for llvm#119110 and a fix for llvm#118450. RemoveDeadValues used to delete Values and analyze the IR at the same time. Because of that, `isMemoryEffectFree` got invalid IR with a half-deleted linalg.generic operation. This PR separates analysis and cleanup to prevent such a situation. Thank you! --------- Co-authored-by: Renat Idrisov Co-authored-by: Andrzej Warzyński
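The general shape of "analyze first, erase afterwards" can be sketched on a toy IR. This is not the MLIR pass: the dictionary representation (op name mapped to the names of the ops it uses) and the functions `find_dead`, `erase`, and `remove_dead_values` are hypothetical.

```python
def find_dead(ops, roots):
    """Phase 1, analysis only: names of ops not reachable from the roots."""
    live = set()
    worklist = list(roots)
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        worklist.extend(ops[name])  # operands of a live op are live too
    return set(ops) - live


def erase(ops, dead):
    """Phase 2, cleanup only: no analysis runs while ops are being removed."""
    for name in dead:
        del ops[name]


def remove_dead_values(ops, roots):
    dead = find_dead(ops, roots)  # sees the complete, unmodified IR
    erase(ops, dead)
    return dead
```

Because `find_dead` finishes before `erase` starts, no query ever observes a partially deleted operation.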
…us. (llvm#122807) For reasons similar to those that make fixed-width preferred to scalable for Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost feature when using -mcpu=cortex-x2, x3, x4 and x925, which are similar to Neoverse V2.
…_mul(x, C0)) (llvm#123468) This PR introduces the following transformations:
- If C0 is not 0: umax(nuw_shl(x, C0), x + 1) -> x == 0 ? 1 : nuw_shl(x, C0)
- If C0 is not 0 or 1: umax(nuw_mul(x, C0), x + 1) -> x == 0 ? 1 : nuw_mul(x, C0)

Fixes llvm#122388. Alive2 proof: https://alive2.llvm.org/ce/z/rkp_8U
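As a sanity check independent of the Alive2 proof, both rewrites can be verified exhaustively at a small bit width. A minimal sketch, assuming 8-bit unsigned values; inputs where the `nuw` operation would wrap are skipped because the result is poison there. The helper names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1


def holds_for_shl(c0):
    """Check umax(nuw_shl(x, c0), x + 1) == (1 if x == 0 else nuw_shl(x, c0))."""
    for x in range(1 << BITS):
        shifted = x << c0
        if shifted > MASK:
            continue  # nuw violated: poison, nothing to compare
        before = max(shifted, (x + 1) & MASK)
        after = 1 if x == 0 else shifted
        if before != after:
            return False
    return True


def holds_for_mul(c0):
    """Check umax(nuw_mul(x, c0), x + 1) == (1 if x == 0 else nuw_mul(x, c0))."""
    for x in range(1 << BITS):
        product = x * c0
        if product > MASK:
            continue  # nuw violated: poison, nothing to compare
        before = max(product, (x + 1) & MASK)
        after = 1 if x == 0 else product
        if before != after:
            return False
    return True
```

The preconditions matter: the shift check fails for `c0 == 0`, and the multiply check fails for `c0 == 0` and `c0 == 1`, since `x + 1` then exceeds the other operand for nonzero `x`.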
…m#123617) In accordance with llvm#123569. In order to keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.
…auses (llvm#121356) No functional change. (Also, tried to filter out all ALLOCATOR modifiers, but that makes some other tests fail).
…tom flags (llvm#123577) The test was failing in the case where a `multilib.yaml` file was present in the installation. This is because the presence of a multilib YAML file leads to the diagnosing of validity of the multilib custom flags. This patch fixes the test by creating a new YAML file with multilib custom flags to be used by the test.
All targets build `__clc_mad` -- even SPIR-V targets -- since it compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to the bytecode generated for non-SPIR-V targets. The `mix` builtin, which is implemented as a wrapper around `mad`, is left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's worth having a specific CLC version of `mix`. The changes to the other CLC files/functions are moving uses of `mad` to `__clc_mad`, and reformatting. There is an additional instance of `trunc` becoming `__clc_trunc`, which was missed before.
…m#123629) These instructions return a 64-bit result and a 1-bit carry, unlike smul_lohi and umul_lohi which return a pair of 32-bit results. This does not appear to make any difference in practice because the DAG types are not used for anything before these nodes are converted to MachineInstrs.
…123619) But keep evaluating. This is what the current interpreter does as well.
Failed to notice them when landing that patch - apologies!
In gcc-15, explicit includes of `<cstdint>` are required when fixed-size integers are used. In this file, this include only happened as a side effect of including SmallVector.h. Although llvm compiles fine, the root-project would benefit from explicitly including it here, so we can backport the patch. Maybe interesting for @hahnjo and @vgvassilev
Correct CSE in SelectionDAG can make DAG combining more effective and reduce the size of the DAG, and thus should improve compile time.
ilogb libcall was not being constant folded correctly. This patch adds ilogb case in isMathLibCallNoop with correct error condition. Fixes llvm#101873
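The error condition can be illustrated with a small model. This is a sketch under assumptions, not the code in `isMathLibCallNoop`: it relies on the C library convention that zero, infinite, and NaN arguments are the error cases for `ilogb`, and the function name `ilogb_or_none` is hypothetical.

```python
import math


def ilogb_or_none(x):
    """Return the value ilogb(x) would fold to, or None for an error case."""
    # Zero, infinity, and NaN are the arguments for which C's ilogb may
    # report an error, so a fold must leave those calls alone.
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return None
    # frexp gives x = m * 2**e with 0.5 <= |m| < 1, so the unbiased
    # exponent of x is e - 1.
    return math.frexp(x)[1] - 1
```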
Before this change InstrSet in SPIRVEmitIntrinsics was uninitialized before running runOnFunction. This change adds a new function getPreferredInstructionSet in SPIRVSubtarget.
…m#123477) Use `mlir_target_link_libraries()` to link dependencies of libraries that are not included in libMLIR, to ensure that they link to the dylib when they are used in Flang. Otherwise, they implicitly pull in all their static dependencies, effectively causing Flang binaries to simultaneously link to the dylib and to static libraries, which is never a good idea. I have only covered the libraries that are used by Flang. If you wish, I can extend this approach to all non-libMLIR libraries in MLIR, making MLIR itself also link to the dylib consistently.
…on (llvm#120771) Replace the `std::string` parameters of `static void exitWithError(Twine Message, std::string Whence = "", std::string Hint = "")` with `StringRef`, to avoid constructing strings on every call and passing them by value. Fixes: llvm#100065
Reland the patch after fixing the lit test.
Add some generated tests with every shuffle permutation for relevant vector element types and sizes. Not sure if this is going overboard with the number of tests. I pruned out the largest cases (16 and 32-bit cases are impractically large), and there's redundancy when testing the pointer cases (at least for SelectionDAG). This uses inline assembly to produce sample values because of how the ABI is lowered when using a function argument. Since we break all arguments into 32-bit pieces, a shuffle never ends up forming. We need separate handling to reconstruct shuffles in contexts involving physical registers in ABI contexts. I wrote a small tool to generate these, so I can easily change the exact test body. Not sure if it's worth posting anywhere. This is in preparation for making better use of v_pk_mov_b32, v_mov_b64 and s_mov_b64 in shuffles.
Primarily around uses of getSubReg/getSuperReg.
) Implementation details: The UNTIED clause is recognized by setting the flag=0 for the default case or performing logical OR to flag if other clauses are specified, and this flag is passed as an argument to the `__kmpc_omp_task_alloc` runtime call. Resubmitting the PR with fix for the failure, as it was reverted here: 927a70d and previously merged here: llvm#115283
These passed prechecks but failed after cc5eba1
Summary: We used to avoid a lot of this stuff because we didn't properly handle variadics in device code. That's been solved for now, so we can just make an internal printf handler that forwards to the external `vprintf` function. This is either provided by NVIDIA's SDK or by the GPU libc implementation. The main reason for doing this is because it prevents the stupid AMDGPU printf pass from mangling our beautiful printfs!
…122786) Summary: Right now we just default to device for each type, and mix an ad-hoc scope with the one used by the compiler's builtins. Unify this and make each version take the scope optionally. For @ronlieb, this will remove the need for `add_system` in the fork as well as the extra `cas` with system scope, just pass `system`.
It appears that omp_lib is not correctly (or maybe not at all?) found from the build directory. This made a few buildbots break after [PR#121356](llvm#121356) landed. This is a workaround to unblock the buildbots. https://lab.llvm.org/staging/#/builders/130/builds/12654 https://lab.llvm.org/buildbot/#/builders/140/builds/15102 https://lab.llvm.org/staging/#/builders/105/builds/13855
) In the current state of the code, the transform computes entries for the dependency matrix until `MaxMemInstrCount`, which is 100. After the 99th entry, it terminates and thus wastes compile time overall. It would be nice if we could compute the total number of entries upfront and exit early if the number of entries > 100. However, computing the number of entries is not always possible as it depends on two factors:
1. Number of load-store pairs in a loop.
2. Number of common loop levels for each of the pairs.

This patch constrains the whole computation on the number of load and store instructions in the loop. In another approach, I experimented with computing 1 and constraining the number of pairs, but that did not lead to any additional benefit in terms of compile time. However, when other issues are fixed, I can revisit this approach.
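The idea of bailing out before any matrix entry is computed can be sketched like this. It is a simplified model, not the transform itself: instructions are plain opcode strings, the function names are hypothetical, and treating "more than 100" as the cutoff is an assumption about the exact comparison.

```python
MAX_MEM_INSTR_COUNT = 100  # value quoted in the commit message


def count_memory_instrs(loop_instrs):
    """Count loads and stores among the opcodes of a loop body."""
    return sum(1 for opcode in loop_instrs if opcode in ("load", "store"))


def should_build_dependence_matrix(loop_instrs, limit=MAX_MEM_INSTR_COUNT):
    """Decide up front, from the instruction count alone, whether to proceed."""
    # A cheap linear scan replaces computing entries only to abandon them.
    return count_memory_instrs(loop_instrs) <= limit
```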
…CallOperatorInstantiationRAII (llvm#123687) Now that the RAII object has dedicated logic for handling nested lambdas, where the inner lambda could reference any captures/variables/parameters from the outer lambda, we can shift the responsibility for managing lambdas away from SetupConstraintScope(). I think this also makes the structure clearer. Fixes llvm#123441
…form (llvm#123575) Enable ELFNixPlatform support for loongarch64. These are a few simple changes, but they allow us to use the orc runtime in the ELF/LoongArch64 backend. This change adds test cases targeting the LoongArch64 Linux platform to the ORC runtime integration test suite. Since jitlink for loongarch64 is ready for general use, and ELF-based platforms support defining multiple static initializer table sections with differing priorities, some relevant test cases in compiler-rt for ELFNixPlatform support can be enabled.
…3465) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.
…FFLE` (llvm#123555) This PR fixes operand order of `ILVOD.df` when lowering `VECTOR_SHUFFLE`, the result was `<y[1], x[1]>` while it should be `<x[1], y[1]>`. * This PR is split from llvm#123040.
…20912) On Windows, imported symbols must be searched with '__imp_' prefix. Support imported global variables and imported functions.
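A lookup that also tries the prefixed spelling might look like the sketch below. The symbol table is a plain dictionary and `resolve` is a hypothetical helper; trying the unprefixed name first is an assumption made for the illustration, not something the commit message states.

```python
IMPORT_PREFIX = "__imp_"


def resolve(symbols, name):
    """Return (matched_name, address), or None if neither spelling exists."""
    # Assumption: try the plain name, then the import-prefixed one.
    for candidate in (name, IMPORT_PREFIX + name):
        if candidate in symbols:
            return candidate, symbols[candidate]
    return None
```

With a table such as `{"__imp_puts": 0x1000, "main": 0x2000}`, `resolve(table, "puts")` finds the entry under `__imp_puts`.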
PR llvm#122344 adds intrinsics for Bulk Async Copy (non-tensor variants) using TMA. This patch adds the corresponding NVVM Dialect Ops. lit tests are added to verify the lowering to all variants of the intrinsics. Signed-off-by: Durgadoss R
… (NFC) (llvm#123621) In accordance with llvm#123569
…123344) Commit 1eed469 added logic to reassociate a (add (add x y) -c) operand to a CSEL instruction with a comparison involving x and c (or a similar constant) in order to obtain a common (SUBS x c) instruction. This commit extends this logic to non-constants. In this way, we also reassociate a (sub (add x y) z) operand of a CSEL instruction to (add (sub x z) y) if the CSEL compares x and z, for example. Alive proof: https://alive2.llvm.org/ce/z/SEVpR
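The algebraic identity behind the non-constant case can be checked exhaustively at a small width. A minimal sketch assuming 4-bit wrapping arithmetic; the function names are hypothetical and this is separate from the Alive proof linked above.

```python
BITS = 4
MASK = (1 << BITS) - 1


def original(x, y, z):
    """(sub (add x y) z) with wrapping arithmetic."""
    return (((x + y) & MASK) - z) & MASK


def reassociated(x, y, z):
    """(add (sub x z) y): the inner sub can be shared with a compare of x and z."""
    return (((x - z) & MASK) + y) & MASK


def equivalent_everywhere():
    """Compare both forms on every 4-bit triple."""
    values = range(1 << BITS)
    return all(
        original(x, y, z) == reassociated(x, y, z)
        for x in values
        for y in values
        for z in values
    )
```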
…vm#120363) This started out as trying to combine bf16 fpround to BFCVT2 instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in favour of generating fpround instructions directly. This simplifies the patterns and can lead to other optimizations. The BFCVT2 instruction is adjusted to make sure the types are valid, and a bfcvt2 is now generated in more places. The old intrinsics are auto-upgraded to fptrunc instructions too.
This will be sent by Arm's Guarded Control Stack extension when an invalid return is executed. The signal does have an address we could show, but it's the PC at which the fault occurred. The debugger has plenty of ways to show you that already, so I've left it out.
```
(lldb) c
Process 460 resuming
Process 460 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: control protection fault
    frame #0: 0x0000000000400784 test`main at main.c:57:1
   54     afunc();
   55     printf("return from main\n");
   56     return 0;
-> 57   }
(lldb) dis
<...>
->  0x400784 <+100>: ret
```
The new test case generates the signal by corrupting the link register then attempting to return. This will work whether we manually enable GCS or the C library does it for us. (In the former case you could just return from main and it would fault.)
…rray is indexed by const evaluatable expressions (llvm#119340)"" (llvm#123713) This reverts commit 7dd34ba. Fixed the assertion violation reported by 7dd34ba. Co-authored-by: MalavikaSamak
No description provided.